Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers
Authors
Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer
Venue
- Empirical Software Engineering Journal (EMSE)
- 2022
- 1–46
Abstract
Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.
Bibtex
@article{kamienski2022EMSE-dupe-question-gamedev,
abstract = {Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q\&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q\&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q\&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q\&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q\&A websites for game developers and ultimately providing game developers with a more effective Q\&A community.},
accepted = {2022-11-02},
author = {Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer},
authors = {Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer},
code = {kamienski2022EMSE-dupe-question-gamedev},
day = {08},
funding = {NSERC Discovery},
institution = {University of Alberta},
journal = {Empirical Software Engineering Journal (EMSE)},
month = {December},
number = {17},
pages = {1--46},
role = {Researcher / Co-author},
title = {Analyzing Techniques for Duplicate Question Detection on Q\&A Websites for Game Developers},
type = {article},
url = {http://softwareprocess.ca/pubs/kamienski2022EMSE-dupe-question-gamedev.pdf},
venue = {Empirical Software Engineering Journal (EMSE)},
volume = {28},
year = {2022}
}