Deficient Documentation and Stack Overflow
We have developed a method for locating aspects of a project that are inadequately documented by combining data from Stack Overflow and the project’s documentation. This method uses LDA to guide the manual analysis of topics that may not have been sufficiently addressed by the project’s documentation.
Abstract
A project’s documentation is the primary source of information for developers using that project. With hundreds of thousands of programming-related questions posted on programming Q&A websites, such as Stack Overflow, we question whether the developer-written documentation provides enough guidance for programmers. In this study, we wanted to know if there are any topics which are inadequately covered by the project documentation. We combined questions from Stack Overflow and documentation from the PHP and Python projects. Then, we applied topic analysis to this data using latent Dirichlet allocation (LDA), and found topics in Stack Overflow that did not overlap the project documentation. We successfully located topics that had deficient project documentation. We also found topics in need of tutorial documentation that were outside of the scope of the PHP or Python projects, such as MySQL and HTML.
Distribution of Per-Class-Maximum Document Weight Differences. The Y-axis represents the difference between the most representative Stack Overflow post and the most representative project documentation document. The X-axis shows all topics sorted by this difference. PHP and Python were processed completely independently.
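The per-topic difference described in the caption above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes an LDA model has already produced a document-topic weight matrix, and it assumes the difference is taken as Stack Overflow maximum minus documentation maximum, sorted in descending order. The function name `per_class_max_differences`, the source labels, and the toy weights are all hypothetical.

```python
from typing import List, Sequence, Tuple


def per_class_max_differences(
    doc_topic: Sequence[Sequence[float]],
    sources: Sequence[str],
    question_label: str = "stackoverflow",   # hypothetical label
    docs_label: str = "documentation",       # hypothetical label
) -> List[Tuple[int, float]]:
    """For each topic, subtract the largest weight found in any project
    documentation document from the largest weight found in any Stack
    Overflow post, then sort topics by that difference (largest first).

    doc_topic[i][k] is the weight of topic k in document i, as produced
    by some LDA implementation (not shown here).
    """
    if len(doc_topic) != len(sources):
        raise ValueError("doc_topic and sources must have the same length")
    if not doc_topic:
        return []

    n_topics = len(doc_topic[0])
    results = []
    for topic in range(n_topics):
        # Track the most representative document of each class for this topic.
        max_by_source = {question_label: 0.0, docs_label: 0.0}
        for weights, source in zip(doc_topic, sources):
            if source in max_by_source:
                max_by_source[source] = max(max_by_source[source], weights[topic])
        difference = max_by_source[question_label] - max_by_source[docs_label]
        results.append((topic, difference))

    # Assumed ordering: topics that Stack Overflow covers far more strongly
    # than the documentation come first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Toy, made-up weights: two Stack Overflow posts and two documentation pages.
toy_weights = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 0.3, 0.7],
]
toy_sources = ["stackoverflow", "stackoverflow", "documentation", "documentation"]

ranked = per_class_max_differences(toy_weights, toy_sources)
for topic, difference in ranked:
    print(f"topic {topic}: difference {difference:+.2f}")
```

In this toy data, topic 0 ranks first because a Stack Overflow post is strongly about it while no documentation page is. Under the assumptions above, topics like this are the candidates for manual inspection.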
Authors
- Joshua Charles Campbell, Chenlei Zhang, Zhen Xu, Abram Hindle, James Miller
- Department of Computing Science and Department of Electrical and Computer Engineering
- University of Alberta
- Edmonton, Canada
Presentation
Paper
Citation
Campbell, Joshua Charles, Chenlei Zhang, Zhen Xu, Abram Hindle, and James Miller. "Deficient documentation detection: a methodology to locate deficient project documentation using topic analysis." In Proceedings of the Tenth International Workshop on Mining Software Repositories, pp. 57-60. IEEE Press, 2013.
@inproceedings{campbell2013deficient,
  title={Deficient documentation detection: a methodology to locate deficient project documentation using topic analysis},
  author={Campbell, Joshua Charles and Zhang, Chenlei and Xu, Zhen and Hindle, Abram and Miller, James},
  booktitle={Proceedings of the Tenth International Workshop on Mining Software Repositories},
  pages={57--60},
  year={2013},
  organization={IEEE Press}
}