Using LDA to Extract Requirements Topics and Then Tracking Commits Relevant to These Topics Across Time
In this paper we extract topics from requirements documents and track the commits relevant to each topic over time. We then ask stakeholders, such as product managers and developers, to label the topics and to look at the commits relevant to them, to see whether LDA topics make sense to practitioners.
Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?
This paper was accepted to ICSM 2012 in Trento, Italy. Here's a preview:
Topic Plot of Commits Related to Requirements
Large organizations like Microsoft tend to rely on formal requirements documentation to specify and design the software products they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper we evaluate the value of high-level, topic-based requirements traceability in the version control system, using Latent Dirichlet Allocation (LDA). We evaluate LDA topics with practitioners and check whether the extracted topics and trends match the perception that Program Managers and Developers have of the effort put into addressing certain topics. We found that the effort extracted from version control that was relevant to a topic often matched the managers' and developers' perception of what occurred at the time. Furthermore, we found evidence that many of the identified topics made sense to practitioners and matched their perception of what occurred. For some topics, however, practitioners had difficulty interpreting and labelling them. In summary, we investigate the high-level traceability of requirements topics to version control commits via topic analysis, and we validate the relevance of these requirements topics with the actual stakeholders.
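To make the idea of a per-topic commit trend concrete, here is a minimal sketch in Python. It is not the method used in the paper: it assumes the topics have already been extracted (for example by LDA over requirements documents) and stands in simple word overlap for LDA inference when deciding whether a commit is relevant to a topic. The topic names, word sets, commit messages, and the `tokenize` and `topic_trends` helpers are all hypothetical and exist only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical topics: in practice these would come from LDA run on
# requirements documents; here they are hand-written word sets.
TOPICS = {
    "install": {"installer", "setup", "upgrade"},
    "network": {"socket", "timeout", "proxy"},
}

# Hypothetical commits as (date, message) pairs.
COMMITS = [
    (date(2011, 1, 5), "Fix installer crash during upgrade"),
    (date(2011, 1, 20), "Add proxy timeout handling"),
    (date(2011, 2, 3), "Setup wizard: new upgrade page"),
    (date(2011, 2, 9), "Refactor logging"),
]


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def topic_trends(commits, topics, min_overlap=1):
    """Count, per topic and per (year, month), the commits whose message
    shares at least `min_overlap` words with the topic's word set.

    Word overlap is an assumed, simplified relevance test, not LDA inference.
    """
    trends = {name: Counter() for name in topics}
    for day, message in commits:
        words = tokenize(message)
        for name, topic_words in topics.items():
            if len(words & topic_words) >= min_overlap:
                trends[name][(day.year, day.month)] += 1
    return trends


trends = topic_trends(COMMITS, TOPICS)
for name, counts in trends.items():
    for month in sorted(counts):
        print(name, month, counts[month])
```

Plotting each topic's monthly counts as a time series gives a rough analogue of the topic plot previewed above, which practitioners can then compare against their own recollection of where effort went.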