An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)

Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle

2023/03/07

Venue

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track

Abstract

The value of teamwork is increasingly recognized by project owners, which has led to greater acknowledgement of collaboration among developers in software engineering. A good understanding of how developers work together could positively impact software development practices. In this paper, we investigate the collaboration habits of developers in project files by leveraging the World of Code (WoC) dataset and the GitHub API. First, we measure the level of developer collaboration within different kinds of project files, such as source, test, documentation, and build files, using Author Cross Entropy (ACE). We find that test files show the highest degree of collaboration among developers, perhaps because collaboration is critical to ensuring the convergence of functionality tests. In contrast, source code files show the lowest degree of collaboration, perhaps because of code ownership and the complexity and difficulty of modifying code. Second, given the widespread use of the Python programming language, we investigate which Python code tokens are more prone to change and collaboration. Our findings offer insights into the specific project files and Python code tokens that developers typically collaborate on in the open-source community. Researchers and developers can use this information to enhance existing collaboration platforms and tools.
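The abstract names Author Cross Entropy (ACE) but does not define it here. Purely as an illustration of the idea of scoring how widely a file's changes are shared, the Python sketch below uses a simpler stand-in (an assumption, not the paper's ACE formula): the normalized Shannon entropy of how each file's commits are split among authors, averaged per file category. The functions `classify_file`, `author_entropy`, and `entropy_by_category`, the path-based classification rules, and the sample history are all hypothetical; the sketch does not reproduce the paper's WoC or GitHub API pipeline.

```python
import math
from collections import Counter, defaultdict
from pathlib import PurePosixPath
from statistics import mean

# Hypothetical path heuristics; the paper's actual file-classification rules may differ.
BUILD_NAMES = {"setup.py", "setup.cfg", "pyproject.toml", "requirements.txt",
               "Makefile", "CMakeLists.txt"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}


def classify_file(path: str) -> str:
    """Bucket a repository path as 'build', 'test', 'documentation', or 'source'."""
    p = PurePosixPath(path)
    dirs = p.parts[:-1]  # directory components only, excluding the file name
    if p.name in BUILD_NAMES:
        return "build"
    if ("test" in dirs or "tests" in dirs
            or p.name.startswith("test_") or p.stem.endswith("_test")):
        return "test"
    if p.suffix in DOC_SUFFIXES or "docs" in dirs:
        return "documentation"
    return "source"


def author_entropy(commit_authors: list[str]) -> float:
    """Normalized Shannon entropy (0 to 1) of the authors of one file's commits.

    Stand-in for ACE (assumption): 0.0 means a single author made every change,
    1.0 means changes are split evenly among the distinct authors.
    """
    counts = Counter(commit_authors)
    if len(counts) <= 1:
        return 0.0  # no commits, or only one author: no collaboration to measure
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # divide by the maximum possible entropy


def entropy_by_category(file_authors: dict[str, list[str]]) -> dict[str, float]:
    """Mean per-file author entropy for each file category."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for path, authors in file_authors.items():
        buckets[classify_file(path)].append(author_entropy(authors))
    return {category: mean(scores) for category, scores in buckets.items()}


if __name__ == "__main__":
    # Invented toy history: one author name per commit that touched the file.
    history = {
        "src/parser.py": ["alice", "alice", "alice", "bob"],
        "tests/test_parser.py": ["alice", "bob", "carol"],
        "README.md": ["bob", "bob"],
        "setup.py": ["alice", "carol"],
    }
    for category, score in sorted(entropy_by_category(history).items()):
        print(f"{category:>13}: {score:.3f}")
```

Dividing by log2 of the number of distinct authors puts files with different numbers of contributors on a common 0 to 1 scale. Counting every commit equally is a further simplification; weighting each author by changed lines or tokens would be an equally plausible choice, and the toy data above says nothing about the paper's results.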

Bibtex

@inproceedings{sun2023MSR-author-cross-entropy,
 abstract = {The value of teamwork is being recognized by project owners, resulting in an increased acknowledgement of collaboration among developers in software engineering. A good understanding of how developers work together could positively impact software development practices. In this paper, we investigate the collaboration habits of developers in project files by leveraging the World of Code (WoC) dataset and GitHub API. We first identify the collaboration level of developers within the project files, such as the source, test, documentation, and build files, using the Author Cross Entropy (ACE). From the results we find out that test files report the highest degree of collaboration among the developers, perhaps because collaboration is critical to ensure convergence of functionality tests. Furthermore, the source code files show the least degree of collaboration, perhaps because of code ownership and the complexity and difficulty in code modification. Secondly, given the widespread usage of the Python programming language, we investigate the Python code tokens that are more prone to change and collaboration. Our findings offer insights into the specific project files and Python code tokens that developers typically collaborate on in the open-source community. This information can be used by researchers and developers to enhance existing collaboration platforms and tools.},
 accepted = {2023-03-07},
 author = {Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle},
 authors = {Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle},
 booktitle = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 code = {sun2023MSR-author-cross-entropy},
 date = {2023-05-15},
 funding = {NSERC Discovery},
 location = {Melbourne, Australia},
 pagerange = {352--356},
 pages = {352--356},
 rate = {50%},
 role = {Co-Author},
 title = {An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sun2023MSR-author-cross-entropy.pdf},
 venue = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 year = {2023}
}