What do developers know about machine learning: a study of ML discussions on StackOverflow

Abdul Ali Bangash, Hareem Sahar, Shaiful Chowdhury, Alexander William Wong, Abram Hindle, Karim Ali

2019/03/01

Venue

Proceedings of the 16th International Conference on Mining Software Repositories (MSR19)

Abstract

Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity and is already being used successfully for some Software Engineering tasks, e.g., bug prediction and software development effort estimation. To gain deeper insight into the uses of machine learning in a software engineering context, we conduct a study on the SOTorrent dataset, which contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the software engineering problems addressed by machine learning. Our analyses of post metadata show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because they are less prevalent in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators, and practitioners alike.

Bibtex

@inproceedings{bangash2019MSRChallenge-ML,
 abstract = {Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity and is already being used successfully for some Software Engineering tasks, e.g., bug prediction and software development effort estimation. To gain deeper insight into the uses of machine learning in a software engineering context, we conduct a study on the SOTorrent dataset, which contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the software engineering problems addressed by machine learning. Our analyses of post metadata show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because they are less prevalent in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators, and practitioners alike.},
 accepted = {2019-03-01},
 author = {Abdul Ali Bangash and Hareem Sahar and Shaiful Chowdhury and Alexander William Wong and Abram Hindle and Karim Ali},
 authors = {Abdul Ali Bangash, Hareem Sahar, Shaiful Chowdhury, Alexander William Wong, Abram Hindle, Karim Ali},
 booktitle = {Proceedings of the 16th International Conference on Mining Software Repositories (MSR19)},
 code = {bangash2019MSRChallenge-ML},
 date = {2019-05-26},
 funding = {NSERC Discovery},
 location = {Montreal, Canada},
 pagerange = {1--5},
 pages = {1--5},
 rate = {14/27 or 52%},
 region = {Quebec},
 role = {Co-Author},
 title = {What do developers know about machine learning: a study of ML discussions on StackOverflow},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bangash2019MSRChallenge-ML.pdf},
 venue = {Proceedings of the 16th International Conference on Mining Software Repositories (MSR19)},
 year = {2019}
}