Syntax and Stack Overflow: A Methodology for Extracting a Corpus of Syntax Errors and Fixes

Alexander William Wong, Amir Salimi, Shaiful Alam Chowdhury, Abram Hindle

2019/07/13

Authors

Alexander William Wong, Amir Salimi, Shaiful Alam Chowdhury, Abram Hindle

Venue

2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)

Abstract

Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity over time and is successfully being used for some software engineering tasks today, e.g., bug prediction and software development effort estimation. In order to gain deeper insights into the uses of machine learning in a software engineering context, we conduct a study on the SOTorrent dataset, which contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the problems of software engineering addressed by machine learning. Our analyses of the post metadata show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because of their lower prevalence in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators, and practitioners alike.

Bibtex

@inproceedings{wongICSME2019-syntax,
 abstract = {Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity over time and is successfully being used for some software engineering tasks today, e.g., bug prediction and software development effort estimation. In order to gain deeper insights into the uses of machine learning in a software engineering context, we conduct a study on the SOTorrent dataset, which contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the problems of software engineering addressed by machine learning. Our analyses of the post metadata show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because of their lower prevalence in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators, and practitioners alike.},
 accepted = {2019-07-13},
 author = {Alexander William Wong and Amir Salimi and Shaiful Alam Chowdhury and Abram Hindle},
 authors = {Alexander William Wong, Amir Salimi, Shaiful Alam Chowdhury, Abram Hindle},
 booktitle = {2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
 code = {wongICSME2019-syntax},
 date = {2019-07-13},
 funding = {NSERC Discovery},
 location = {Cleveland, United States},
 pagerange = {318--322},
 pages = {318--322},
 rate = {26/46 or 56%},
 role = {Co-Author},
 title = {Syntax and Stack Overflow: A Methodology for Extracting a Corpus of Syntax Errors and Fixes},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/wongICSME2019-syntax.pdf},
 venue = {2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
 year = {2019}
}