Abram Hindle's Publications

IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction

Hareeme Sahar and Abdul Ali Bangash and Abram Hindle and Denilson Barbosa
Empirical Software Engineering,
2024 1--38
PDF
Publisher Link
DOI:https://doi.org/10.1007/s10664-024-10514-z

Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time. Current software defect prediction approaches rely on manually crafted features such as change metrics and involve expensive-to-train machine learning or deep learning models. These models typically involve extensive training processes that may require significant computational resources and time. These characteristics can pose challenges when attempting to update the models in real-time as new examples become available, potentially impacting their suitability for fast online defect prediction. Furthermore, the reliance on a complex underlying model makes these approaches often less explainable, which means the developers cannot understand the reasons behind models’ predictions. An approach that is not explainable might not be adopted in real-life development environments because of developers’ lack of trust in its results. To address these limitations, we propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. The IRJIT approach is online and explainable as it can learn from new data without expensive retraining, and developers can see the documents that support a prediction, providing additional context. By evaluating 10 open-source datasets in a within-project setting, we show that our approach is up to 112 times faster than the state-of-the-art ML and DL approaches, offers explainability at the commit and line level, and has comparable performance to the state-of-the-art.

@article{sahar2024emse-IRJIT,
 abstract = {Just-in-Time software defect prediction (JIT-SDP) prevents the introduction of defects into the software by identifying them at commit check-in time. Current software defect prediction approaches rely on manually crafted features such as change metrics and involve expensive-to-train machine learning or deep learning models. These models typically involve extensive training processes that may require significant computational resources and time. These characteristics can pose challenges when attempting to update the models in real-time as new examples become available, potentially impacting their suitability for fast online defect prediction. Furthermore, the reliance on a complex underlying model makes these approaches often less explainable, which means the developers cannot understand the reasons behind models’ predictions. An approach that is not explainable might not be adopted in real-life development environments because of developers’ lack of trust in its results. To address these limitations, we propose an approach called IRJIT that employs information retrieval on source code and labels new commits as buggy or clean based on their similarity to past buggy or clean commits. The IRJIT approach is online and explainable as it can learn from new data without expensive retraining, and developers can see the documents that support a prediction, providing additional context. By evaluating 10 open-source datasets in a within-project setting, we show that our approach is up to 112 times faster than the state-of-the-art ML and DL approaches, offers explainability at the commit and line level, and has comparable performance to the state-of-the-art.},
 accepted = {2024-06-10},
 author = {Hareeme Sahar and Abdul Ali Bangash and Abram Hindle and Denilson Barbosa},
 authors = {Hareeme Sahar and Abdul Ali Bangash and Abram Hindle and Denilson Barbosa},
 code = {sahar2024emse-IRJIT},
 day = {02},
 doi = {https://doi.org/10.1007/s10664-024-10514-z},
 funding = {NSERC Discovery, CIHR},
 institution = {University of Alberta},
 journal = {Empirical Software Engineering},
 month = {August},
 number = {131},
 pages = {1--38},
 payurl = {https://link.springer.com/article/10.1007/s10664-024-10514-z},
 role = {Researcher / Co-author},
 title = {IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction},
 type = {article},
 url = {http://softwareprocess.ca/pubs/sahar2024emse-IRJIT.pdf},
 venue = {Empirical Software Engineering},
 volume = {29},
 year = {2024}
}

Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data

Vikhyat Agrawal and Sunil Vasu Kalmady and Venkataseetharam Manoj Malipeddi and Manisimha Varma Manthena and Weijie Sun and Saiful Islam and Abram Hindle and Padma Kaul and Russell Greiner
International Conference on Medical and Health Informatics (ICMHI 2024), Yokohama, Japan
2024 1--9
PDF

This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate the diagnoses while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained over the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training.

@inproceedings{agrawal2024ICMHI-federated,
 abstract = {This research paper explores ways to apply Federated Learning (FL) and Differential Privacy (DP) techniques to population-scale Electrocardiogram (ECG) data. The study learns a multi-label ECG classification model using FL and DP based on 1,565,849 ECG tracings from 7 hospitals in Alberta, Canada. The FL approach allowed collaborative model training without sharing raw data between hospitals while building robust ECG classification models for diagnosing various cardiac conditions. These accurate ECG classification models can facilitate the diagnoses while preserving patient confidentiality using FL and DP techniques. Our results show that the performance achieved using our implementation of the FL approach is comparable to that of the pooled approach, where the model is trained over the aggregated data from all hospitals. Furthermore, our findings suggest that hospitals with limited ECGs for training can benefit from adopting the FL model compared to single-site training. In addition, this study showcases the trade-off between model performance and data privacy by employing DP during model training.},
 accepted = {2024-02-04},
 author = {Vikhyat Agrawal and Sunil Vasu Kalmady and Venkataseetharam Manoj Malipeddi and Manisimha Varma Manthena and Weijie Sun and Saiful Islam and Abram Hindle and Padma Kaul and Russell Greiner},
 authors = {Vikhyat Agrawal and Sunil Vasu Kalmady and Venkataseetharam Manoj Malipeddi and Manisimha Varma Manthena and Weijie Sun and Saiful Islam and Abram Hindle and Padma Kaul and Russell Greiner},
 booktitle = {International Conference on Medical and Health Informatics (ICMHI 2024)},
 code = {agrawal2024ICMHI-federated},
 date = {2024-05-15},
 funding = {NSERC Discovery},
 location = {Yokohama, Japan},
 pages = {1--9},
 role = {Editorial},
 title = {Federated Learning and Differential Privacy Techniques on Multi-hospital Population-scale Electrocardiogram Data},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/agrawal2024ICMHI-federated.pdf},
 venue = {International Conference on Medical and Health Informatics (ICMHI 2024)},
 year = {2024}
}

Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language

Anisha Islam and Kalvin Eng and Abram Hindle
2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) Data Track, Lisbon, Portugal
2024 1--6
Acceptance:32/65
PDF
DOI:http://dx.doi.org/10.1145/3643991.3644865

Pure Data (PD), a data-flow based visual programming language utilized for music and sound synthesis, remains underexplored in software engineering research. Existing literature fails to address the nuanced programming practices within PD, prompting the need to investigate how end-users manipulate nodes and edges in this visual language. This paper systematically extracts and analyzes 6,534 publicly available PD projects from GitHub. Employing source code parsing, pattern matching, and statistical analysis, we unveil usage patterns of PD by the end-user programmers. We found that most revisions of the PD files are small and simple, with fewer than 64 nodes, 51 connections, and 3 revisions. Most PD projects have less than 17 PD files, 31 commits, and only 1 author working on the PD files. The median differences in the number of nodes and edges between each commit and its parents, modifying the same file, are 3 and 0, respectively, implying small changes across various revisions of a PD file. Our findings contribute a valuable dataset for future studies, addressing the dearth of research in PD. By unraveling usage patterns, we provide insights that empower scholars and practitioners to optimize the programming experience for end-users in the realm of visual programming languages.

@inproceedings{islam2024MSR-pure-data,
 abstract = {Pure Data (PD), a data-flow based visual programming language utilized for music and sound synthesis, remains underexplored in software engineering research. Existing literature fails to address the nuanced programming practices within PD, prompting the need to investigate how end-users manipulate nodes and edges in this visual language. This paper systematically extracts and analyzes 6,534 publicly available PD projects from GitHub. Employing source code parsing, pattern matching, and statistical analysis, we unveil usage patterns of PD by the end-user programmers. We found that most revisions of the PD files are small and simple, with fewer than 64 nodes, 51 connections, and 3 revisions. Most PD projects have less than 17 PD files, 31 commits, and only 1 author working on the PD files. The median differences in the number of nodes and edges between each commit and its parents, modifying the same file, are 3 and 0, respectively, implying small changes across various revisions of a PD file. Our findings contribute a valuable dataset for future studies, addressing the dearth of research in PD. By unraveling usage patterns, we provide insights that empower scholars and practitioners to optimize the programming experience for end-users in the realm of visual programming languages.},
 accepted = {2024-01-12},
 author = {Anisha Islam and Kalvin Eng and Abram Hindle},
 authors = {Anisha Islam and Kalvin Eng and Abram Hindle},
 booktitle = {2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) Data Track},
 code = {islam2024MSR-pure-data},
 date = {2024-04-15},
 doi = {http://dx.doi.org/10.1145/3643991.3644865},
 funding = {NSERC Discovery},
 location = {Lisbon, Portugal},
 pages = {1--6},
 rate = {32/65},
 role = {Co-Author},
 title = {Opening the Valve on Pure-Data: Usage Patterns and Programming Practices of a Data-Flow Based Visual Programming Language},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/islam2024MSR-pure-data.pdf},
 venue = {2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) Data Track},
 year = {2024}
}

Patterns of multi-container composition for service orchestration with Docker Compose

Kalvin Eng and Abram Hindle and Eleni Stroulia
Empirical Software Engineering,
2024 1--41
PDF
Publisher Link
DOI:https://doi.org/10.1007/s10664-024-10462-8

Software design patterns present general code solutions to common software design problems. Modern software systems rely heavily on containers for running their constituent service components. Yet, despite the prevalence of ready-to-use Docker service images ready to participate in multi-container service compositions of applications, developers do not have much guidance on how to compose their own Docker service orchestrations. Thus in this work, we curate a dataset of successful projects that employ Docker Compose as an orchestration tool to run multiple service containers; then, we engage in qualitative and quantitative analysis of Docker Compose configurations. The collection of data and analysis enables the identification and naming of repeating multi-container composition patterns that are used in numerous successful open-source projects, much like software design patterns. These patterns highlight how software systems are orchestrated in the real-world and can give examples to anybody wishing to compose their own service orchestrations. These contributions also advance empirical research in software engineering patterns as evidence is provided about how Docker Compose is used.

@article{eng2024EMSE-docker-compose,
 abstract = {Software design patterns present general code solutions to common software design problems. Modern software systems rely heavily on containers for running their constituent service components. Yet, despite the prevalence of ready-to-use Docker service images ready to participate in multi-container service compositions of applications, developers do not have much guidance on how to compose their own Docker service orchestrations. Thus in this work, we curate a dataset of successful projects that employ Docker Compose as an orchestration tool to run multiple service containers; then, we engage in qualitative and quantitative analysis of Docker Compose configurations. The collection of data and analysis enables the identification and naming of repeating multi-container composition patterns that are used in numerous successful open-source projects, much like software design patterns. These patterns highlight how software systems are orchestrated in the real-world and can give examples to anybody wishing to compose their own service orchestrations. These contributions also advance empirical research in software engineering patterns as evidence is provided about how Docker Compose is used.},
 accepted = {2024-02-19},
 author = {Kalvin Eng and Abram Hindle and Eleni Stroulia},
 authors = {Kalvin Eng and Abram Hindle and Eleni Stroulia},
 code = {eng2024EMSE-docker-compose},
 day = {03},
 doi = {https://doi.org/10.1007/s10664-024-10462-8},
 funding = {NSERC Discovery, CIHR},
 institution = {University of Alberta},
 journal = {Empirical Software Engineering},
 month = {May},
 number = {65},
 pages = {1--41},
 payurl = {https://link.springer.com/article/10.1007/s10664-024-10462-8},
 role = {Researcher / Co-author},
 title = {Patterns of multi-container composition for service orchestration with Docker Compose},
 type = {article},
 url = {http://softwareprocess.ca/pubs/eng2024EMSE-docker-compose.pdf},
 venue = {Empirical Software Engineering},
 volume = {29},
 year = {2024}
}

Generative Data by β-Variational Autoencoders Help Build Stronger Classifiers: ECG Use Case

Yousef Nademi and Sunil V Kalmady and Weijie Sun and Amir Salimi and Abram Hindle and Padma Kaul and Russell Greiner
2023 19th International Symposium on Medical Information Processing and Analysis (SIPAIM), Mexico City, Mexico
2024 1--7
PDF
DOI:10.1109/sipaim56729.2023.10373478

We explore the challenge of learning models that use electrocardiogram (ECG) data to diagnose various cardiovascular diseases. Here, we explore whether classifiers trained on a dataset of real labeled ECGs, augmented with synthetic ECGs, can perform better than ones trained on unaugmented datasets. We first used a dataset of ECGs, each labelled with one or more of 15 diagnoses, from 244,077 patients to train an unsupervised β-VAE model that could generate time series of 12-lead ECG signals for each of the diagnoses. We then used this generative model to generate ECGs with the ST-segment Elevated (STE) abnormality, which we added to the public dataset of ECG abnormalities (n = 6877, over normal (Sinus Rhythm) and 8 different abnormalities) of China Physiological Signal Challenge 2018, and found that a learner trained on this extended dataset not only performed better than one trained on only the original data on the targeted STE label but also enhanced its performance for the classification of 4 other labels.

@inproceedings{nademi2023SIPAIM-autoencoder-ECG,
 abstract = {We explore the challenge of learning models that use electrocardiogram (ECG) data to diagnose various cardiovascular diseases. Here, we explore whether classifiers trained on a dataset of real labeled ECGs, augmented with synthetic ECGs, can perform better than ones trained on unaugmented datasets. We first used a dataset of ECGs, each labelled with one or more of 15 diagnoses, from 244,077 patients to train an unsupervised $\beta$-VAE model that could generate time series of 12-lead ECG signals for each of the diagnoses. We then used this generative model to generate ECGs with the ST-segment Elevated (STE) abnormality, which we added to the public dataset of ECG abnormalities (n = 6877, over normal (Sinus Rhythm) and 8 different abnormalities) of China Physiological Signal Challenge 2018, and found that a learner trained on this extended dataset not only performed better than one trained on only the original data on the targeted STE label but also enhanced its performance for the classification of 4 other labels.},
 accepted = {2023-09-24},
 author = {Yousef Nademi and Sunil V Kalmady and Weijie Sun and Amir Salimi and Abram Hindle and Padma Kaul and Russell Greiner},
 authors = {Yousef Nademi and Sunil V Kalmady and Weijie Sun and Amir Salimi and Abram Hindle and Padma Kaul and Russell Greiner},
 booktitle = {2023 19th International Symposium on Medical Information Processing and Analysis (SIPAIM)},
 code = {nademi2023SIPAIM-autoencoder-ECG},
 date = {2023-11-15},
 doi = {10.1109/sipaim56729.2023.10373478},
 funding = {NSERC Discovery},
 location = {Mexico City, Mexico},
 pages = {1--7},
 rate = {},
 role = {Co-Author},
 title = {Generative Data by β-Variational Autoencoders Help Build Stronger Classifiers: ECG Use Case},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/nademi2023SIPAIM-autoencoder-ECG.pdf},
 venue = {2023 19th International Symposium on Medical Information Processing and Analysis (SIPAIM)},
 year = {2024}
}

Development and validation of machine learning algorithms based on electrocardiograms for cardiovascular diagnoses at the population level

Sunil Vasu Kalmady and Amir Salimi and Weijie Sun and Nariman Sepehrvand and Yousef Nademi and Kevin Bainey and Justin Ezekowitz and Abram Hindle and Finlay McAlister and Russell Greiner and Roopinder Sandhu and Padma Kaul
npj Digital Medicine,
2024 1--10
PDF
DOI:https://doi.org/10.1038/s41746-024-01130-8

Artificial intelligence-enabled electrocardiogram (ECG) algorithms are gaining prominence for the early detection of cardiovascular (CV) conditions, including those not traditionally associated with conventional ECG measures or expert interpretation. This study develops and validates such models for simultaneous prediction of 15 different common CV diagnoses at the population level. We conducted a retrospective study that included 1,605,268 ECGs of 244,077 adult patients presenting to 84 emergency departments or hospitals, who underwent at least one 12-lead ECG from February 2007 to April 2020 in Alberta, Canada, and considered 15 CV diagnoses, as identified by International Classification of Diseases, 10th revision (ICD-10) codes: atrial fibrillation (AF), supraventricular tachycardia (SVT), ventricular tachycardia (VT), cardiac arrest (CA), atrioventricular block (AVB), unstable angina (UA), ST-elevation myocardial infarction (STEMI), non-STEMI (NSTEMI), pulmonary embolism (PE), hypertrophic cardiomyopathy (HCM), aortic stenosis (AS), mitral valve prolapse (MVP), mitral valve stenosis (MS), pulmonary hypertension (PHTN), and heart failure (HF). We employed ResNet-based deep learning (DL) using ECG tracings and extreme gradient boosting (XGB) using ECG measurements. When evaluated on the first ECGs per episode of 97,631 holdout patients, the DL models had an area under the receiver operating characteristic curve (AUROC) of <80% for 3 CV conditions (PTE, SVT, UA), 80–90% for 8 CV conditions (CA, NSTEMI, VT, MVP, PHTN, AS, AF, HF) and an AUROC > 90% for 4 diagnoses (AVB, HCM, MS, STEMI). DL models outperformed XGB models with about 5% higher AUROC on average. Overall, ECG-based prediction models demonstrated good-to-excellent prediction performance in diagnosing common CV conditions.

@article{kalmady2024npjdigitalmedicine-pop-level,
 abstract = {Artificial intelligence-enabled electrocardiogram (ECG) algorithms are gaining prominence for the early detection of cardiovascular (CV) conditions, including those not traditionally associated with conventional ECG measures or expert interpretation. This study develops and validates such models for simultaneous prediction of 15 different common CV diagnoses at the population level. We conducted a retrospective study that included 1,605,268 ECGs of 244,077 adult patients presenting to 84 emergency departments or hospitals, who underwent at least one 12-lead ECG from February 2007 to April 2020 in Alberta, Canada, and considered 15 CV diagnoses, as identified by International Classification of Diseases, 10th revision (ICD-10) codes: atrial fibrillation (AF), supraventricular tachycardia (SVT), ventricular tachycardia (VT), cardiac arrest (CA), atrioventricular block (AVB), unstable angina (UA), ST-elevation myocardial infarction (STEMI), non-STEMI (NSTEMI), pulmonary embolism (PE), hypertrophic cardiomyopathy (HCM), aortic stenosis (AS), mitral valve prolapse (MVP), mitral valve stenosis (MS), pulmonary hypertension (PHTN), and heart failure (HF). We employed ResNet-based deep learning (DL) using ECG tracings and extreme gradient boosting (XGB) using ECG measurements. When evaluated on the first ECGs per episode of 97,631 holdout patients, the DL models had an area under the receiver operating characteristic curve (AUROC) of <80% for 3 CV conditions (PTE, SVT, UA), 80–90% for 8 CV conditions (CA, NSTEMI, VT, MVP, PHTN, AS, AF, HF) and an AUROC > 90% for 4 diagnoses (AVB, HCM, MS, STEMI). DL models outperformed XGB models with about 5% higher AUROC on average. Overall, ECG-based prediction models demonstrated good-to-excellent prediction performance in diagnosing common CV conditions.},
 author = {Sunil Vasu Kalmady and Amir Salimi and Weijie Sun and Nariman Sepehrvand and Yousef Nademi and Kevin Bainey and Justin Ezekowitz and Abram Hindle and Finlay McAlister and Russell Greiner and Roopinder Sandhu and Padma Kaul},
 authors = {Sunil Vasu Kalmady and Amir Salimi and Weijie Sun and Nariman Sepehrvand and Yousef Nademi and Kevin Bainey and Justin Ezekowitz and Abram Hindle and Finlay McAlister and Russell Greiner and Roopinder Sandhu and Padma Kaul},
 code = {kalmady2024npjdigitalmedicine-pop-level},
 day = {18},
 doi = {https://doi.org/10.1038/s41746-024-01130-8},
 funding = {NSERC Discovery, CIHR},
 institution = {University of Alberta},
 journal = {npj Digital Medicine},
 month = {May},
 number = {133},
 pages = {1--10},
 role = {Researcher / Co-author},
 title = {Development and validation of machine learning algorithms based on electrocardiograms for cardiovascular diagnoses at the population level},
 type = {article},
 url = {http://softwareprocess.ca/pubs/kalmady2024npjdigitalmedicine-pop-level.pdf},
 venue = {npj Digital Medicine},
 volume = {7},
 year = {2024}
}

Predicting Individual Survival Distributions Using ECG: A Deep Learning Approach Utilizing Features Extracted by a Learned Diagnostic Model

Weijie Sun and Sunil Vasu Kalmady and Shi-ang Qi and Nariman Sepehrvand and Abram Hindle and Russell Greiner and Padma Kaul
Second Symposium on Survival Prediction: Algorithms, Challenges, and Applications (SPACA) AAAI, Arlington, USA
2023 1--6
PDF
DOI:https://doi.org/10.1609/aaaiss.v2i1.27716

In the field of healthcare, individual survival prediction is important for personalized treatment planning. This study presents machine learning algorithms for predicting Individual Survival Distributions (ISD) using electrocardiography (ECG) data in two different formats. The models, which predict time until death, are developed and evaluated on a large, population-based cohort from Alberta, Canada. Our results demonstrate that models trained on raw ECG waveforms significantly outperform those trained on traditional ECG measurements in several metrics, including concordance index, hinge L1 loss, margin L1 loss, and margin truncated L1 loss. Additionally, the integration of predicted probabilities from wide-range diagnostic tasks not only enhances our ISD models' performance but also makes them significantly superior to other models across all evaluation metrics in individual survival prediction tasks. This innovative approach highlights the potential to leverage insights from diagnostic models for prognostic tasks, such as individual survival prediction. These findings could have far-reaching implications for the development of personalized treatment plans and open new avenues for future research in survival prediction using ECGs.

@inproceedings{sun2023AAAI-predicting,
 abstract = {In the field of healthcare, individual survival prediction is important for personalized treatment planning. This study presents machine learning algorithms for predicting Individual Survival Distributions (ISD) using electrocardiography (ECG) data in two different formats. The models, which predict time until death, are developed and evaluated on a large, population-based cohort from Alberta, Canada. Our results demonstrate that models trained on raw ECG waveforms significantly outperform those trained on traditional ECG measurements in several metrics, including concordance index, hinge L1 loss, margin L1 loss, and margin truncated L1 loss. Additionally, the integration of predicted probabilities from wide-range diagnostic tasks not only enhances our ISD models' performance but also makes them significantly superior to other models across all evaluation metrics in individual survival prediction tasks. This innovative approach highlights the potential to leverage insights from diagnostic models for prognostic tasks, such as individual survival prediction. These findings could have far-reaching implications for the development of personalized treatment plans and open new avenues for future research in survival prediction using ECGs.},
 accepted = {2023-08-23},
 author = {Weijie Sun and Sunil Vasu Kalmady and Shi-ang Qi and Nariman Sepehrvand and Abram Hindle and Russell Greiner and Padma Kaul},
 authors = {Weijie Sun and Sunil Vasu Kalmady and Shi-ang Qi and Nariman Sepehrvand and Abram Hindle and Russell Greiner and Padma Kaul},
 booktitle = {Second Symposium on Survival Prediction: Algorithms, Challenges, and Applications (SPACA) AAAI},
 code = {sun2023AAAI-predicting},
 date = {2024-01-22},
 doi = {https://doi.org/10.1609/aaaiss.v2i1.27716},
 funding = {NSERC Discovery},
 location = {Arlington, USA},
 pages = {1--6},
 rate = {},
 role = {Co-Author},
 title = {Predicting Individual Survival Distributions Using ECG: A Deep Learning Approach Utilizing Features Extracted by a Learned Diagnostic Model},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sun2023AAAI-predicting.pdf},
 venue = {Second Symposium on Survival Prediction: Algorithms, Challenges, and Applications (SPACA) AAAI},
 year = {2023}
}

Supervised Electrocardiogram (ECG) Features Outperform Knowledge-based And Unsupervised Features In Individualized Survival Prediction

Yousef Nademi and Sunil Kalmady and Weijie Sun and Shi-ang Qi and Abram Hindle and Padma Kaul and Russell Greiner
Machine Learning for Health (ML4H) 2023 @ NeurIPS, New Orleans, USA
2023 368--383
PDF

An electrocardiogram (ECG) provides crucial information about an individual’s health status. Researchers utilize ECG data to develop learners for a variety of tasks, ranging from diagnosing ECG abnormalities to estimating time to death–here modeled as individual survival distributions (ISDs). The way the ECG is represented is important for creating an effective learner. While many traditional ECG-based prediction models rely on hand-crafted features, such as heart rate, this study aims to achieve a better representation. The effectiveness of various ECG-based feature extraction methods for prediction of ISDs, either supervised or unsupervised, has not been explored previously. The study uses a large ECG dataset from 244,077 patients with over 1.6 million 12-lead ECGs, each labeled with the patient’s disease – one or more International Classification of Diseases (ICD) codes. We explored extracting high-level features from ECG traces using various approaches, then trained models that used these ECG features (along with age and sex), across a range of training sizes, to estimate patient-specific ISDs. The results showed that the supervised feature extractor method produced ECG features that can estimate ISD curves better than ECG features obtained from unsupervised or knowledge-based methods. Supervised ECG features required fewer training instances (as low as 500) to learn ISD models that performed better than the baseline model that only used age and sex. On the other hand, unsupervised and knowledge-based ECG features required over 5,000 training samples to produce ISD models that performed better than the baseline. The study’s findings may assist researchers in selecting the most appropriate approach for extracting high-level features from ECG signals to estimate patient-specific ISD curves.

@inproceedings{nademi2023ML4H-supervised-ecg,
 abstract = {An electrocardiogram (ECG) provides crucial information about an individual’s health status. Researchers utilize ECG data to develop learners for a variety of tasks, ranging from diagnosing ECG abnormalities to estimating time to death–here modeled as individual survival distributions (ISDs). The way the ECG is represented is important for creating an effective learner. While many traditional ECG-based prediction models rely on hand-crafted features, such as heart rate, this study aims to achieve a better representation. The effectiveness of various ECG-based feature extraction methods for prediction of ISDs, either supervised or unsupervised, has not been explored previously. The study uses a large ECG dataset from 244,077 patients with over 1.6 million 12-lead ECGs, each labeled with the patient’s disease – one or more International Classification of Diseases (ICD) codes. We explored extracting high-level features from ECG traces using various approaches, then trained models that used these ECG features (along with age and sex), across a range of training sizes, to estimate patient-specific ISDs. The results showed that the supervised feature extractor method produced ECG features that can estimate ISD curves better than ECG features obtained from unsupervised or knowledge-based methods. Supervised ECG features required fewer training instances (as low as 500) to learn ISD models that performed better than the baseline model that only used age and sex. On the other hand, unsupervised and knowledge-based ECG features required over 5,000 training samples to produce ISD models that performed better than the baseline. The study’s findings may assist researchers in selecting the most appropriate approach for extracting high-level features from ECG signals to estimate patient-specific ISD curves.},
 accepted = {2023-11-01},
 author = {Yousef Nademi and Sunil Kalmady and Weijie Sun and Shi-ang Qi and Abram Hindle and Padma Kaul and Russell Greiner},
 authors = {Yousef Nademi and Sunil Kalmady and Weijie Sun and Shi-ang Qi and Abram Hindle and Padma Kaul and Russell Greiner},
 booktitle = {Machine Learning for Health (ML4H) 2023 @ NeurIPS},
 code = {nademi2023ML4H-supervised-ecg},
 date = {2023-12-01},
 doi = {},
 funding = {NSERC Discovery},
 location = {New Orleans, USA},
 pagerange = {368-383},
 pages = {368-383},
 rate = {},
 role = {Co-Author},
 title = {Supervised Electrocardiogram (ECG) Features Outperform Knowledge-based And Unsupervised Features In Individualized Survival Prediction},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/nademi2023ML4H-supervised-ecg.pdf},
 venue = {Machine Learning for Health (ML4H) 2023 @ NeurIPS},
 year = {2023}
}

Identifying Defect-Inducing Changes in Visual Code

Kalvin Eng and Abram Hindle and Alexander Senchenko
2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track, Bogotá, Colombia
2023 474-484
PDF
Publisher Link
DOI:10.1109/ICSME58846.2023.00061

Defects, or bugs, often form during software development. Identifying the root cause of defects is essential to improve code quality, evaluate testing methods, and support defect prediction. Examples of defect-inducing changes can be found using the SZZ algorithm to trace the textual history of defect-fixing changes back to the defect-inducing changes that they fix in line-based code. The line-based approach of the SZZ method is ineffective for visual code that represents source code graphically rather than textually. In this paper we adapt SZZ for visual code and present the SZZ Visual Code (SZZ-VC) algorithm, which finds changes in visual code based on the differences of graphical elements rather than differences of lines to detect defect-inducing changes. We validated the algorithm for an industry-made AAA video game and 20 music visual programming defects across 12 open source projects. Our results show that SZZ-VC is feasible for detecting defects in visual code for 3 different visual programming languages.

@inproceedings{eng2023ICSME-SZZ-visual-code,
 abstract = {Defects, or bugs, often form during software development. Identifying the root cause of defects is essential to improve code quality, evaluate testing methods, and support defect prediction. Examples of defect-inducing changes can be found using the SZZ algorithm to trace the textual history of defect-fixing changes back to the defect-inducing changes that they fix in line-based code. The line-based approach of the SZZ method is ineffective for visual code that represents source code graphically rather than textually. In this paper we adapt SZZ for visual code and present the SZZ Visual Code (SZZ-VC) algorithm, which finds changes in visual code based on the differences of graphical elements rather than differences of lines to detect defect-inducing changes. We validated the algorithm for an industry-made AAA video game and 20 music visual programming defects across 12 open source projects. Our results show that SZZ-VC is feasible for detecting defects in visual code for 3 different visual programming languages.},
 accepted = {2023-08-10},
 author = {Kalvin Eng and Abram Hindle and Alexander Senchenko},
 authors = {Kalvin Eng and Abram Hindle and Alexander Senchenko},
 booktitle = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track},
 code = {eng2023ICSME-SZZ-visual-code},
 date = {2023-10-01},
 doi = {10.1109/ICSME58846.2023.00061},
 funding = {NSERC Discovery},
 location = {Bogotá, Colombia},
 pagerange = {474-484},
 pages = {474-484},
 payurl = {https://doi.ieeecomputersociety.org/10.1109/ICSME58846.2023.00061},
 rate = {},
 role = {Co-Author},
 title = {Identifying Defect-Inducing Changes in Visual Code},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/eng2023ICSME-SZZ-visual-code.pdf},
 venue = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track},
 year = {2023}
}

Predicting Defective Visual Code Changes in a Multi-Language AAA Video Game Project

Kalvin Eng and Abram Hindle and Alexander Senchenko
2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track, Bogotá, Colombia
2023 485-494
PDF
DOI:10.1109/ICSME58846.2023.00062

Video game development increasingly relies on using visual programming languages as the primary way to build video game features. The aim of using visual programming is to move game logic into the hands of game designers, who may not be as well versed in textual coding. In this paper, we empirically observe that there are more defect-inducing commits containing visual code than textual code in a AAA video game project codebase. This indicates that the existing textual code Just-in-Time (JIT) defect prediction models under evaluation by Electronic Arts (EA) may be ineffective as they do not account for changes in visual code. Thus, we focus our research on constructing visual code defect prediction models that encompass visual code metrics and evaluate the models against defect prediction models that use language-agnostic features and textual code metrics. We test our models using features extracted from the historical codebase of a AAA video game project, as well as the historical codebases of 70 open source projects that use textual and visual code. We find that defect prediction models have better performance overall in terms of the area under the ROC curve (AUC) and Matthews Correlation Coefficient (MCC) when incorporating visual code features for projects that contain more commits with visual code than textual code.

@inproceedings{eng2023ICSME-defect-visual-code,
 abstract = {Video game development increasingly relies on using visual programming languages as the primary way to build video game features. The aim of using visual programming is to move game logic into the hands of game designers, who may not be as well versed in textual coding. In this paper, we empirically observe that there are more defect-inducing commits containing visual code than textual code in a AAA video game project codebase. This indicates that the existing textual code Just-in-Time (JIT) defect prediction models under evaluation by Electronic Arts (EA) may be ineffective as they do not account for changes in visual code. Thus, we focus our research on constructing visual code defect prediction models that encompass visual code metrics and evaluate the models against defect prediction models that use language-agnostic features and textual code metrics. We test our models using features extracted from the historical codebase of a AAA video game project, as well as the historical codebases of 70 open source projects that use textual and visual code. We find that defect prediction models have better performance overall in terms of the area under the ROC curve (AUC) and Matthews Correlation Coefficient (MCC) when incorporating visual code features for projects that contain more commits with visual code than textual code.},
 accepted = {2023-08-10},
 author = {Kalvin Eng and Abram Hindle and Alexander Senchenko},
 authors = {Kalvin Eng and Abram Hindle and Alexander Senchenko},
 booktitle = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track},
 code = {eng2023ICSME-defect-visual-code},
 date = {2023-10-01},
 doi = {10.1109/ICSME58846.2023.00062},
 funding = {NSERC Discovery},
 location = {Bogotá, Colombia},
 pagerange = {485-494},
 pages = {485-494},
 rate = {},
 role = {Co-Author},
 title = {Predicting Defective Visual Code Changes in a Multi-Language AAA Video Game Project},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/eng2023ICSME-defect-visual-code.pdf},
 venue = {2023 IEEE International Conference on Software Maintenance and Evolution (ICSME) Industry Track},
 year = {2023}
}

Energy Consumption Estimation of API-usage in Smartphone Apps via Static Analysis

Abdul Ali Bangash and Kalvin Eng and Qasim Jamal and Karim Ali and Abram Hindle
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), Melbourne, Australia
2023 272--283
Acceptance:37%
PDF

Smartphone application (app) developers measure the energy consumption of their apps to ensure that they do not consume excessive energy. However, existing techniques require developers to generate and execute test cases on expensive, sophisticated hardware. To address these challenges, we propose a static-analysis approach that estimates the energy consumption of API usage in an app, eliminating the need for test case execution. To instantiate our approach, we have profiled the energy consumption of the Swift SQLite API operations. Given a Swift app, we first scan it for uses of SQLite. We then combine that information with the measured energy profile to compute E-factor, an estimate of the energy consumption of the API usage in an app. To evaluate the usability of E-factor, we have calculated the E-factor of 56 real-world iOS apps. We have also compared the E-factor of 16 versions and 11 methods from 3 of those apps to their hardware-based energy measurements. Our findings show that E-factor positively correlates with the hardware-based energy measurements, indicating that E-factor is a practical estimate to compare the energy consumption difference in API usage across different versions of an app. Developers may also use E-factor to identify excessive energy-consuming methods in their apps and focus on optimizing them. Our approach is most useful in an Integrated Development Environment (IDE) or Continuous Integration (CI) pipeline, where developers receive energy consumption insights within milliseconds of making a code modification.

@inproceedings{bangash2023MSR-static-energy,
 abstract = {Smartphone application (app) developers measure the energy consumption of their apps to ensure that they do not consume excessive energy. However, existing techniques require developers to generate and execute test cases on expensive, sophisticated hardware. To address these challenges, we propose a static-analysis approach that estimates the energy consumption of API usage in an app, eliminating the need for test case execution. To instantiate our approach, we have profiled the energy consumption of the Swift SQLite API operations. Given a Swift app, we first scan it for uses of SQLite. We then combine that information with the measured energy profile to compute E-factor, an estimate of the energy consumption of the API usage in an app. To evaluate the usability of E-factor, we have calculated the E-factor of 56 real-world iOS apps. We have also compared the E-factor of 16 versions and 11 methods from 3 of those apps to their hardware-based energy measurements. Our findings show that E-factor positively correlates with the hardware-based energy measurements, indicating that E-factor is a practical estimate to compare the energy consumption difference in API usage across different versions of an app. Developers may also use E-factor to identify excessive energy-consuming methods in their apps and focus on optimizing them. Our approach is most useful in an Integrated Development Environment (IDE) or Continuous Integration (CI) pipeline, where developers receive energy consumption insights within milliseconds of making a code modification.},
 accepted = {2023-03-07},
 author = {Abdul Ali Bangash and Kalvin Eng and Qasim Jamal and Karim Ali and Abram Hindle},
 authors = {Abdul Ali Bangash and Kalvin Eng and Qasim Jamal and Karim Ali and Abram Hindle},
 booktitle = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)},
 code = {bangash2023MSR-static-energy},
 date = {2023-05-15},
 funding = {NSERC Discovery},
 location = {Melbourne, Australia},
 pagerange = {272--283},
 pages = {272--283},
 rate = {37%},
 role = {Co-Author},
 title = {Energy Consumption Estimation of API-usage in Smartphone Apps via Static Analysis},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bangash2023MSR-static-energy.pdf},
 venue = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)},
 year = {2023}
}

Evolution of the Practice of Software Testing in Java Projects

Anisha Islam and Nipuni Tharushika Hewage and Abdul Ali Bangash and Abram Hindle
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track, Melbourne, Australia
2023 367--371
Acceptance:50%
PDF

Software testing helps developers minimize bugs and errors in their code, improving the overall software quality. In 2013, Kochhar et al. analyzed 20,817 software projects in order to study how prevalent the practice of software testing is in open-source projects. They found that projects with more lines of code (LOC) and projects with more developers tend to have more test cases. Additionally, they found a weak positive correlation between the number of test cases and the number of bugs. Since the conclusions of a study might become irrelevant over time because of the latest practices in the relevant fields, in this paper, we investigate if these conclusions remain valid if we re-evaluate Kochhar et al.’s findings on the Java projects that were developed from 2012 to 2021. For evaluation, we use a random sample of 20,000 open-source Java projects each year. Our results show that Kochhar et al.’s conclusions regarding the projects with test cases having more LOC, the weak positive correlation between the number of test cases and authors, and the weak positive correlation between the number of test cases and bugs remain stable until 2021. Our study corroborates Kochhar et al.’s conclusions and helps developers refocus in light of the latest findings regarding the practice of software testing.

@inproceedings{islam2023MSR-java-testing,
 abstract = {Software testing helps developers minimize bugs and errors in their code, improving the overall software quality. In 2013, Kochhar et al. analyzed 20,817 software projects in order to study how prevalent the practice of software testing is in open-source projects. They found that projects with more lines of code (LOC) and projects with more developers tend to have more test cases. Additionally, they found a weak positive correlation between the number of test cases and the number of bugs. Since the conclusions of a study might become irrelevant over time because of the latest practices in the relevant fields, in this paper, we investigate if these conclusions remain valid if we re-evaluate Kochhar et al.’s findings on the Java projects that were developed from 2012 to 2021. For evaluation, we use a random sample of 20,000 open-source Java projects each year. Our results show that Kochhar et al.’s conclusions regarding the projects with test cases having more LOC, the weak positive correlation between the number of test cases and authors, and the weak positive correlation between the number of test cases and bugs remain stable until 2021. Our study corroborates Kochhar et al.’s conclusions and helps developers refocus in light of the latest findings regarding the practice of software testing.},
 accepted = {2023-03-07},
 author = {Anisha Islam and Nipuni Tharushika Hewage and Abdul Ali Bangash and Abram Hindle},
 authors = {Anisha Islam and Nipuni Tharushika Hewage and Abdul Ali Bangash and Abram Hindle},
 booktitle = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 code = {islam2023MSR-java-testing},
 date = {2023-05-15},
 funding = {NSERC Discovery},
 location = {Melbourne, Australia},
 pagerange = {367--371},
 pages = {367--371},
 rate = {50%},
 role = {Co-Author},
 title = {Evolution of the Practice of Software Testing in Java Projects},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/islam2023MSR-java-testing.pdf},
 venue = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 year = {2023}
}

An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)

Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle
2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track, Melbourne, Australia
2023 352--356
Acceptance:50%
PDF

The value of teamwork is being recognized by project owners, resulting in an increased acknowledgement of collaboration among developers in software engineering. A good understanding of how developers work together could positively impact software development practices. In this paper, we investigate the collaboration habits of developers in project files by leveraging the World of Code (WoC) dataset and GitHub API. We first identify the collaboration level of developers within the project files, such as the source, test, documentation, and build files, using the Author Cross Entropy (ACE). From the results we find out that test files report the highest degree of collaboration among the developers, perhaps because collaboration is critical to ensure convergence of functionality tests. Furthermore, the source code files show the least degree of collaboration, perhaps because of code ownership and the complexity and difficulty in code modification. Secondly, given the widespread usage of the Python programming language, we investigate the Python code tokens that are more prone to change and collaboration. Our findings offer insights into the specific project files and Python code tokens that developers typically collaborate on in the open-source community. This information can be used by researchers and developers to enhance existing collaboration platforms and tools.

@inproceedings{sun2023MSR-author-cross-entropy,
 abstract = {The value of teamwork is being recognized by project owners, resulting in an increased acknowledgement of collaboration among developers in software engineering. A good understanding of how developers work together could positively impact software development practices. In this paper, we investigate the collaboration habits of developers in project files by leveraging the World of Code (WoC) dataset and GitHub API. We first identify the collaboration level of developers within the project files, such as the source, test, documentation, and build files, using the Author Cross Entropy (ACE). From the results we find out that test files report the highest degree of collaboration among the developers, perhaps because collaboration is critical to ensure convergence of functionality tests. Furthermore, the source code files show the least degree of collaboration, perhaps because of code ownership and the complexity and difficulty in code modification. Secondly, given the widespread usage of the Python programming language, we investigate the Python code tokens that are more prone to change and collaboration. Our findings offer insights into the specific project files and Python code tokens that developers typically collaborate on in the open-source community. This information can be used by researchers and developers to enhance existing collaboration platforms and tools.},
 accepted = {2023-03-07},
 author = {Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle},
 authors = {Weijie Sun and Samuel Iwuchukwu and Abdul Ali Bangash and Abram Hindle},
 booktitle = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 code = {sun2023MSR-author-cross-entropy},
 date = {2023-05-15},
 funding = {NSERC Discovery},
 location = {Melbourne, Australia},
 pagerange = {352--356},
 pages = {352--356},
 rate = {50%},
 role = {Co-Author},
 title = {An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sun2023MSR-author-cross-entropy.pdf},
 venue = {2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Challenge Track},
 year = {2023}
}

Towards artificial intelligence-based learning health system for population-level mortality prediction using electrocardiograms

Weijie Sun and Sunil Vasu Kalmady and Nariman Sepehrvand and Amir Salimi and Yousef Nademi and Kevin Bainey and Justin A. Ezekowitz and Russell Greiner and Abram Hindle and Finlay A. McAlister and Roopinder K. Sandhu and Padma Kaul
npj Digital Medicine,
2023 1--12
PDF
DOI:https://doi.org/10.1038/s41746-023-00765-3

The feasibility and value of linking electrocardiogram (ECG) data to longitudinal population-level administrative health data to facilitate the development of a learning healthcare system have not been fully explored. We developed ECG-based machine learning models to predict risk of mortality among patients presenting to an emergency department or hospital for any reason. Using the 12-lead ECG traces and measurements from 1,605,268 ECGs from 748,773 healthcare episodes of 244,077 patients (2007–2020) in Alberta, Canada, we developed and validated ResNet-based Deep Learning (DL) and gradient boosting-based XGBoost (XGB) models to predict 30-day, 1-year, and 5-year mortality. The models for 30-day, 1-year, and 5-year mortality were trained on 146,173, 141,072, and 111,020 patients and evaluated on 97,144, 89,379, and 55,650 patients, respectively. In the evaluation cohort, 7.6%, 17.3%, and 32.9% of patients died by 30 days, 1 year, and 5 years, respectively. ResNet models based on ECG traces alone had good-to-excellent performance with area under receiver operating characteristic curve (AUROC) of 0.843 (95% CI: 0.838–0.848), 0.812 (0.808–0.816), and 0.798 (0.792–0.803) for 30-day, 1-year and 5-year prediction, respectively; and were superior to XGB models based on ECG measurements with AUROC of 0.782 (0.776–0.789), 0.784 (0.780–0.788), and 0.746 (0.740–0.751). This study demonstrates the validity of ECG-based DL mortality prediction models at the population-level that can be leveraged for prognostication at point of care.

@article{sun2023npjdigitalmedicine-pop-ecg,
 abstract = {The feasibility and value of linking electrocardiogram (ECG) data to longitudinal population-level administrative health data to facilitate the development of a learning healthcare system have not been fully explored. We developed ECG-based machine learning models to predict risk of mortality among patients presenting to an emergency department or hospital for any reason. Using the 12-lead ECG traces and measurements from 1,605,268 ECGs from 748,773 healthcare episodes of 244,077 patients (2007–2020) in Alberta, Canada, we developed and validated ResNet-based Deep Learning (DL) and gradient boosting-based XGBoost (XGB) models to predict 30-day, 1-year, and 5-year mortality. The models for 30-day, 1-year, and 5-year mortality were trained on 146,173, 141,072, and 111,020 patients and evaluated on 97,144, 89,379, and 55,650 patients, respectively. In the evaluation cohort, 7.6%, 17.3%, and 32.9% of patients died by 30 days, 1 year, and 5 years, respectively. ResNet models based on ECG traces alone had good-to-excellent performance with area under receiver operating characteristic curve (AUROC) of 0.843 (95% CI: 0.838–0.848), 0.812 (0.808–0.816), and 0.798 (0.792–0.803) for 30-day, 1-year and 5-year prediction, respectively; and were superior to XGB models based on ECG measurements with AUROC of 0.782 (0.776–0.789), 0.784 (0.780–0.788), and 0.746 (0.740–0.751). This study demonstrates the validity of ECG-based DL mortality prediction models at the population-level that can be leveraged for prognostication at point of care.},
 author = {Weijie Sun and Sunil Vasu Kalmady and Nariman Sepehrvand and Amir Salimi and Yousef Nademi and Kevin Bainey and Justin A. Ezekowitz and Russell Greiner and Abram Hindle and Finlay A. McAlister and Roopinder K. Sandhu and Padma Kaul},
 authors = {Weijie Sun and Sunil Vasu Kalmady and Nariman Sepehrvand and Amir Salimi and Yousef Nademi and Kevin Bainey and Justin A. Ezekowitz and Russell Greiner and Abram Hindle and Finlay A. McAlister and Roopinder K. Sandhu and Padma Kaul},
 code = {sun2023npjdigitalmedicine-pop-ecg},
 day = {06},
 doi = {https://doi.org/10.1038/s41746-023-00765-3},
 funding = {NSERC Discovery, CIHR},
 institution = {University of Alberta},
 journal = {npj Digital Medicine},
 month = {February},
 number = {1},
 pages = {1--12},
 role = { Researcher / Co-author},
 title = {Towards artificial intelligence-based learning health system for population-level mortality prediction using electrocardiograms},
 type = {article},
 url = {http://softwareprocess.ca/pubs/sun2023npjdigitalmedicine-pop-ecg.pdf},
 venue = {npj Digital Medicine},
 volume = {6},
 year = {2023}
}

Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers

Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer
Empirical Software Engineering Journal (EMSE),
2022 1--46
PDF

Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one such resource and provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.

@article{kamienski2022EMSE-dupe-question-gamedev,
 abstract = {Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one such resource and provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.},
 accepted = {2022-11-02},
 author = {Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer},
 authors = {Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer},
 code = {kamienski2022EMSE-dupe-question-gamedev},
 day = {08},
 funding = {NSERC Discovery},
 institution = {University of Alberta},
 journal = {Empirical Software Engineering Journal (EMSE)},
 month = {December},
 number = {17},
 pages = {1--46},
 role = { Researcher / Co-author},
 title = {Analyzing Techniques for Duplicate Question Detection on Q\&A Websites for Game Developers},
 type = {article},
 url = {http://softwareprocess.ca/pubs/kamienski2022EMSE-dupe-question-gamedev.pdf},
 venue = {Empirical Software Engineering Journal (EMSE)},
 volume = {28},
 year = {2022}
}

A Black Box Technique to Reduce Energy Consumption of Android Apps

Abdul Ali Bangash and Karim Ali and Abram Hindle
ICSE-NIER'22, Pittsburgh, United States
2022 1--5
Acceptance:26.8%
PDF

Android byte-code transformations are used to optimize applications (apps) in terms of run-time performance and size. But do they affect the energy consumption during this process? If they do, can we employ them to reduce an app’s energy consumption? Given that most existing energy optimization techniques require developers to modify their code, a byte-code level modification technique will save developers’ time and effort. In this paper, we investigate if byte-code transformations combined with genetic search can reduce an app’s energy consumption. After applying our technique on four real-world apps, we find that some combinations of the byte-code transformations reduce the energy consumption by up to 11%.

@inproceedings{bangash2022ICSENIER-blackbox-energy,
 abstract = {Android byte-code transformations are used to optimize applications (apps) in terms of run-time performance and size. But do they affect the energy consumption during this process? If they do, can we employ them to reduce an app’s energy consumption? Given that most existing energy optimization techniques require developers to modify their code, a byte-code level modification technique will save developers’ time and effort. In this paper, we investigate if byte-code transformations combined with genetic search can reduce an app’s energy consumption. After applying our technique on four real-world apps, we find that some combinations of the byte-code transformations reduce the energy consumption by up to 11%.},
 accepted = {2021-12-30},
 author = {Abdul Ali Bangash and Karim Ali and Abram Hindle},
 authors = {Abdul Ali Bangash and Karim Ali and Abram Hindle},
 booktitle = {ICSE-NIER'22},
 code = {bangash2022ICSENIER-blackbox-energy},
 data = {https://github.com/AbdulAli/EnergyDataset-ICSE-NIER22},
 date = {2022-05-21},
 funding = {NSERC Discovery},
 location = {Pittsburgh, United States},
 pagerange = {1--5},
 pages = {1--5},
 rate = {26.8%},
 role = {Co-Author},
 title = {A Black Box Technique to Reduce Energy Consumption of Android Apps},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bangash2022ICSENIER-blackbox-energy.pdf},
 venue = {ICSE-NIER'22},
 year = {2022}
}

Improving ECG-based COVID-19 diagnosis and mortality predictions using pre-pandemic medical records at population-scale

Weijie Sun and Sunil Vasu Kalmady and Zihan Wang and Amir Salimi and Nariman Sepehrvand and Abram Hindle and Luan Manh Chu and Russell Greiner and Padma Kaul
NeurIPS TS4H: Timeseries for Health, New Orleans, United States
2022 1--9
Acceptance:Unknown
PDF

Pandemic outbreaks such as COVID-19 occur unexpectedly and need immediate action due to their potentially devastating consequences for global health. Point-of-care routine assessments such as the electrocardiogram (ECG) can be used to develop prediction models for identifying individuals at risk. However, there is often too little clinically-annotated medical data, especially in early phases of a pandemic, to develop accurate prediction models. In such situations, historical pre-pandemic health records can be utilized to estimate a preliminary model, which can then be fine-tuned based on limited available pandemic data. This study shows that this approach, pre-training deep learning models with pre-pandemic data, can work effectively, by demonstrating substantial performance improvement across three different COVID-19 related diagnostic and prognostic prediction tasks. Similar transfer learning strategies can be useful for developing timely artificial intelligence solutions in future pandemic outbreaks.

@inproceedings{sun2022TS4H-improving-ecg-covid,
 abstract = {Pandemic outbreaks such as COVID-19 occur unexpectedly and need immediate action due to their potentially devastating consequences for global health. Point-of-care routine assessments such as the electrocardiogram (ECG) can be used to develop prediction models for identifying individuals at risk. However, there is often too little clinically-annotated medical data, especially in early phases of a pandemic, to develop accurate prediction models. In such situations, historical pre-pandemic health records can be utilized to estimate a preliminary model, which can then be fine-tuned based on limited available pandemic data. This study shows that this approach, pre-training deep learning models with pre-pandemic data, can work effectively, by demonstrating substantial performance improvement across three different COVID-19 related diagnostic and prognostic prediction tasks. Similar transfer learning strategies can be useful for developing timely artificial intelligence solutions in future pandemic outbreaks.},
 author = {Weijie Sun and Sunil Vasu Kalmady and Zihan Wang and Amir Salimi and Nariman Sepehrvand and Abram Hindle and Luan Manh Chu and Russell Greiner and Padma Kaul},
 authors = {Weijie Sun and Sunil Vasu Kalmady and Zihan Wang and Amir Salimi and Nariman Sepehrvand and Abram Hindle and Luan Manh Chu and Russell Greiner and Padma Kaul},
 booktitle = {NeurIPS TS4H: Timeseries for Health},
 code = {sun2022TS4H-improving-ecg-covid},
 funding = {NSERC Discovery},
 location = {New Orleans, United States},
 pages = {1--9},
 rate = {Unknown},
 role = {Co-Author},
 title = {Improving ECG-based COVID-19 diagnosis and mortality predictions using pre-pandemic medical records at population-scale},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sun2022TS4H-improving-ecg-covid.pdf},
 venue = {NeurIPS TS4H: Timeseries for Health},
 year = {2022}
}

Dorabella Cipher as Musical Inspiration

Bradley Hauer, Colin Choi, Abram Hindle, Scott Smallwood, Grzegorz Kondrak
Speech and Music workshop at ICON, India
2021 33--38
Acceptance:Unknown
PDF

The Dorabella cipher is an encrypted note written by English composer Edward Elgar, which has defied decipherment attempts for more than a century. While most proposed solutions are English texts, we investigate the hypothesis that Dorabella represents enciphered music. We weigh the evidence for and against the hypothesis, devise a simplified music notation, and attempt to reconstruct a melody from the cipher. Our tools are n-gram models of music which we validate on existing music corpora enciphered using monoalphabetic substitution. By applying our methods to Dorabella, we produce a decipherment with musical qualities, which is then transformed via artful composition into a listenable melody. Far from arguing that the end result represents the only true solution, we instead frame the process of decipherment as part of the composition process.

@inproceedings{hauer2021SMP-dorabella-inspire,
 abstract = {The Dorabella cipher is an encrypted note written by English composer Edward Elgar, which has defied decipherment attempts for more than a century. While most proposed solutions are English texts, we investigate the hypothesis that Dorabella represents enciphered music. We weigh the evidence for and against the hypothesis, devise a simplified music notation, and attempt to reconstruct a melody from the cipher. Our tools are n-gram models of music which we validate on existing music corpora enciphered using monoalphabetic substitution. By applying our methods to Dorabella, we produce a decipherment with musical qualities, which is then transformed via artful composition into a listenable melody. Far from arguing that the end result represents the only true solution, we instead frame the process of decipherment as part of the composition process.},
 accepted = {2021-11-30},
 author = {Bradley Hauer and Colin Choi and Abram Hindle and Scott Smallwood and Grzegorz Kondrak},
 authors = {Bradley Hauer, Colin Choi, Abram Hindle, Scott Smallwood, Grzegorz Kondrak},
 booktitle = {Speech and Music workshop at ICON},
 code = {hauer2021SMP-dorabella-inspire},
 data = {https://zenodo.org/record/4764819},
 date = {2021-12-16},
 funding = {NSERC Discovery},
 location = {India},
 pagerange = {33--38},
 pages = {33--38},
 rate = {Unknown},
 role = {Co-Author},
 title = {Dorabella Cipher as Musical Inspiration},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hauer2021SMP-dorabella-inspire.pdf},
 venue = {Speech and Music workshop at ICON},
 video = {https://www.youtube.com/watch?v=lmAQwTXSUHQ},
 year = {2021}
}
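
The abstract above mentions validating the models on music enciphered with monoalphabetic substitution. As a rough illustration of that kind of cipher only (not the authors' code or notation), the sketch below maps each symbol of a sequence to a fixed substitute symbol. The note names, the function names, and the seeded key generation are all assumptions made for this example.

```python
import random


def make_key(alphabet, seed=0):
    """Build a one-to-one symbol mapping (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(alphabet)
    rng.shuffle(shuffled)
    # Each plaintext symbol gets exactly one cipher symbol.
    return dict(zip(alphabet, shuffled))


def encipher(symbols, key):
    """Replace every symbol with its substitute under the key."""
    return [key[s] for s in symbols]


def decipher(symbols, key):
    """Invert the key and map cipher symbols back to plaintext."""
    inverse = {cipher: plain for plain, cipher in key.items()}
    return [inverse[s] for s in symbols]


# Illustrative alphabet and melody; these are assumptions, not data from the paper.
alphabet = ["C", "D", "E", "F", "G", "A", "B"]
melody = ["C", "D", "E", "C", "G", "E"]

key = make_key(alphabet)
cipher = encipher(melody, key)
recovered = decipher(cipher, key)
```

Because the mapping is one-to-one, repeated notes in the melody stay repeated in the ciphertext, which is the property that frequency-based and n-gram-based analysis relies on.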

ECG for high-throughput screening of multiple diseases: Proof-of-concept using multi-diagnosis deep learning from population-based datasets

Weijie Sun, Sunil Vasu Kalmady, Amir S Salimi, Nariman Sepehrvand, Eric Ly, Abram Hindle, Russell Greiner, Padma Kaul
Medical Imaging Workshop at NeurIPS, Online
2021 1--6
Acceptance:56/90
PDF

Electrocardiogram (ECG) abnormalities are linked to cardiovascular diseases, but may also occur in other non-cardiovascular conditions such as mental, neurological, metabolic and infectious conditions. However, most of the recent success of deep learning (DL) based diagnostic predictions in selected patient cohorts has been limited to a small set of cardiac diseases. In this study, we use a population-based dataset of >250,000 patients with >1000 medical conditions and >2 million ECGs to identify a wide range of diseases that could be accurately diagnosed from the patient’s first in-hospital ECG. Our DL models uncovered 128 diseases and 68 disease categories with strong discriminative performance.

@inproceedings{sun2021NEURIPS-ECG-screening,
 abstract = {Electrocardiogram (ECG) abnormalities are linked to cardiovascular diseases, but may also occur in other non-cardiovascular conditions such as mental, neurological, metabolic and infectious conditions. However, most of the recent success of deep learning (DL) based diagnostic predictions in selected patient cohorts has been limited to a small set of cardiac diseases. In this study, we use a population-based dataset of >250,000 patients with >1000 medical conditions and >2 million ECGs to identify a wide range of diseases that could be accurately diagnosed from the patient’s first in-hospital ECG. Our DL models uncovered 128 diseases and 68 disease categories with strong discriminative performance.},
 accepted = {2021-10-26},
 author = {Weijie Sun and Sunil Vasu Kalmady and Amir S Salimi and Nariman Sepehrvand and Eric Ly and Abram Hindle and Russell Greiner and Padma Kaul},
 authors = {Weijie Sun, Sunil Vasu Kalmady, Amir S Salimi, Nariman Sepehrvand, Eric Ly, Abram Hindle, Russell Greiner, Padma Kaul},
 booktitle = {Medical Imaging Workshop at NeurIPS},
 code = {sun2021NEURIPS-ECG-screening},
 date = {2021-12-14},
 funding = {NSERC Discovery and CVC},
 location = {Online},
 pagerange = {1--6},
 pages = {1--6},
 rate = {56/90},
 role = {Co-Author},
 title = {ECG for high-throughput screening of multiple diseases: Proof-of-concept using multi-diagnosis deep learning from population-based datasets},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sun2021NEURIPS-ECG-screening.pdf},
 venue = {Medical Imaging Workshop at NeurIPS},
 year = {2021}
}

Experimental Analysis of the Dorabella Cipher with Statistical Language Models

Bradley Hauer, Colin Choi, Anirudh Sundar, Abram Hindle, Scott Smallwood, Grzegorz Kondrak
The International Conference on Historical Cryptology (HistoCrypt 2021),
2021 1--10
PDF

The Dorabella cipher is a symbolic message written in 1897 by English composer Edward Elgar. We analyze the cipher using modern computational and statistical techniques. We consider several open questions: Is the underlying message natural language text or music? If it is language, what is the most likely language? Is Dorabella a simple substitution cipher? If so, why has nobody managed to produce a plausible decipherment? Are some unusual-looking patterns in the cipher likely to occur by chance? Can state-of-the-art algorithmic solvers decipher at least some words of the message? This work is intended as a contribution towards finding answers to these questions.

@inproceedings{hauer2021HistoCrypt-dorabella,
 abstract = {The Dorabella cipher is a symbolic message written in 1897 by English composer Edward Elgar. We analyze the cipher using modern computational and statistical techniques. We consider several open questions: Is the underlying message natural language text or music? If it is language, what is the most likely language? Is Dorabella a simple substitution cipher? If so, why has nobody managed to produce a plausible decipherment? Are some unusual-looking patterns in the cipher likely to occur by chance? Can state-of-the-art algorithmic solvers decipher at least some words of the message? This work is intended as a contribution towards finding answers to these questions.},
 accepted = {2021-04-23},
 author = {Bradley Hauer and Colin Choi and Anirudh Sundar and Abram Hindle and Scott Smallwood and Grzegorz Kondrak},
 authors = {Bradley Hauer, Colin Choi, Anirudh Sundar, Abram Hindle, Scott Smallwood, Grzegorz Kondrak},
 booktitle = {The International Conference on Historical Cryptology (HistoCrypt 2021)},
 code = {hauer2021HistoCrypt-dorabella},
 date = {2021-09-20},
 funding = {NSERC Discovery},
 pagerange = {1--10},
 pages = {1--10},
 role = {Co-Author},
 title = {Experimental Analysis of the Dorabella Cipher with Statistical Language Models},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hauer2021HistoCrypt-dorabella.pdf},
 venue = {The International Conference on Historical Cryptology (HistoCrypt 2021)},
 year = {2021}
}

Energy Efficient Guidelines for iOS Core Location Framework

Abdul Ali Bangash, Daniil Tiganov, Karim Ali, Abram Hindle
Proceedings of the 2021 International Conference on Software Maintenance and Evolution (ICSME), Luxembourg City, Luxembourg
2021 1--12
Acceptance:24%
PDF

Several types of apps require accessing user location, including map navigation, food ordering, and fitness tracking apps. To access user location, app developers use frameworks that the underlying platform provides to them. For the iOS platform, the Core Location framework enables developers to configure various services to obtain user location information. But how does a particular configuration affect the energy consumption of an app? The available Core Location framework documentation is insufficient to help developers reason about the tradeoff between choosing a particular configuration and energy consumption. In this paper, we present a set of guidelines that will help developers make an energy-efficient design choice while configuring the Core Location framework for their app. To achieve that, we have created microbenchmark configurations of the various services that the Core Location framework provides. We have then run several test-scenarios on these configurations to extract their energy profiles. To extract energy-efficient guidelines for developers, we have carefully examined those energy profile results. The guidelines show several configurations that not only reduce energy consumption but also access locations more frequently than other configurations. To evaluate those guidelines, we analyzed three real-world apps and a location service sample app provided by Apple. Our results show that the guidelines help reduce energy: 0.42% for a property search app, 10.59% for a weather app, 26.91% for a location utility app, and 11.37% for Apple’s sample app. Additionally, our empirical evaluation shows that choosing an energy-hungry configuration can increase the energy consumption by up to a maximum of 23.97%. Our guidelines are effective on 3 real-world apps, and our methodology may be used to extract energy-efficient guidelines for frameworks other than the Core Location framework.

@inproceedings{bangash2021ICSME-igreenminer,
 abstract = {Several types of apps require accessing user location, including map navigation, food ordering, and fitness tracking apps. To access user location, app developers use frameworks that the underlying platform provides to them. For the iOS platform, the Core Location framework enables developers to configure various services to obtain user location information. But how does a particular configuration affect the energy consumption of an app? The available Core Location framework documentation is insufficient to help developers reason about the tradeoff between choosing a particular configuration and energy consumption.  In this paper, we present a set of guidelines that will help developers make an energy-efficient design choice while configuring the Core Location framework for their app. To achieve that, we have created microbenchmark configurations of the various services that the Core Location framework provides. We have then run several test-scenarios on these configurations to extract their energy profiles. To extract energy-efficient guidelines for developers, we have carefully examined those energy profile results. The guidelines show several configurations that not only reduce energy consumption but also access locations more frequently than other configurations. To evaluate those guidelines, we analyzed three real-world apps and a location service sample app provided by Apple. Our results show that the guidelines help reduce energy: 0.42% for a property search app, 10.59% for a weather app, 26.91% for a location utility app, and 11.37% for Apple’s sample app. Additionally, our empirical evaluation shows that choosing an energy-hungry configuration can increase the energy consumption by up to a maximum of 23.97%.  Our guidelines are effective on 3 real-world apps, and our methodology may be used to extract energy-efficient guidelines for frameworks other than the Core Location framework.},
 accepted = {2021-06-15},
 author = {Abdul Ali Bangash and Daniil Tiganov and Karim Ali and Abram Hindle},
 authors = {Abdul Ali Bangash, Daniil Tiganov, Karim Ali, Abram Hindle},
 booktitle = {Proceedings of the 2021 International Conference on Software Maintenance and Evolution (ICSME)},
 code = {bangash2021ICSME-igreenminer},
 date = {2021-06-15},
 funding = {NSERC Discovery},
 location = {Luxembourg City, Luxembourg},
 pagerange = {1--12},
 pages = {1--12},
 rate = {24%},
 role = {Co-Author},
 title = {Energy Efficient Guidelines for iOS Core Location Framework},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bangash2021ICSME-igreenminer.pdf},
 venue = {Proceedings of the 2021 International Conference on Software Maintenance and Evolution (ICSME)},
 year = {2021}
}

Multilabel 12-Lead Electrocardiogram Classification Using Beat To Sequence Autoencoders

Alexander William Wong, Amir Salimi, Abram Hindle, Sunil Vasu Kalmady, Padma Kaul
IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada
2021 1--4
Acceptance:
PDF

The 12-lead electrocardiogram (ECG) measures the electrical activity of the heart for physicians to use in diagnosing cardiac disorders. This paper investigates the multi-label, multi-class classification of ECG records into one or more of 27 possible medical diagnoses. Our multi-step approach uses conventional physiological algorithms for segmentation of heartbeats from the baseline signals. We stack a heartbeat autoencoder over heartbeat windows to make embeddings, then we encode this sequence of embeddings to make an ECG embedding which we then classify on. We utilize the public dataset of 43,101 available ECG records provided by the PhysioNet/CinC 2020 challenge, performing repeated random subsampling and splitting the available records into 80% training, 10% validation, and 10% test splits, 20 times. We attain a mean test split challenge score of 0.248 with an overall macro F1 score of 0.260 across the 27 labels.

@inproceedings{wong2021ICASSP-ecg-autoencoder,
 abstract = {The 12-lead electrocardiogram (ECG) measures the electrical activity of the heart for physicians to use in diagnosing cardiac disorders. This paper investigates the multi-label, multi-class classification of ECG records into one or more of 27 possible medical diagnoses. Our multi-step approach uses conventional physiological algorithms for segmentation of heartbeats from the baseline signals. We stack a heartbeat autoencoder over heartbeat windows to make embeddings, then we encode this sequence of embeddings to make an ECG embedding which we then classify on. We utilize the public dataset of 43,101 available ECG records provided by the PhysioNet/CinC 2020 challenge, performing repeated random subsampling and splitting the available records into 80% training, 10% validation, and 10% test splits, 20 times. We attain a mean test split challenge score of 0.248 with an overall macro F1 score of 0.260 across the 27 labels.},
 accepted = {2021-01-29},
 author = {Alexander William Wong and Amir Salimi and Abram Hindle and Sunil Vasu Kalmady and Padma Kaul},
 authors = {Alexander William Wong, Amir Salimi, Abram Hindle, Sunil Vasu Kalmady, Padma Kaul},
 booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing},
 code = {wong2021ICASSP-ecg-autoencoder},
 date = {2021-06-10},
 funding = {NSERC Discovery},
 location = {Toronto, Canada},
 pagerange = {1--4},
 pages = {1--4},
 rate = {},
 region = {Ontario},
 role = {Co-Author},
 title = {Multilabel 12-Lead Electrocardiogram Classification Using Beat To Sequence Autoencoders},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/wong2021ICASSP-ecg-autoencoder.pdf},
 venue = {IEEE International Conference on Acoustics, Speech and Signal Processing},
 year = {2021}
}
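
The evaluation protocol in the abstract above (repeated random subsampling into 80% training, 10% validation, and 10% test splits, 20 times) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the fixed seed, and the choice to give rounding leftovers to the test split are hypothetical, and only record indices are shuffled, not real ECG data.

```python
import random


def repeated_random_splits(n_records, n_repeats=20,
                           train_frac=0.8, val_frac=0.1, seed=0):
    """Yield (train, val, test) index lists, one triple per repeat.

    Assumption: any records left over after rounding go to the test split.
    """
    rng = random.Random(seed)
    indices = list(range(n_records))
    n_train = int(n_records * train_frac)
    n_val = int(n_records * val_frac)
    for _ in range(n_repeats):
        shuffled = indices[:]          # fresh copy for every repeat
        rng.shuffle(shuffled)          # independent random permutation
        train = shuffled[:n_train]
        val = shuffled[n_train:n_train + n_val]
        test = shuffled[n_train + n_val:]
        yield train, val, test


# 43,101 is the record count quoted in the abstract.
splits = list(repeated_random_splits(43101))
```

Per-repeat scores would then be averaged over the 20 test splits, which is how a "mean test split" figure is usually obtained. The scoring step is omitted here because the abstract does not specify it.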

Revisiting Dockerfiles in Open Source Software Over Time

Kalvin Eng, Abram Hindle
Proceedings of the 2021 International Conference on Mining Software Repositories, Madrid, Spain
2021 1--12
Acceptance:34.3%
PDF

Docker is becoming ubiquitous with containerization for developing and deploying applications. Previous studies have analyzed Dockerfiles that are used to create container images in order to better understand how to improve Docker tooling. These studies obtain Dockerfiles using either Docker Hub or GitHub. In this paper, we revisit the findings of previous studies using the largest set of Dockerfiles known to date with over 9.4 million unique Dockerfiles found in the World of Code infrastructure spanning 2013 to 2020. We contribute a historical view of the Dockerfile format by analyzing the Docker engine changelogs and use the history to enhance our analysis of Dockerfiles. We also reconfirm previous findings of a downward trend in using OS images and an upward trend of using language images. As well, we reconfirm that Dockerfile smell counts are slightly decreasing, meaning that Dockerfile authors are likely getting better at following best practices. These findings indicate that prior works were correct in many of their findings, and they further substantiate those works' suggestions to build better tools for Docker image creation.

@inproceedings{eng2021MSR-dockerfiles,
 abstract = {Docker is becoming ubiquitous with containerization for developing and deploying applications. Previous studies have analyzed Dockerfiles that are used to create container images in order to better understand how to improve Docker tooling. These studies obtain Dockerfiles using either Docker Hub or GitHub. In this paper, we revisit the findings of previous studies using the largest set of Dockerfiles known to date with over 9.4 million unique Dockerfiles found in the World of Code infrastructure spanning 2013 to 2020. We contribute a historical view of the Dockerfile format by analyzing the Docker engine changelogs and use the history to enhance our analysis of Dockerfiles. We also reconfirm previous findings of a downward trend in using OS images and an upward trend of using language images. As well, we reconfirm that Dockerfile smell counts are slightly decreasing, meaning that Dockerfile authors are likely getting better at following best practices. These findings indicate that prior works were correct in many of their findings, and they further substantiate those works' suggestions to build better tools for Docker image creation.},
 accepted = {2021-02-22},
 author = {Kalvin Eng and Abram Hindle},
 authors = {Kalvin Eng, Abram Hindle},
 booktitle = {Proceedings of the 2021 International Conference on Mining Software Repositories},
 code = {eng2021MSR-dockerfiles},
 date = {2021-05-18},
 funding = {NSERC Discovery},
 location = {Madrid, Spain},
 pagerange = {1--12},
 pages = {1--12},
 rate = {34.3%},
 role = {Co-Author},
 title = {Revisiting Dockerfiles in Open Source Software Over Time},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/eng2021MSR-dockerfiles.pdf},
 venue = {Proceedings of the 2021 International Conference on Mining Software Repositories},
 year = {2021}
}

PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects

Arthur Veloso Kamienski, Luisa Palechor, Abram Hindle, Cor-Paul Bezemer
Proceedings of the 2021 International Conference on Mining Software Repositories, Madrid, Spain
2021 1--5
Acceptance:
PDF

Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.

@inproceedings{kamienski2021MSR-pysstubs,
 abstract = {Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.},
 accepted = {2021-02-22},
 author = {Arthur Veloso Kamienski and Luisa Palechor and Abram Hindle and Cor-Paul Bezemer},
 authors = {Arthur Veloso Kamienski, Luisa Palechor, Abram Hindle, Cor-Paul Bezemer},
 booktitle = {Proceedings of the 2021 International Conference on Mining Software Repositories},
 code = {kamienski2021MSR-pysstubs},
 date = {2021-05-18},
 funding = {NSERC Discovery},
 location = {Madrid, Spain},
 pagerange = {1--5},
 pages = {1--5},
 rate = {},
 role = {Co-Author},
 title = {PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/kamienski2021MSR-pysstubs.pdf},
 venue = {Proceedings of the 2021 International Conference on Mining Software Repositories},
 year = {2021}
}

What Causes Wrong Sentiment Classifications of Game Reviews?

Markos Viggiato, Dayi Lin, Abram Hindle, Cor-Paul Bezemer
IEEE Transactions on Games,
2021 350--363
PDF

Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, mainly when applied in domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large-scale empirical study on the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes for the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes for wrong classifications, such as reviews that point out advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to resolve, and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by the game genre is effective.

@article{viggiatoTG2021-game-review-sentiment,
 abstract = {Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, mainly when applied in domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large-scale empirical study on the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes for the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes for wrong classifications, such as reviews that point out advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to resolve, and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by the game genre is effective.},
 accepted = {2021-04-05},
 author = {Markos Viggiato and Dayi Lin and Abram Hindle and Cor-Paul Bezemer},
 authors = {Markos Viggiato, Dayi Lin, Abram Hindle, Cor-Paul Bezemer},
 code = {viggiatoTG2021-game-review-sentiment},
 day = {05},
 funding = {NSERC Discovery},
 institution = {University of Alberta},
 journal = {IEEE Transactions on Games},
 month = {April},
 pagerange = {350--363},
 pages = {350--363},
 role = { Researcher / Co-author},
 title = {What Causes Wrong Sentiment Classifications of Game Reviews?},
 type = {article},
 url = {http://softwareprocess.ca/pubs/viggiatoTG2021-game-review-sentiment.pdf},
 venue = {IEEE Transactions on Games},
 year = {2021}
}

How are issue reports discussed in Gitter chat rooms?

Hareem Sahar, Abram Hindle, Cor-Paul Bezemer
Journal of Systems and Software,
2020 1--53
PDF

Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams, e.g., to discuss issue reports and facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,407,622 messages and 16,665 issue references. We manually analyze the contents of chat room discussions around 476 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than the issue reports that are never brought up on Gitter.

@article{sahar2020JSS-Gitter-Issues,
 abstract = {Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams, e.g., to discuss issue reports and facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,407,622 messages and 16,665 issue references. We manually analyze the contents of chat room discussions around 476 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than the issue reports that are never brought up on Gitter.},
 accepted = {2020-10-29},
 author = {Hareem Sahar and Abram Hindle and Cor-Paul Bezemer},
 authors = {Hareem Sahar, Abram Hindle, Cor-Paul Bezemer},
 code = {sahar2020JSS-Gitter-Issues},
 day = {29},
 funding = {NSERC Discovery},
 institution = {University of Alberta},
 journal = {Journal of Systems and Software},
 month = {October},
 pages = {1--53},
 role = {Researcher / Co-author},
 title = {How are issue reports discussed in Gitter chat rooms?},
 type = {article},
 url = {http://softwareprocess.ca/pubs/sahar2020JSS-Gitter-Issues.pdf},
 venue = {Journal of Systems and Software},
 year = {2020}
}

Make your own audience: virtual listeners can filter generated drum programs

Amir Salimi and Abram Hindle
Proceedings of the 2020 AI Music Creativity Conference, 2020, Stockholm, Sweden
2020 1--8
PDF

Can we generate drum synthesizers automatically? We present an approach for the automatic generation of synthesizer programs for one-shot percussive sounds. Recent advancements in digital synthesis, heuristic search, and neural networks can be utilized for sound generation. Yet the need for data, the problem of open set recognition, and high computational costs persist as barriers towards the expansion of sound libraries using these techniques. We generate quick, scalable, percussion synthesizers using classical signal processing. We train drum classifiers to find and classify synthesizer programs that mimic percussive sounds. We use features from Fourier transformations and autoencoder embeddings to train machine learning classifiers. Manual listening tests of the generated sounds demonstrate the system can successfully generate drum synthesizers and categorize drum sounds. To facilitate future research, we share our curated dataset of free percussive sounds.

@inproceedings{salimiCSMC2020-virtual-listeners-drums,
 abstract = {Can we generate drum synthesizers automatically? We present an approach for the automatic generation of synthesizer programs for one-shot percussive sounds. Recent advancements in digital synthesis, heuristic search, and neural networks can be utilized for sound generation. Yet the need for data, the problem of open set recognition, and high computational costs persist as barriers towards the expansion of sound libraries using these techniques. We generate quick, scalable, percussion synthesizers using classical signal processing. We train drum classifiers to find and classify synthesizer programs that mimic percussive sounds. We use features from Fourier transformations and autoencoder embeddings to train machine learning classifiers. Manual listening tests of the generated sounds demonstrate the system can successfully generate drum synthesizers and categorize drum sounds. To facilitate future research, we share our curated dataset of free percussive sounds.},
 accepted = {2020-09-17},
 author = {Amir Salimi and Abram Hindle},
 authors = {Amir Salimi and Abram Hindle},
 booktitle = {Proceedings of the 2020 AI Music Creativity Conference, 2020},
 code = {salimiCSMC2020-virtual-listeners-drums},
 date = {2020-10-21},
 funding = {NSERC Discovery},
 isbn = {978-91-519-5560-5},
 location = {Stockholm, Sweden},
 pagerange = {1--8},
 pages = {1--8},
 rate = {},
 role = {Co-Author},
 title = {Make your own audience: virtual listeners can filter generated drum programs},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/salimiCSMC2020-virtual-listeners-drums.pdf},
 venue = {Proceedings of the 2020 AI Music Creativity Conference, 2020},
 year = {2020}
}

Multilabel 12-Lead Electrocardiogram Classification Using Gradient Boosting Tree Ensemble

Alexander W Wong, Weijie Sun, Sunil V Kalmady, Padma Kaul, Abram Hindle
2020 Computing in Cardiology (CinC) PhysioNet Challenge, Rimini, Italy
2020 1--4
Acceptance:41/300 or 13%
PDF

The 12-lead electrocardiogram (ECG) is a commonly used tool for detecting cardiac abnormalities such as atrial fibrillation, blocks, and irregular complexes. For the PhysioNet/CinC 2020 Challenge, we built an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnosis. For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform. We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class. We train a phase one set of feature importance determining models to isolate the top 1,000 most important features to use in our phase two diagnosis prediction models. We use repeated random sub-sampling by splitting our dataset of 43,101 records into 100 independent runs of 85:15 training/validation splits for our internal evaluation results. Our methodology generates us an official phase validation set score of 0.476 and test set score of -0.080 under the team name, CVC, placing us 36 out of 41 in the rankings.

@inproceedings{wong2020CINC-multilabel-ECG,
 abstract = {The 12-lead electrocardiogram (ECG) is a commonly used tool for detecting cardiac abnormalities such as atrial fibrillation, blocks, and irregular complexes. For the PhysioNet/CinC 2020 Challenge, we built an algorithm using gradient boosted tree ensembles fitted on morphology and signal processing features to classify ECG diagnosis. For each lead, we derive features from heart rate variability, PQRST template shape, and the full signal waveform. We join the features of all 12 leads to fit an ensemble of gradient boosting decision trees to predict probabilities of ECG instances belonging to each class. We train a phase one set of feature importance determining models to isolate the top 1,000 most important features to use in our phase two diagnosis prediction models. We use repeated random sub-sampling by splitting our dataset of 43,101 records into 100 independent runs of 85:15 training/validation splits for our internal evaluation results. Our methodology generates us an official phase validation set score of 0.476 and test set score of -0.080 under the team name, CVC, placing us 36 out of 41 in the rankings.},
 accepted = {2020-10-01},
 author = {Alexander W Wong and Weijie Sun and Sunil V Kalmady and Padma Kaul and Abram Hindle},
 authors = {Alexander W Wong, Weijie Sun, Sunil V Kalmady, Padma Kaul, Abram Hindle},
 booktitle = {2020 Computing in Cardiology (CinC) PhysioNet Challenge},
 code = {wong2020CINC-multilabel-ECG},
 date = {2020-09-21},
 funding = {NSERC Discovery},
 location = {Rimini, Italy},
 pagerange = {1--4},
 pages = {1--4},
 rate = {41/300 or 13%},
 role = {Co-Author},
 title = {Multilabel 12-Lead Electrocardiogram Classification Using Gradient Boosting Tree Ensemble},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/wong2020CINC-multilabel-ECG.pdf},
 venue = {2020 Computing in Cardiology (CinC) PhysioNet Challenge},
 year = {2020}
}
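
The repeated random sub-sampling mentioned in the abstract above (100 runs of 85:15 training/validation splits) can be illustrated with a minimal sketch. This is not the authors' code: the function name `repeated_random_splits`, the fixed seed, and the use of plain record indices are assumptions made only for illustration.

```python
import random


def repeated_random_splits(n_records, n_runs=100, train_fraction=0.85, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per run.

    Hypothetical helper: each run reshuffles all record indices and cuts
    them at the train_fraction boundary, so runs are drawn independently.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    indices = list(range(n_records))
    n_train = round(n_records * train_fraction)
    for _ in range(n_runs):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Small demonstration with 20 records and 3 runs.
for train, validation in repeated_random_splits(20, n_runs=3):
    print(len(train), len(validation))
```

Each record lands in exactly one of the two sets within a run, while different runs may place the same record on different sides of the split.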

On the Time-Based Conclusion Stability of Cross-Project Defect Prediction Models

Abdul Bangash, Hareem Sahar, Abram Hindle, Karim Ali
Empirical Software Engineering,
2020 1--39
PDF

Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher's conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that in the area of Software Analytics, conclusions over different data sets are usually inconsistent. In this article, we empirically investigate whether conclusions in the area of cross-project defect prediction truly exhibit stability throughout time or not. Our investigation applies a time-aware evaluation approach where models are trained only on the past, and evaluations are executed only on the future. Through this time-aware evaluation, we show that depending on which time period we evaluate defect predictors, their performance, in terms of F-Score, the area under the curve (AUC), and Matthews Correlation Coefficient (MCC), varies and their results are not consistent. The next release of a product, which is significantly different from its prior release, may drastically change defect prediction performance. Therefore, without knowing about the conclusion stability, empirical software engineering researchers should limit their claims of performance within the contexts of evaluation, because broad claims about defect prediction performance might be contradicted by the next upcoming release of a product under analysis.

@article{bangash2020EMSEstability,
 abstract = {Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher's conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that in the area of Software Analytics, conclusions over different data sets are usually inconsistent. In this article, we empirically investigate whether conclusions in the area of cross-project defect prediction truly exhibit stability throughout time or not. Our investigation applies a time-aware evaluation approach where models are trained only on the past, and evaluations are executed only on the future. Through this time-aware evaluation, we show that depending on which time period we evaluate defect predictors, their performance, in terms of F-Score, the area under the curve (AUC), and Matthews Correlation Coefficient (MCC), varies and their results are not consistent. The next release of a product, which is significantly different from its prior release, may drastically change defect prediction performance. Therefore, without knowing about the conclusion stability, empirical software engineering researchers should limit their claims of performance within the contexts of evaluation, because broad claims about defect prediction performance might be contradicted by the next upcoming release of a product under analysis.},
 accepted = {2020-08-07},
 author = {Abdul Bangash and Hareem Sahar and Abram Hindle and Karim Ali},
 authors = {Abdul Bangash, Hareem Sahar, Abram Hindle, Karim Ali},
 code = {bangash2020EMSEstability},
 day = {7},
 funding = {NSERC Discovery},
 institution = {University of Alberta},
 journal = {Empirical Software Engineering},
 month = {August},
 pages = {1--39},
 preprint = {https://arxiv.org/abs/1911.06348},
 role = {Researcher / Co-author},
 title = {On the Time-Based Conclusion Stability of Cross-Project Defect Prediction Models},
 type = {article},
 url = {http://softwareprocess.ca/pubs/bangash2020EMSEstability.pdf},
 venue = {Empirical Software Engineering},
 year = {2020}
}
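
The time-aware evaluation described in the abstract above (train only on the past, evaluate only on the future) can be sketched as a date-based split. This is a minimal illustration under assumptions, not the paper's pipeline: the function name `time_aware_split`, the `(date, item)` record shape, and the example data are hypothetical.

```python
from datetime import date


def time_aware_split(records, cutoff):
    """Split (timestamp, item) records around a cutoff date.

    Hypothetical helper: everything strictly before the cutoff is training
    data; everything on or after it is held out for evaluation, so no
    information from the future leaks into training.
    """
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


# Made-up example records, labelled only for illustration.
records = [
    (date(2019, 1, 1), "release-a"),
    (date(2019, 6, 1), "release-b"),
    (date(2020, 1, 1), "release-c"),
]
train, test = time_aware_split(records, date(2019, 12, 31))
print([item for _, item in train], [item for _, item in test])
```

Moving the cutoff forward and repeating the evaluation is one way to observe whether a conclusion holds across time periods.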

Understanding DevOps Education with Grounded Theory

Candy Pang, Abram Hindle, Denilson Barbosa
2020 IEEE International Conference on Software Engineering, Software Engineering Education and Training Track, Seoul, South Korea
2020 1--12
Acceptance:21/84 or 25%
PDF

DevOps stands for Development-Operations. It arises from the IT industry as a movement aligning development and operations teams. DevOps is broadly recognized as an IT standard, and there is high demand for DevOps practitioners in industry. Since ACM & IEEE suggest that undergraduate computer science curricula "must adequately prepare [students] for the workforce", we studied whether undergraduates acquired adequate DevOps skills to fulfill the demand for DevOps practitioners in industry. We employed Grounded Theory (GT), a social science qualitative research methodology, to study DevOps education from academic and industrial perspectives. In academia, academics were not motivated to learn or adopt DevOps, and we did not find strong evidence of academics teaching DevOps. Academics need incentives to adopt DevOps, in order to stimulate interest in teaching DevOps. In industry, DevOps practitioners lack clearly defined roles and responsibilities, for the DevOps topic is diverse and growing too fast. Therefore, practitioners can only learn DevOps through hands-on working experience. As a result, academic institutions should provide fundamental DevOps education (in culture, procedure, and technology) to prepare students for their future DevOps advancement in industry. Based on our findings, we proposed five groups of future studies to advance DevOps education in academia.

@inproceedings{pang2020ICSESeet-Devops,
 abstract = {DevOps stands for Development-Operations. It arises from the IT industry as a movement aligning development and operations teams. DevOps is broadly recognized as an IT standard, and there is high demand for DevOps practitioners in industry. Since ACM & IEEE suggest that undergraduate computer science curricula "must adequately prepare [students] for the workforce", we studied whether undergraduates acquired adequate DevOps skills to fulfill the demand for DevOps practitioners in industry. We employed Grounded Theory (GT), a social science qualitative research methodology, to study DevOps education from academic and industrial perspectives. In academia, academics were not motivated to learn or adopt DevOps, and we did not find strong evidence of academics teaching DevOps. Academics need incentives to adopt DevOps, in order to stimulate interest in teaching DevOps. In industry, DevOps practitioners lack clearly defined roles and responsibilities, for the DevOps topic is diverse and growing too fast. Therefore, practitioners can only learn DevOps through hands-on working experience. As a result, academic institutions should provide fundamental DevOps education (in culture, procedure, and technology) to prepare students for their future DevOps advancement in industry. Based on our findings, we proposed five groups of future studies to advance DevOps education in academia.},
 accepted = {2020-01-15},
 author = {Candy Pang and Abram Hindle and Denilson Barbosa},
 authors = {Candy Pang, Abram Hindle, Denilson Barbosa},
 booktitle = {2020 IEEE International Conference on Software Engineering, Software Engineering Education and Training Track},
 code = {pang2020ICSESeet-Devops},
 date = {2020-07-07},
 funding = {NSERC Discovery},
 location = {Seoul, South Korea},
 pagerange = {1--12},
 pages = {1--12},
 rate = {21/84 or 25%},
 role = {Co-Author},
 title = {Understanding DevOps Education with Grounded Theory},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/pang2020ICSESeet-Devops.pdf},
 venue = {2020 IEEE International Conference on Software Engineering, Software Engineering Education and Training Track},
 year = {2020}
}

Syntax and Stack Overflow: A Methodology for Extracting a Corpus of Syntax Errors and Fixes

Alexander William Wong, Amir Salimi, Shaiful Alam Chowdhury, Abram Hindle
2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), Cleveland, United States
2019 318--322
Acceptance:26/46 or 56%
PDF

One problem when studying how to find and fix syntax errors is how to get natural and representative examples of syntax errors. Most syntax error datasets are not free, open, and public, or they are extracted from novice programmers and do not represent syntax errors that the general population of developers would make. Programmers of all skill levels post questions and answers to Stack Overflow which may contain snippets of source code along with corresponding text and tags. Many snippets do not parse, thus they are ripe for forming a corpus of syntax errors and corrections. Our primary contribution is an approach for extracting natural syntax errors and their corresponding human made fixes to help syntax error research. A Python abstract syntax tree parser is used to determine preliminary errors and corrections on code blocks extracted from the SOTorrent data set. We further analyzed our code by executing the corrections in a Python interpreter. We applied our methodology to produce a public data set of 62,965 Python Stack Overflow code snippets with corresponding tags, errors, and stack traces. We found that errors made by Stack Overflow users do not match errors made by student developers or random mutations, implying there is a serious representativeness risk within the field. Finally we share our dataset openly so that future researchers can re-use and extend our syntax errors and fixes.

@inproceedings{wongICSME2019-syntax,
 abstract = {One problem when studying how to find and fix syntax errors is how to get natural and representative examples of syntax errors. Most syntax error datasets are not free, open, and public, or they are extracted from novice programmers and do not represent syntax errors that the general population of developers would make. Programmers of all skill levels post questions and answers to Stack Overflow which may contain snippets of source code along with corresponding text and tags. Many snippets do not parse, thus they are ripe for forming a corpus of syntax errors and corrections. Our primary contribution is an approach for extracting natural syntax errors and their corresponding human made fixes to help syntax error research. A Python abstract syntax tree parser is used to determine preliminary errors and corrections on code blocks extracted from the SOTorrent data set. We further analyzed our code by executing the corrections in a Python interpreter. We applied our methodology to produce a public data set of 62,965 Python Stack Overflow code snippets with corresponding tags, errors, and stack traces. We found that errors made by Stack Overflow users do not match errors made by student developers or random mutations, implying there is a serious representativeness risk within the field. Finally we share our dataset openly so that future researchers can re-use and extend our syntax errors and fixes.},
 accepted = {2019-07-13},
 author = {Alexander William Wong and Amir Salimi and Shaiful Alam Chowdhury and Abram Hindle},
 authors = {Alexander William Wong, Amir Salimi, Shaiful Alam Chowdhury, Abram Hindle},
 booktitle = {2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
 code = {wongICSME2019-syntax},
 date = {2019-07-13},
 funding = {NSERC Discovery},
 location = {Cleveland, United States},
 pagerange = {318--322},
 pages = {318--322},
 rate = {26/46 or 56%},
 region = {Ohio},
 role = {Co-Author},
 title = {Syntax and Stack Overflow: A Methodology for Extracting a Corpus of Syntax Errors and Fixes},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/wongICSME2019-syntax.pdf},
 venue = {2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)},
 year = {2019}
}
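
The abstract above mentions using a Python abstract syntax tree parser to detect snippets that do not parse. A minimal sketch of that kind of check, using only the standard-library `ast` module, might look as follows. The helper name `syntax_error_of` and the example snippets are assumptions for illustration; the paper's actual extraction pipeline over SOTorrent is not reproduced here.

```python
import ast


def syntax_error_of(snippet):
    """Return None if the snippet parses, else (error type, message, line).

    Hypothetical helper: IndentationError is a subclass of SyntaxError,
    so both are caught by the same handler.
    """
    try:
        ast.parse(snippet)
    except SyntaxError as err:
        return (type(err).__name__, err.msg, err.lineno)
    return None


# A made-up broken snippet and a made-up human-style fix.
broken = "print('hello'"
fixed = "print('hello')"

print(syntax_error_of(broken))
print(syntax_error_of(fixed))
```

Pairing a snippet that fails this check with a later revision that passes it is one way to think about an error and its fix, under the assumptions stated above.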

GreenBundle: An Empirical Study on the Energy Impact of Bundled Processing

Shaiful Alam Chowdhury, Abram Hindle, Rick Kazman, Takumi Shuto, Ken Matsui, Yasutaka Kamei
Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE), Montreal, Canada
2019 1--12
Acceptance:109/529 or 21%
PDF

Energy consumption is a concern in the data-center and at the edge, on mobile devices such as smartphones. Software that consumes too much energy threatens the utility of the end-user's mobile device. Energy consumption is fundamentally a systemic kind of performance and hence it should be addressed at design time via a software architecture that supports it, rather than after release, via some form of refactoring. Unfortunately developers often lack knowledge of what kinds of designs and architectures can help address software energy consumption. In this paper we show that some simple design choices can have significant effects on energy consumption. In particular we examine the Model-View-Controller architectural pattern and demonstrate how converting to Model-View-Presenter with bundling can improve the energy performance of both benchmark systems and real world applications. We show the relationship between energy consumption and bundled and delayed view updates: bundling events in the presenter can often reduce energy consumption by 30%.

@inproceedings{chowdhury2019ICSE-greenbundle,
 abstract = {Energy consumption is a concern in the data-center and at the edge, on mobile devices such as smartphones.  Software that consumes too much energy threatens the utility of the end-user's mobile device.  Energy consumption is fundamentally a systemic kind of performance and hence it should be addressed at design time via a software architecture that supports it, rather than after release, via some form of refactoring.  Unfortunately developers often lack knowledge of what kinds of designs and architectures can help address software energy consumption. In this paper we show that some simple design choices can have significant effects on energy consumption. In particular we examine the Model-View-Controller architectural pattern and demonstrate how converting to Model-View-Presenter with bundling can improve the energy performance of both benchmark systems and real world applications. We show the relationship between energy consumption and bundled and delayed view updates: bundling events in the presenter can often reduce energy consumption by 30%.},
 accepted = {2018-12-01},
 author = {Shaiful Alam Chowdhury and Abram Hindle and Rick Kazman and Takumi Shuto and Ken Matsui and Yasutaka Kamei},
 authors = {Shaiful Alam Chowdhury, Abram Hindle, Rick Kazman, Takumi Shuto, Ken Matsui, Yasutaka Kamei},
 booktitle = {Proceedings of the 41st {ACM/IEEE} International Conference on Software Engineering (ICSE)},
 code = {chowdhury2019ICSE-greenbundle},
 date = {2019-05-30},
 funding = {NSERC Discovery, JSPS},
 location = {Montreal, Canada},
 pagerange = {1--12},
 pages = {1--12},
 rate = {109/529 or 21%},
 region = {Quebec},
 role = {Author},
 title = {GreenBundle: An Empirical Study on the Energy Impact of Bundled Processing},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhury2019ICSE-greenbundle.pdf},
 venue = {Proceedings of the 41st {ACM/IEEE} International Conference on Software Engineering (ICSE)},
 year = {2019}
}

What do developers know about machine learning: a study of ML discussions on StackOverflow

Abdul Ali Bangash, Hareem Sahar, Shaiful Chowdhury, Alexander William Wong, Abram Hindle, Karim Ali
Proceedings of the 16th International Conference on Mining Software Repositories (MSR19), Montreal, Canada
2019 1--5
Acceptance:14/27 or 52%
PDF

Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity over time and is successfully being used for some Software Engineering tasks today, e.g., bug prediction and software development effort estimation. In order to gain deeper insights into the uses of machine learning in the software engineering context, we conduct a study on the SOTorrent dataset that contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the problems of software engineering addressed by machine learning. Our analyses on the metadata of posts show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because of their lower prevalence in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators and practitioners alike.

@inproceedings{bangash2019MSRChallenge-ML,
 abstract = {Machine learning is a branch of Artificial Intelligence that lets computers learn from experience instead of being explicitly programmed to do everything. It is growing in popularity over time and is successfully being used for some Software Engineering tasks today, e.g., bug prediction and software development effort estimation. In order to gain deeper insights into the uses of machine learning in the software engineering context, we conduct a study on the SOTorrent dataset that contains Stack Overflow posts from 2008 to 2018. We studied almost 28,000 machine learning posts spanning a ten-year interval and identified the problems of software engineering addressed by machine learning. Our analyses on the metadata of posts show that ample support for classical machine learning problems is available on Stack Overflow. However, state-of-the-art machine learning algorithms and technologies currently lack support, probably because of their lower prevalence in the software engineering community as of now. We believe that the insights provided by our study will be useful for software engineers, educators and practitioners alike.},
 accepted = {2019-03-01},
 author = {Abdul Ali Bangash and Hareem Sahar and Shaiful Chowdhury and Alexander William Wong and Abram Hindle and Karim Ali},
 authors = {Abdul Ali Bangash, Hareem Sahar, Shaiful Chowdhury, Alexander William Wong, Abram Hindle, Karim Ali},
 booktitle = {Proceedings of the 16th International Conference on Mining Software Repositories (MSR19)},
 code = {bangash2019MSRChallenge-ML},
 date = {2019-05-26},
 funding = {NSERC Discovery},
 location = {Montreal, Canada},
 pagerange = {1--5},
 pages = {1--5},
 rate = {14/27 or 52%},
 region = {Quebec},
 role = {Co-Author},
 title = {What do developers know about machine learning: a study of ML discussions on StackOverflow},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bangash2019MSRChallenge-ML.pdf},
 venue = {Proceedings of the 16th International Conference on Mining Software Repositories (MSR19)},
 year = {2019}
}

Complexity: Let's Not Make This Complicated

Abram Hindle
IEEE Software,
2019 130--132
PDF
Publisher Link
DOI:10.1109/MS.2018.2883875

Invited, not peer reviewed.

Complexity != Complicated

@article{hindle2019Software-Complexity,
 abstract = {Complexity != Complicated},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 code = {hindle2019Software-Complexity},
 date = {2019-02-27},
 doi = {10.1109/MS.2018.2883875},
 funding = {NSERC Discovery},
 journal = {IEEE Software},
 notes = {Invited, not peer reviewed.},
 pagerange = {130--132},
 pages = {130--132},
 payurl = {https://doi.org/10.1109/MS.2018.2883875},
 role = {Invited Opinion},
 title = {Complexity: Let's Not Make This Complicated},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2019Software-Complexity.pdf},
 venue = {IEEE Software},
 volume = {36(2)},
 year = {2019}
}

Automatic topic classification of test cases using text mining at an Android smartphone vendor

Junji Shimagaki, Yasutaka Kamei, Naoyasu Ubayashi, Abram Hindle
Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Oulu, Finland
2018 1--10
Acceptance:12/28 or 43%
PDF

ESEM Best Industrial Paper Award

Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has already been spent to properly quantify "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One method to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels (FLs)", and the FLs serve to categorize 100s to 1000s of test cases into 10 to 50 groups.
Aim: Unfortunately for developers, manual assignment of FLs to 1000s of test cases is a time-consuming task, leading to inaccurately labeled test cases, which will render the dashboard useless. We created an automated system that suggests tags/labels to the developers for their test cases rather than relying on manual labeling.
Method: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company.
Results: Through the quantitative experiments, our models achieved acceptable F-1 performance of 0.3 to 0.88. Also through the qualitative studies with expert teams, we showed that the hierarchy and path of tests were good predictors of a feature's label.
Conclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.

@inproceedings{junji2018EMSE-topics,
 abstract = {Background: An Android smartphone is an ecosystem of applications, drivers, operating system components, and assets. The volume of the software is large and the number of test cases needed to cover the functionality of an Android system is substantial. Enormous effort has already been spent to properly quantify "what features and apps were tested and verified?". This insight is provided by dashboards that summarize test coverage and results per feature. One method to achieve this is to manually tag or label test cases with the topic or function they cover, much like function points. At the studied Android smartphone vendor, tests are labelled with manually defined tags, so-called "feature labels (FLs)", and the FLs serve to categorize 100s to 1000s of test cases into 10 to 50 groups.\nAim: Unfortunately for developers, manual assignment of FLs to 1000s of test cases is a time-consuming task, leading to inaccurately labeled test cases, which will render the dashboard useless. We created an automated system that suggests tags/labels to the developers for their test cases rather than relying on manual labeling.\nMethod: We use machine learning models to predict and label the functionality tested by 10,000 test cases developed at the company.\nResults: Through the quantitative experiments, our models achieved acceptable F-1 performance of 0.3 to 0.88. Also through the qualitative studies with expert teams, we showed that the hierarchy and path of tests were good predictors of a feature's label.\nConclusions: We find that this method can reduce the tedious manual effort that software developers spend classifying test cases, while providing more accurate classification results.},
 accepted = {2018-08-15},
 author = {Junji Shimagaki and Yasutaka Kamei and Naoyasu Ubayashi and Abram Hindle},
 authors = {Junji Shimagaki, Yasutaka Kamei, Naoyasu Ubayashi, Abram Hindle},
 booktitle = {Proceedings of the 12th {ACM/IEEE} International Symposium on Empirical Software Engineering and Measurement (ESEM)},
 code = {junji2018EMSE-topics},
 date = {2018-10-11},
 funding = {NSERC Discovery, JSPS},
 location = {Oulu, Finland},
 notes = {ESEM Best Industrial Paper Award},
 pagerange = {1--10},
 pages = {1--10},
 rate = {12/28 or 43%},
 role = {Author},
 title = {Automatic topic classification of test cases using text mining at an Android smartphone vendor},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/junji2018EMSE-topics.pdf},
 venue = {Proceedings of the 12th {ACM/IEEE} International Symposium on Empirical Software Engineering and Measurement (ESEM)},
 year = {2018}
}

If you bill it, they will pay: Energy consumption in the cloud will be irrelevant until directly billed for

Abram Hindle
Proceedings of the 7th International Workshop on Requirements Engineering for Sustainable Systems (RE4SuSy),
2018 1--2
PDF

Don't leave the lights on! One reason we take energy consumption seriously is because we are directly billed for it. If one leaves the heat on high overnight, the effect is noticeable on the next bill. Yet as granular as cloud computing billing can be in terms of resources and quality of service, we have limited motivation to investigate energy consumption in the cloud because cloud customers cannot necessarily realize savings. Proxies for quality of service such as lower performance CPU allocations can be used, but at no point does a user see a bill listing energy consumption. Furthermore, billing energy consumption of virtualized services is non-trivial and indirect. When many VMs share the same host, attribution of energy consumption becomes difficult. When many hosts are in the same datacenter, attribution of cooling costs becomes difficult as well. Thus, due to the direct and indirect costs of running a cloud, and the sharing of resources, pricing cloud energy consumption is difficult and typically not done. We argue that until energy consumption of hosted computers, VMs, and cloud services is pushed down from the cloud provider to the cloud consumer, datacenters will continue to consume massive amounts of energy to provide software services. When cloud end-users have to pay for energy consumption they will consider optimizing energy consumption. Once energy consumption in the cloud is a bill line item, energy consumption will become a first class performance non-functional requirement of software.

@inproceedings{hindle2018SUSY4RE,
 abstract = {  Don't leave the lights on! One reason we take energy consumption seriously is because we are directly billed for it.  If one leaves the heat on high overnight, the effect is noticeable on the next bill.  Yet as granular as cloud computing billing can be in terms of resources and quality of service, we have limited motivation to investigate energy consumption in the cloud because cloud customers cannot necessarily realize savings.  Proxies for quality of service such as lower performance CPU allocations can be used but at no point does a user see a bill listing energy consumption.  Furthermore, billing the energy consumption of virtualized services is non-trivial and indirect.  When many VMs share the same host, attribution of energy consumption becomes difficult.  When many hosts are in the same datacenter, attribution of cooling costs becomes difficult as well.  Thus, due to the direct and indirect costs of running a cloud and the sharing of resources, pricing cloud energy consumption is difficult and typically not done.  We argue that until energy consumption of hosted computers, VMs, and cloud services is pushed down from the cloud provider to the cloud consumer, datacenters will continue to consume massive amounts of energy to provide software services.  When cloud end-users have to pay for energy consumption they will consider optimizing energy consumption.  Once energy consumption in the cloud is a bill line item, energy consumption will become a first class performance non-functional requirement of software.  },
 accepted = {2018-07-07},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {Proceedings of the 7th International Workshop on Requirements Engineering for Sustainable Systems (RE4SuSy)},
 code = {hindle2018SUSY4RE},
 date = {2018-08-20},
 funding = {NSERC Discovery},
 pagerange = {1--2},
 pages = {1--2},
 role = {author},
 title = {If you bill it, they will pay: Energy consumption in the cloud will be irrelevant until directly billed for},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2018SUSY4RE.pdf},
 venue = {Proceedings of the 7th International Workshop on Requirements Engineering for Sustainable Systems (RE4SuSy)},
 year = {2018}
}

Preventing Duplicate Bug Reports by Continuously Querying Bug Reports

Abram Hindle, Curtis Onuczko
Empirical Software Engineering,
2018 1--38
PDF

Bug deduplication or duplicate bug report detection is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuously querying. That is, as the bug reporter types in their bug report, their text is used to query the bug database to find duplicate or related bug reports. This continuous querying of bug reports allows the reporter to be alerted to duplicate bug reports as they report the bug, rather than formulating queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures can address this problem but also that further research is needed to refine this novel process that is integrate-able into modern bug report systems.

@article{hindle2018EMSE-Continuously-Querying,
 abstract = {Bug deduplication or duplicate bug report detection is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuously querying. That is, as the bug reporter types in their bug report, their text is used to query the bug database to find duplicate or related bug reports. This continuous querying of bug reports allows the reporter to be alerted to duplicate bug reports as they report the bug, rather than formulating queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures can address this problem but also that further research is needed to refine this novel process that is integrate-able into modern bug report systems.},
 accepted = {2018-07-20},
 author = {Abram Hindle and Curtis Onuczko},
 authors = {Abram Hindle, Curtis Onuczko},
 code = {hindle2018EMSE-Continuously-Querying},
 day = {20},
 funding = {MITACS Accelerate and NSERC Discovery},
 journal = {Empirical Software Engineering},
 journalid = {EMSE-D-17-00233R2},
 month = {July},
 pagerange = {1--38},
 pages = {1--38},
 published = {2018-07-20},
 role = { Researcher / Co-author},
 title = {Preventing Duplicate Bug Reports by Continuously Querying Bug Reports},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2018EMSE-Continuously-Querying.pdf},
 venue = {Empirical Software Engineering},
 year = {2018}
}

How does Docker affect energy consumption? Evaluating workloads in and out of Docker containers

Eddie Antonio Santos, Carson McLean, Christopher Solinas, Abram Hindle
Journal of Systems and Software,
2018 1--14
PDF

Context: Virtual machines provide isolation of services at the cost of hypervisors and more resource usage. This spurred the growth of systems like Docker that enable single hosts to isolate several applications, similar to VMs, within a low-overhead abstraction called containers. Motivation: Although containers tout low overhead performance, how much do they increase energy use? Methodology: This work statistically compares the energy consumption of three application workloads in Docker and on bare-metal Linux. Results: In all cases, there was a statistically significant (t-test and Wilcoxon p < 0.05) increase in energy consumption when running tests in Docker, mostly due to the performance of I/O system calls. Developers worried about I/O overhead could consider bare-metal deployments over Docker container deployments.

@article{santos2018JSS-Docker-Energy,
 abstract = {Context: Virtual machines provide isolation of services at the cost of hypervisors and more resource usage.  This spurred the growth of systems like Docker that enable single hosts to isolate several applications, similar to VMs, within a low-overhead abstraction called containers.
 Motivation: Although containers tout low overhead performance, how much do they increase energy use?
 Methodology: This work statistically compares the energy consumption of three application workloads in Docker and on bare-metal Linux.
 Results: In all cases, there was a statistically significant (t-test and Wilcoxon p < 0.05) increase in energy consumption when running tests in Docker, mostly due to the performance of I/O system calls. Developers worried about I/O overhead could consider bare-metal deployments over Docker container deployments.},
 accepted = {2018-07-13},
 author = {Eddie Antonio Santos and Carson McLean and Christopher Solinas and Abram Hindle},
 authors = {Eddie Antonio Santos, Carson McLean, Christopher Solinas, Abram Hindle},
 code = {santos2018JSS-Docker-Energy},
 day = {31},
 funding = {NSERC Discovery},
 journal = {Journal of Systems and Software},
 journalid = {JSS-D-17-00355R2},
 month = {May},
 pagerange = {1--14},
 pages = {1--14},
 published = {2018-07-13},
 role = { Researcher / Co-author},
 title = {How does Docker affect energy consumption? Evaluating workloads in and out of Docker containers},
 type = {article},
 url = {http://softwareprocess.ca/pubs/santos2018JSS-Docker-Energy.pdf},
 venue = {Journal of Systems and Software},
 year = {2018}
}

An exploratory study on assessing the energy impact of logging on Android applications

Shaiful Alam Chowdhury, Silvia Di Nardo, Abram Hindle, Zhen Ming (Jack) Jiang
Empirical Software Engineering,
2018 1422--1456
PDF
Publisher Link
DOI:10.1007/s10664-017-9545-x

Execution logs are debug statements that developers insert into their code. Execution logs are used widely to monitor and diagnose the health of software applications. However, logging comes with costs, as it uses computing resources and can have an impact on an application’s performance. Compared with desktop applications, one additional critical computing resource for mobile applications is battery power. Mobile application developers want to deploy energy efficient applications to end users while still maintaining the ability to monitor. Unfortunately, there is no previous work that studies the energy impact of logging within mobile applications. This exploratory study investigates the energy cost of logging in Android applications using GreenMiner, an automated energy test-bed for mobile applications. Around 1000 versions from 24 Android applications (e.g., Calculator, FeedEx, Firefox, and VLC) were tested with logging enabled and disabled. To further investigate the energy impacting factors for logging, controlled experiments on a synthetic application were performed. Each test was conducted multiple times to ensure rigorous measurement. Our study found that although there is little to no energy impact when logging is enabled for most versions of the studied applications, about 79% (19/24) of the studied applications have at least one version that exhibits medium to large effect sizes in energy consumption when enabling and disabling logging. To further assess the energy impact of logging, we have conducted a controlled experiment with a synthetic application. We found that the rate of logging and the number of disk flushes are significant factors of energy consumption attributable to logging. Finally, we have examined the relation between the generated OS level execution logs and mobile energy consumption.
In addition to the common cross-application log events relevant to garbage collection and graphics systems, some mobile applications also have workload-specific log events that are highly correlated with energy consumption. The regression models built with common log events show mixed performance. Mobile application developers do not need to worry about conservative logging (e.g., logs generated at rates of ≤ 1 message per second), as they are not likely to impact energy consumption. Logging has a negligible effect on energy consumption for most of the mobile applications tested. Although logs have been used effectively to diagnose and debug functional problems, it is still an open problem on how to leverage software instrumentation to debug energy problems.

@article{chowdhury2017EMSE-Logging-and-Energy,
 abstract = {Execution logs are debug statements that developers insert into their code. Execution logs are used widely to monitor and diagnose the health of software applications. However, logging comes with costs, as it uses computing resources and can have an impact on an application’s performance. Compared with desktop applications, one additional critical computing resource for mobile applications is battery power. Mobile application developers want to deploy energy efficient applications to end users while still maintaining the ability to monitor. Unfortunately, there is no previous work that studies the energy impact of logging within mobile applications. This exploratory study investigates the energy cost of logging in Android applications using GreenMiner, an automated energy test-bed for mobile applications. Around 1000 versions from 24 Android applications (e.g., Calculator, FeedEx, Firefox, and VLC) were tested with logging enabled and disabled. To further investigate the energy impacting factors for logging, controlled experiments on a synthetic application were performed. Each test was conducted multiple times to ensure rigorous measurement. Our study found that although there is little to no energy impact when logging is enabled for most versions of the studied applications, about 79% (19/24) of the studied applications have at least one version that exhibits medium to large effect sizes in energy consumption when enabling and disabling logging. To further assess the energy impact of logging, we have conducted a controlled experiment with a synthetic application. We found that the rate of logging and the number of disk flushes are significant factors of energy consumption attributable to logging. Finally, we have examined the relation between the generated OS level execution logs and mobile energy consumption.
In addition to the common cross-application log events relevant to garbage collection and graphics systems, some mobile applications also have workload-specific log events that are highly correlated with energy consumption. The regression models built with common log events show mixed performance. Mobile application developers do not need to worry about conservative logging (e.g., logs generated at rates of ≤ 1 message per second), as they are not likely to impact energy consumption. Logging has a negligible effect on energy consumption for most of the mobile applications tested. Although logs have been used effectively to diagnose and debug functional problems, it is still an open problem on how to leverage software instrumentation to debug energy problems.},
 accepted = {2018-06-20},
 author = {Shaiful Alam Chowdhury and Silvia Di Nardo and Abram Hindle and Zhen Ming (Jack) Jiang},
 authors = {Shaiful Alam Chowdhury, Silvia Di Nardo, Abram Hindle, Zhen Ming (Jack) Jiang},
 code = {chowdhury2017EMSE-Logging-and-Energy},
 day = {10},
 doi = {10.1007/s10664-017-9545-x},
 funding = {NSERC Discovery},
 journal = {Empirical Software Engineering},
 month = {July},
 pagerange = {1422--1456},
 pages = {1422--1456},
 payurl = {https://doi.org/10.1007/s10664-017-9545-x},
 published = {2018-06-20},
 role = { Researcher / Co-author},
 title = {An exploratory study on assessing the energy impact of logging on Android applications},
 type = {article},
 url = {http://softwareprocess.ca/pubs/chowdhury2017EMSE-Logging-and-Energy.pdf},
 venue = {Empirical Software Engineering},
 year = {2018}
}

GreenScaler: Training Software Energy Models With Automatic Test Generation

Shaiful Chowdhury, Stephanie Borle, Stephen Romansky, Abram Hindle
Empirical Software Engineering,
2018 1--52
PDF

Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications (apps) are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet researchers have found that software energy consumption can be modelled. Prior works have hinted that with more energy measurement data we can make more accurate energy models. This data, however, was expensive to extract because it required energy measurement of running test cases (rare) or time consuming manually written tests. In this paper, we show that automatic random test generation with resource-utilization heuristics can be used successfully to build accurate software energy consumption models. Code coverage, although well-known as a heuristic for generating and selecting tests in traditional software testing, performs poorly at selecting energy hungry tests. We propose an accurate software energy model, GreenScaler, that is built on random tests with CPU-utilization as the test selection heuristic. GreenScaler not only accurately estimates energy consumption for randomly generated tests, but also for meaningful developer written tests. Also, the produced models are very accurate in detecting energy regressions between versions of the same app. This is directly helpful for the app developers who want to know if a change in the source code, for example, is harmful for the total energy consumption. We also show that developers can use GreenScaler to select the most energy efficient API when multiple APIs are available for solving the same problem. Researchers can also use our test generation methodology to further study how to build more accurate software energy models.

@article{chowdhury2018EMSE-GreenScaler,
 abstract = { Software energy consumption is a performance related non-functional requirement that complicates building software on mobile devices today. Energy hogging applications (apps) are a liability to both the end-user and software developer. Measuring software energy consumption is non-trivial, requiring both equipment and expertise, yet researchers have found that software energy consumption can be modelled.  Prior works have hinted that with more energy measurement data we can make more accurate energy models.  This data, however, was expensive to extract because it required energy measurement of running test cases (rare) or time consuming manually written tests.  In this paper, we show that automatic random test generation with resource-utilization heuristics can be used successfully to build accurate software energy consumption models. Code coverage, although well-known as a heuristic for generating and selecting tests in traditional software testing, performs poorly at selecting energy hungry tests.
 We propose an accurate software energy model, GreenScaler, that is built on random tests with CPU-utilization as the test selection heuristic. GreenScaler not only accurately estimates energy consumption for randomly generated tests, but also for meaningful developer written tests.  Also, the produced models are very accurate in detecting energy regressions between versions of the same app.  This is directly helpful for the app developers who want to know if a change in the source code, for example, is harmful for the total energy consumption.  We also show that developers can use GreenScaler to select the most energy efficient API when multiple APIs are available for solving the same problem. Researchers can also use our test generation methodology to further study how to build more accurate software energy models.},
 accepted = {2018-06-20},
 author = {Shaiful Chowdhury and Stephanie Borle and Stephen Romansky and Abram Hindle},
 authors = {Shaiful Chowdhury, Stephanie Borle, Stephen Romansky, Abram Hindle},
 code = {chowdhury2018EMSE-GreenScaler},
 day = {20},
 funding = {NSERC Discovery},
 journal = {Empirical Software Engineering},
 journalid = {EMSE-D-17-00171R2},
 month = {June},
 pagerange = {1--52},
 pages = {1--52},
 published = {2018-06-20},
 role = { Researcher / Co-author},
 title = {GreenScaler: Training Software Energy Models With Automatic Test Generation},
 type = {article},
 url = {http://softwareprocess.ca/pubs/chowdhury2018EMSE-GreenScaler.pdf},
 venue = {Empirical Software Engineering},
 year = {2018}
}

An App Performance Optimization Advisor for Mobile Device App Marketplaces

Rubén Saborido, Foutse Khomh, Abram Hindle, Enrique Alba
Sustainable Computing,
2018 1--18
PDF

On mobile phones, users and developers use official app marketplaces that serve as repositories of apps. The Google Play Store and Apple Store are the official marketplaces of Android and Apple products, which offer more than a million apps. Although both repositories offer descriptions of apps, information concerning performance is not available. Due to the constrained hardware of mobile devices, users and developers have to meticulously manage the resources available and they should be given access to performance information about apps. Even if this information were available, the selection of apps would still depend on user preferences and it would require a huge cognitive effort to make optimal decisions. Considering this fact we propose APOA, a recommendation system which can be implemented in any marketplace for helping users and developers to compare apps in terms of performance. APOA takes as input the metric values of apps and a set of metrics to optimize. It solves an optimization problem and it generates optimal sets of apps for different user contexts. We show how APOA works over an Android case study. Out of 140 apps, we define typical usage scenarios and we collect measurements of power, CPU, memory, and network usage to demonstrate the benefit of using APOA.

@article{saborido2018SUSCOM-app-optimization,
 abstract = {On mobile phones, users and developers use official app marketplaces that serve as repositories of apps. The Google Play Store and Apple Store are the official marketplaces of Android and Apple products, which offer more than a million apps. Although both repositories offer descriptions of apps, information concerning performance is not available. Due to the constrained hardware of mobile devices, users and developers have to meticulously manage the resources available and they should be given access to performance information about apps. Even if this information were available, the selection of apps would still depend on user preferences and it would require a huge cognitive effort to make optimal decisions. Considering this fact we propose APOA, a recommendation system which can be implemented in any marketplace for helping users and developers to compare apps in terms of performance. APOA takes as input the metric values of apps and a set of metrics to optimize. It solves an optimization problem and it generates optimal sets of apps for different user contexts. We show how APOA works over an Android case study. Out of 140 apps, we define typical usage scenarios and we collect measurements of power, CPU, memory, and network usage to demonstrate the benefit of using APOA.},
 accepted = {2018-05-17},
 author = {Rubén Saborido and Foutse Khomh and Abram Hindle and Enrique Alba},
 authors = {Rubén Saborido, Foutse Khomh, Abram Hindle, Enrique Alba},
 code = {saborido2018SUSCOM-app-optimization},
 day = {17},
 funding = {NSERC Discovery},
 journal = {Sustainable Computing},
 journalid = {SUSCOM_2017_322_R1},
 month = {May},
 pagerange = {1--18},
 pages = {1--18},
 published = {2018-05-17},
 role = { Researcher / Co-author},
 title = {An App Performance Optimization Advisor for Mobile Device App Marketplaces},
 type = {article},
 url = {http://softwareprocess.ca/pubs/saborido2018SUSCOM-app-optimization.pdf},
 venue = {Sustainable Computing},
 year = {2018}
}

Training Deep Convolutional Networks with Unlimited Synthesis of Musical Examples for Multiple Instrument Recognition

Rameel Sethi, Noah Weninger, Abram Hindle, Vadim Bulitko, Michael Frishkopf
15th Sound and Music Computing Conference (SMC 2018), Limassol, Cyprus
2018 1--10
PDF

Deep learning has yielded promising results in music information retrieval and other domains compared to machine learning algorithms trained on hand-crafted feature representations, but is often limited by the availability of data and vast hyper-parameter space. It is difficult to obtain large amounts of annotated recordings due to prohibitive labelling costs and copyright restrictions. This is especially true when the MIR task is low-level in nature such as instrument recognition and applied to wide ranges of world instruments, causing most MIR techniques to focus on recovering easily verifiable metadata such as genre. We tackle this data availability problem using two techniques: generation of synthetic recordings using MIDI files and synthesizers, and by adding noise and filters to the generated samples for data augmentation purposes. We investigate the application of deep synthetically trained models to two related low-level MIR tasks of frame-level polyphony detection and instrument classification in polyphonic recordings, and empirically show that deep models trained on synthetic recordings augmented with noise can outperform a majority class baseline on a dataset of polyphonic recordings labeled with predominant instruments.

@inproceedings{sethi2018SMC-synthesis,
 abstract = {Deep learning has yielded promising results in music information retrieval and other domains compared to machine learning algorithms trained on hand-crafted feature representations, but is often limited by the availability of data and vast hyper-parameter space. It is difficult to obtain large amounts of annotated recordings due to prohibitive labelling costs and copyright restrictions. This is especially true when the MIR task is low-level in nature such as instrument recognition and applied to wide ranges of world instruments, causing most MIR techniques to focus on recovering easily verifiable metadata such as genre. We tackle this data availability problem using two techniques: generation of synthetic recordings using MIDI files and synthesizers, and by adding noise and filters to the generated samples for data augmentation purposes. We investigate the application of deep synthetically trained models to two related low-level MIR tasks of frame-level polyphony detection and instrument classification in polyphonic recordings, and empirically show that deep models trained on synthetic recordings augmented with noise can outperform a majority class baseline on a dataset of polyphonic recordings labeled with predominant instruments.},
 accepted = {2018-05-10},
 author = {Rameel Sethi and Noah Weninger and Abram Hindle and Vadim Bulitko and Michael Frishkopf},
 authors = {Rameel Sethi, Noah Weninger, Abram Hindle, Vadim Bulitko, Michael Frishkopf},
 booktitle = {15th Sound and Music Computing Conference (SMC 2018)},
 code = {sethi2018SMC-synthesis},
 date = {2018-05-10},
 funding = {KIAS, NSERC Discovery},
 location = {Limassol, Cyprus},
 pagerange = {1--10},
 pages = {1--10},
 role = { Author},
 title = {Training Deep Convolutional Networks with Unlimited Synthesis of Musical Examples for Multiple Instrument Recognition},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sethi2018SMC-synthesis.pdf},
 venue = {15th Sound and Music Computing Conference (SMC 2018)},
 year = {2018}
}

What can Android mobile app developers do about the energy consumption of machine learning?

Andrea McIntosh, Safwat Hassan, Abram Hindle
Empirical Software Engineering,
2018 1--42
PDF

Machine learning is a popular method of learning functions from data to represent and to classify sensor inputs, multimedia, emails, and calendar events. Smartphone applications have been integrating more and more intelligence in the form of machine learning. Machine learning functionality now appears on most smartphones as voice recognition, spell checking, word disambiguation, face recognition, translation, spatial reasoning, and even natural language summarization. Excited app developers who want to use machine learning on mobile devices face one serious constraint that they did not face on desktop computers or cloud virtual machines: the end-user's mobile device has limited battery life, thus computationally intensive tasks can harm end users' phone availability by draining batteries of their stored energy. Currently, there are few guidelines for developers who want to employ machine learning on mobile devices yet are concerned about software energy consumption of their applications. In this paper, we combine empirical measurements of different machine learning algorithm implementations with complexity theory to provide concrete and theoretically grounded recommendations to developers who want to employ machine learning on smartphones. We conclude that some implementations of algorithms, such as J48, MLP, and SMO, do generally perform better than others in terms of energy consumption and accuracy, and that energy consumption is well-correlated to algorithmic complexity. However, to achieve optimal results a developer must consider their specific application as many factors --- dataset size, number of data attributes, whether the model will require updating, etc. --- affect which machine learning algorithm and implementation will provide the best results.

@article{mcintosh2018EMSE-MLEnergy,
 abstract = {Machine learning is a popular method of learning functions from data to represent and to classify sensor inputs, multimedia, emails, and calendar events. Smartphone applications have been integrating more and more intelligence in the form of machine learning. Machine learning functionality now appears on most smartphones as voice recognition, spell checking, word disambiguation, face recognition, translation, spatial reasoning, and even natural language summarization. Excited app developers who want to use machine learning on mobile devices face one serious constraint that they did not face on desktop computers or cloud virtual machines: the end-user's mobile device has limited battery life, thus computationally intensive tasks can harm end users' phone availability by draining batteries of their stored energy.  Currently, there are few guidelines for developers who want to employ machine learning on mobile devices yet are concerned about software energy consumption of their applications. In this paper, we combine empirical measurements of different machine learning algorithm implementations with complexity theory to provide concrete and theoretically grounded recommendations to developers who want to employ machine learning on smartphones.  We conclude that some implementations of algorithms, such as J48, MLP, and SMO, do generally perform better than others in terms of energy consumption and accuracy, and that energy consumption is well-correlated to algorithmic complexity.  However, to achieve optimal results a developer must consider their specific application as many factors --- dataset size, number of data attributes, whether the model will require updating, etc. --- affect which machine learning algorithm and implementation will provide the best results.},
 accepted = {2018-05-10},
 author = {Andrea McIntosh and Safwat Hassan and Abram Hindle},
 authors = {Andrea McIntosh, Safwat Hassan, Abram Hindle},
 code = {mcintosh2018EMSE-MLEnergy},
 day = {10},
 funding = {NSERC Discovery},
 journal = {Empirical Software Engineering},
 journalid = {EMSE-D-17-00197R2},
 month = {May},
 pagerange = {1--42},
 pages = {1--42},
 published = {2018-05-10},
 role = { Researcher / Co-author},
 title = {What can Android mobile app developers do about the energy consumption of machine learning?},
 type = {article},
 url = {http://softwareprocess.ca/pubs/mcintosh2018EMSE-MLEnergy.pdf},
 venue = {Empirical Software Engineering},
 year = {2018}
}

Syntax and Sensibility: Using language models to detect and correct syntax errors

Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, José Nelson Amaral
25th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018), Campobasso, Italy
2018 1--11
PDF

@inproceedings{eddie2018SANER2018sasulmtdacse,
 accepted = {2017-12-18},
 author = {Eddie Antonio Santos and Joshua Charles Campbell and Dhvani Patel and Abram Hindle and José Nelson Amaral},
 authors = {Eddie Antonio Santos, Joshua Charles Campbell, Dhvani Patel, Abram Hindle, José Nelson Amaral},
 blog = {http://www.eddieantonio.ca/blog/2018/01/15/sensibility/},
 booktitle = {25th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018)},
 date = {2018-04-21},
 funding = {NSERC Discovery, MITACS Accelerate},
 location = {Campobasso, Italy},
 pagerange = {1--11},
 pages = {1--11},
 role = { Author},
 title = {Syntax and Sensibility: Using language models to detect and correct syntax errors},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/santos2018SANER-syntax.pdf},
 venue = {25th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2018)},
 year = {2018}
}

Analyzing the effects of test driven development in GitHub

Neil C. Borle, Meysam Feghhi, Eleni Stroulia, Russell Greiner, Abram Hindle
Empirical Software Engineering,
2017 1--28
PDF
DOI:10.1007/s10664-017-9576-3

Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To that end, we have conducted a comparative analysis of GitHub repositories that adopt TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics, and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.

@article{borle2017EMSE-TDD,
 abstract = {Testing is an integral part of the software development lifecycle, approached with varying degrees of rigor by different process models. Agile process models recommend Test Driven Development (TDD) as a key practice for reducing costs and improving code quality. The objective of this work is to perform a cost-benefit analysis of this practice. To that end, we have conducted a comparative analysis of GitHub repositories that adopt TDD to a lesser or greater extent, in order to determine how TDD affects software development productivity and software quality. We classified GitHub repositories archived in 2015 in terms of how rigorously they practiced TDD, thus creating a TDD spectrum. We then matched and compared various subsets of these repositories on this TDD spectrum with control sets of equal size. The control sets were samples from all GitHub repositories that matched certain characteristics, and that contained at least one test file. We compared how the TDD sets differed from the control sets on the following characteristics: number of test files, average commit velocity, number of bug-referencing commits, number of issues recorded, usage of continuous integration, number of pull requests, and distribution of commits per author. We found that Java TDD projects were relatively rare. In addition, there were very few significant differences in any of the metrics we used to compare TDD-like and non-TDD projects; therefore, our results do not reveal any observable benefits from using TDD.},
 accepted = {2017-11-01},
 author = {Neil C. Borle and Meysam Feghhi and Eleni Stroulia and Russell Greiner and Abram Hindle},
 authors = {Neil C. Borle, Meysam Feghhi, Eleni Stroulia, Russell Greiner, Abram Hindle},
 code = {borle2017EMSE-TDD},
 day = {25},
 doi = {10.1007/s10664-017-9576-3},
 funding = {NSERC Discovery},
 issn = {1573-7616},
 journal = {Empirical Software Engineering},
 journalid = {EMSE-D-17-00057R2},
 month = {Nov},
 pagerange = {1--28},
 pages = {1--28},
 published = {2017-11-25},
 role = { Instructor / Co-author},
 title = {Analyzing the effects of test driven development in GitHub},
 type = {article},
 url = {http://softwareprocess.ca/pubs/borle2017EMSE-TDD.pdf},
 venue = {Empirical Software Engineering},
 year = {2017}
}

Deep Green: An Ensemble of Machine Learning Methods Predicting Mobile Energy Consumption

Stephen Romansky, Shaiful Alam Chowdhury, Abram Hindle, Neil Borle, and Russell Greiner
International Conference on Software Maintenance and Evolution, Shanghai, China
2017 1--11
Acceptance:42/151 or 27.8%
PDF

@inproceedings{romansky2017ICSME-timeseries,
 accepted = {2017-06-12},
 author = {Stephen Romansky and Shaiful Alam Chowdhury and Abram Hindle and Neil Borle and Russell Greiner},
 authors = {Stephen Romansky, Shaiful Alam Chowdhury, Abram Hindle, Neil Borle, and Russell Greiner},
 booktitle = {International Conference on Software Maintenance and Evolution},
 code = {romansky2017ICSME-timeseries},
 date = {2017-09-20},
 funding = {NSERC Discovery},
 location = {Shanghai, China},
 pages = {1--11},
 rate = {42/151 or 27.8%},
 title = {Deep Green: An Ensemble of Machine Learning Methods Predicting Mobile Energy Consumption},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/romansky2017ICSME-timeseries.pdf},
 venue = {International Conference on Software Maintenance and Evolution},
 year = {2017}
}

Performance with an Electronically Excited Didgeridoo

Abram Hindle and Daryl Posnett
New Interfaces for Musical Expression (NIME 2017), Copenhagen, Denmark
2017 1--5
PDF

@inproceedings{abram2017NIME2017pwaeed,
 accepted = {2017-04-04},
 author = {Abram Hindle and Daryl Posnett},
 authors = {Abram Hindle and Daryl Posnett},
 booktitle = {New Interfaces for Musical Expression (NIME 2017)},
 date = {2017-05-18},
 funding = {NSERC Discovery},
 location = {Copenhagen, Denmark},
 pagerange = {1--5},
 pages = {1--5},
 role = { Author},
 title = {Performance with an Electronically Excited Didgeridoo},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2017NIME-Dijj.pdf},
 venue = {New Interfaces for Musical Expression (NIME 2017)},
 year = {2017}
}

Isolated guitar transcription using a deep belief network

Gregory Burlet and Abram Hindle
PeerJ Computer Science,
2017 1--30
PDF

@article{burlet2017PeerJ,
 accepted = {2017-03-01},
 author = {Gregory Burlet and Abram Hindle},
 authors = {Gregory Burlet and Abram Hindle},
 code = {burlet2017PeerJ},
 funding = {NSERC Discovery},
 issue = {e109},
 journal = {PeerJ Computer Science},
 pagerange = {1--30},
 pages = {1--30},
 published = {2017-03-27},
 role = { Researcher / Co-author},
 title = {Isolated guitar transcription using a deep belief network},
 type = {article},
 url = {http://softwareprocess.ca/pubs/burlet2017PeerJ.pdf},
 venue = {PeerJ Computer Science},
 year = {2017}
}

Detecting duplicate bug reports with software engineering domain knowledge

Karan Aggarwal, Finbarr Timbers, Tanner Rutgers, Abram Hindle, Eleni Stroulia, and Russell Greiner
Journal of Software: Evolution and Process,
2017 1--15
PDF
DOI:10.1002/smr.1821

@article{aggarwal2017JSEP,
 accepted = {2016-08-01},
 author = {Karan Aggarwal and Finbarr Timbers and Tanner Rutgers and Abram Hindle and Eleni Stroulia and Russell Greiner},
 authors = {Karan Aggarwal, Finbarr Timbers, Tanner Rutgers, Abram Hindle, Eleni Stroulia, and Russell Greiner},
 code = {aggarwal2017JSEP},
 doi = {10.1002/smr.1821},
 doiurl = {http://dx.doi.org/10.1002/smr.1821},
 funding = {NSERC Discovery},
 issn = {2047-7481},
 issue = {3},
 journal = {Journal of Software: Evolution and Process},
 keywords = {deduplication, documentation, duplicate bug reports, information retrieval, machine learning, software engineering textbooks, software literature},
 note = {e1821 smr.1821},
 pagerange = {1--15},
 pages = {1--15},
 published = {2016-10-27},
 role = { Researcher / Co-author},
 title = {Detecting duplicate bug reports with software engineering domain knowledge},
 type = {article},
 url = {http://softwareprocess.ca/pubs/aggarwal2017JSEP.pdf},
 venue = {Journal of Software: Evolution and Process},
 volume = {29},
 year = {2017}
}

Expert Commentary: The potential synthesizer in your pocket

Abram Hindle
A NIME Reader: Fifteen Years of New Interfaces for Musical Expression,
2017 116
PDF

@InBook{abram2017ANIMERFYNIMEectpsiyp,
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {A NIME Reader: Fifteen Years of New Interfaces for Musical Expression},
 editors = {Alexander Refsum Jensenius and Michael J. Lyons},
 funding = {NSERC Discovery},
 pagerange = {116},
 pages = {116},
 publisher = {Springer},
 role = {Author},
 srcurl = {https://github.com/alexarje/A-NIME-Reader/blob/master/latex/mainmatter/Roberts_2013/roberts_2013.tex},
 title = {Expert Commentary: The potential synthesizer in your pocket},
 type = {InBook},
 url = {https://github.com/alexarje/A-NIME-Reader/},
 venue = {A NIME Reader: Fifteen Years of New Interfaces for Musical Expression},
 year = {2017}
}

Continuous Maintenance

Candy Pang and Abram Hindle
International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2016), Raleigh, United States
2016 1--5
Acceptance:14/41 or 34%
PDF

@inproceedings{candy2016ICSMEERA2016cm,
 accepted = {2017-07-29},
 author = {Candy Pang and Abram Hindle},
 authors = {Candy Pang and Abram Hindle},
 booktitle = {International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2016)},
 date = {2016-10-02},
 funding = {NSERC Discovery},
 location = {Raleigh, United States},
 pages = {1--5},
 rate = {14/41 or 34%},
 region = {North Carolina},
 title = {Continuous Maintenance},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/pang2016ICSMEERA.pdf},
 venue = {International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2016)},
 year = {2016}
}

Visualizing Project Evolution Through Abstract Syntax Tree Analysis

Michael D. Feist and Eddie Antonio Santos and Ian Watts and Abram Hindle
Software Visualization (VISSOFT), 2016 IEEE 4th Working Conference on, Raleigh, United States
2016 1--11
Acceptance:21/48 or 43%
PDF

@inproceedings{michael2016VISSOFTvpetasta,
 accepted = {2016-06-10},
 author = {Michael D. Feist and Eddie Antonio Santos and Ian Watts and Abram Hindle},
 authors = {Michael D. Feist and Eddie Antonio Santos and Ian Watts and Abram Hindle},
 booktitle = {Software Visualization (VISSOFT), 2016 IEEE 4th Working Conference on},
 date = {2016-10-01},
 funding = {NSERC Discovery},
 location = {Raleigh, United States},
 pages = {1--11},
 rate = {21/48 or 43%},
 region = {North Carolina},
 title = {Visualizing Project Evolution Through Abstract Syntax Tree Analysis},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/feist2016VISSOFT-syntax-tree.pdf},
 venue = {Software Visualization (VISSOFT), 2016 IEEE 4th Working Conference on},
 year = {2016}
}

Hacking NIMES

Abram Hindle
New Interfaces for Musical Expression (NIME 2016), Brisbane, Australia
2016 1--6
PDF

@inproceedings{abram2016NIME2016hn,
 accepted = {2016-03-28},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {New Interfaces for Musical Expression (NIME 2016)},
 date = {2016-07-12},
 funding = {NSERC Discovery},
 location = {Brisbane, Australia},
 pagerange = {1--6},
 pages = {1--6},
 role = { Author},
 title = {Hacking NIMES},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2016NIME-hacking-nimes.pdf},
 venue = {New Interfaces for Musical Expression (NIME 2016)},
 year = {2016}
}

Energy Profiles of Java Collections Classes

Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, Abram Hindle
International Conference on Software Engineering (ICSE 2016), Austin, United States
2016 225--236
Acceptance:101/530 or 19%
PDF
Publisher Link
DOI:10.1145/2884781.2884869

ACM SIGSOFT Distinguished Paper Award

@inproceedings{samir2016ICSE2016epojcc,
 accepted = {2015-12-15},
 author = {Samir Hasan and Zachary King and Munawar Hafiz and Mohammed Sayagh and Bram Adams and Abram Hindle},
 authors = {Samir Hasan, Zachary King, Munawar Hafiz, Mohammed Sayagh, Bram Adams, Abram Hindle},
 booktitle = {International Conference on Software Engineering (ICSE 2016)},
 date = {2016-05-14},
 doi = {10.1145/2884781.2884869},
 funding = {NSERC Discovery},
 location = {Austin, United States},
 notes = {ACM SIGSOFT Distinguished Paper Award},
 pagerange = {225--236},
 pages = {225--236},
 payurl = {https://dl.acm.org/citation.cfm?id=2884869},
 rate = {101/530 or 19%},
 region = {Texas},
 role = {Co-author / infrastructure},
 title = {Energy Profiles of Java Collections Classes},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hasan2016ICSE-Energy-Profiles-of-Java-Collections-Classes.pdf},
 venue = {International Conference on Software Engineering (ICSE 2016)},
 year = {2016}
}

GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora

Shaiful Chowdhury and Abram Hindle
International Working Conference on Mining Software Repositories (MSR 2016), Austin, United States
2016 49--60
Acceptance:36/103 or 35%
PDF
Publisher Link
DOI:10.1145/2901739.2901763

@inproceedings{shaiful2016MSR2016gesecwemc,
 accepted = {2016-02-29},
 author = {Shaiful Chowdhury and Abram Hindle},
 authors = {Shaiful Chowdhury and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Repositories (MSR 2016)},
 date = {2016-05-14},
 doi = {10.1145/2901739.2901763},
 funding = {NSERC Discovery},
 location = {Austin, United States},
 pagerange = {49--60},
 pages = {49--60},
 payurl = {https://dl.acm.org/citation.cfm?id=2901763},
 rate = {36/103 or 35%},
 region = {Texas},
 role = { Co-author / supervisor},
 title = {GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhury2016MSR-Greenoracle.pdf},
 venue = {International Working Conference on Mining Software Repositories (MSR 2016)},
 year = {2016}
}

The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication

Joshua Charles Campbell, Eddie Antonio Santos and Abram Hindle
International Working Conference on Mining Software Repositories (MSR 2016), Austin, United States
2016 269--280
Acceptance:36/103 or 35%
PDF
Publisher Link
DOI:10.1145/2901739.2901766

@inproceedings{joshua2016MSR2016tueotiricrd,
 author = {Joshua Charles Campbell and Eddie Antonio Santos and Abram Hindle},
 authors = {Joshua Charles Campbell, Eddie Antonio Santos and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Repositories (MSR 2016)},
 date = {2016-05-14},
 doi = {10.1145/2901739.2901766},
 funding = {NSERC Discovery and MITACS Accelerate},
 location = {Austin, United States},
 pagerange = {269--280},
 pages = {269--280},
 payurl = {https://dl.acm.org/citation.cfm?id=2901766},
 rate = {36/103 or 35%},
 region = {Texas},
 role = {Co-author / supervisor},
 title = {The Unreasonable Effectiveness of Traditional Information Retrieval in Crash Report Deduplication},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/campbell2016MSR-partycrasher.pdf},
 venue = {International Working Conference on Mining Software Repositories (MSR 2016)},
 year = {2016}
}

Characterizing Energy-Aware Software Projects: Are They Different?

Shaiful Chowdhury and Abram Hindle
International Working Conference on Mining Software Repositories (MSR 2016), Austin, United States
2016 508--511
Acceptance:10/24 or 42%
PDF
DOI:10.1145/2901739.2903494

@inproceedings{shaiful2016MSR2016cespatd,
 accepted = {2016-03-28},
 author = {Shaiful Chowdhury and Abram Hindle},
 authors = {Shaiful Chowdhury and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Repositories (MSR 2016)},
 date = {2016-05-14},
 doi = {10.1145/2901739.2903494},
 funding = {NSERC Discovery},
 location = {Austin, United States},
 pagerange = {508--511},
 pages = {508--511},
 rate = {10/24 or 42%},
 region = {Texas},
 role = {Co-author / supervisor},
 title = {Characterizing Energy-Aware Software Projects: Are They Different?},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhuryMSR2016-eProjects.pdf},
 venue = {International Working Conference on Mining Software Repositories (MSR 2016)},
 year = {2016}
}

Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI

Eddie Antonio Santos and Abram Hindle
International Working Conference on Mining Software Repositories Challenge Track (MSR 2016), Austin, United States
2016 504--507
Acceptance:10/24 or 42%
PDF
Publisher Link

Mining Challenge Award

@inproceedings{eddie2016MSR2016jacbicccmewbsot,
 accepted = {2016-03-28},
 author = {Eddie Antonio Santos and Abram Hindle},
 authors = {Eddie Antonio Santos and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2016)},
 date = {2016-05-14},
 funding = {NSERC Discovery},
 location = {Austin, United States},
 notes = {Mining Challenge Award},
 pagerange = {504--507},
 pages = {504--507},
 payurl = {https://dl.acm.org/citation.cfm?id=2903493},
 publisher = {IEEE},
 rate = {10/24 or 42%},
 region = {Texas},
 role = {Class Project / Co-author},
 title = {Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/santos2016MSR-judging-a-commit-by-its-cover.pdf},
 venue = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2016)},
 year = {2016}
}

Hadoop energy consumption reduction with hybrid HDFS

Ivanilton Polato, Denilson Barbosa, Abram Hindle, Fabio Kon
Proceedings of the 31st Annual ACM Symposium on Applied Computing, April 4-8, 2016, Pisa, Italy
2016 406--411
PDF
Publisher Link
DOI:10.1145/2851613.2851623

@inproceedings{ivanilton2016P31AACMSACA482016hecrwhh,
 author = {Ivanilton Polato and Denilson Barbosa and Abram Hindle and Fabio Kon},
 authors = {Ivanilton Polato, Denilson Barbosa, Abram Hindle, Fabio Kon},
 booktitle = {Proceedings of the 31st Annual {ACM} Symposium on Applied Computing, April 4-8, 2016},
 date = {2016-04-04},
 doi = {10.1145/2851613.2851623},
 funding = {NSERC Discovery},
 location = {Pisa, Italy},
 pagerange = {406--411},
 pages = {406--411},
 payurl = {https://dl.acm.org/citation.cfm?id=2851623},
 title = {Hadoop energy consumption reduction with hybrid HDFS},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/polato2016SAC-hadoop.pdf},
 venue = {Proceedings of the 31st Annual {ACM} Symposium on Applied Computing, April 4-8, 2016},
 year = {2016}
}

Crowdsourced Bug Triaging: Leveraging Q&A Platforms for Bug Assignment

Ali Sajedi Badashian, Abram Hindle, Eleni Stroulia
International Conference on Fundamental Approaches to Software Engineering (FASE 2016), Eindhoven, The Netherlands
2016 231--248
Acceptance:27%
PDF
Publisher Link
DOI:10.1007/978-3-662-49665-7_14

@inproceedings{ali2016FASE2016cbtlqpfba,
 accepted = {2015-12-18},
 author = {Ali Sajedi Badashian and Abram Hindle and Eleni Stroulia},
 authors = {Ali Sajedi Badashian, Abram Hindle, Eleni Stroulia},
 booktitle = {International Conference on Fundamental Approaches to Software Engineering (FASE 2016)},
 date = {2016-04-02},
 doi = {10.1007/978-3-662-49665-7_14},
 funding = {NSERC Discovery},
 location = {Eindhoven, The Netherlands},
 pagerange = {231--248},
 pages = {231--248},
 payurl = {http://link.springer.com/chapter/10.1007%2F978-3-662-49665-7_14},
 rate = {27%},
 role = {Co-author},
 title = {Crowdsourced Bug Triaging: Leveraging Q\&A Platforms for Bug Assignment},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sajedi2016FASE-crowdsourced-bug-triage.pdf},
 venue = {International Conference on Fundamental Approaches to Software Engineering (FASE 2016)},
 year = {2016}
}

Green Software Engineering: The Curse of Methodology

Abram Hindle
23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016) FOSE Track: Leaders of Tomorrow: Future Of Software Engineering, Osaka, Japan
2016 529--540
PDF
Publisher Link
DOI:10.1109/SANER.2016.60

Invited but peer-reviewed

@inproceedings{abram2016SANER2016gsetcom,
 accepted = {2015-12-22},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016) FOSE Track: Leaders of Tomorrow: Future Of Software Engineering},
 date = {2016-03-14},
 doi = {10.1109/SANER.2016.60},
 funding = {NSERC Discovery},
 location = {Osaka, Japan},
 notes = {Invited but peer-reviewed},
 pagerange = {529--540},
 pages = {529--540},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7476772},
 role = {Author},
 title = {Green Software Engineering: The Curse of Methodology},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2016SANERFOSE-green-software-engineering.pdf},
 venue = {23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016) FOSE Track: Leaders of Tomorrow: Future Of Software Engineering},
 year = {2016}
}

Client-Side Energy Efficiency of HTTP/2 for Web and Mobile App Developers

Shaiful Chowdhury, Varun Sapra and Abram Hindle
23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016), Osaka, Japan
2016 529--540
Acceptance:52/140 or 37%
PDF
Publisher Link
DOI:10.1109/SANER.2016.77

@inproceedings{shaiful2016SANER2016ceeohfwamad,
 accepted = {2015-12-17},
 author = {Shaiful Chowdhury and Varun Sapra and Abram Hindle},
 authors = {Shaiful Chowdhury, Varun Sapra and Abram Hindle},
 booktitle = {23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016)},
 date = {2016-03-14},
 doi = {10.1109/SANER.2016.77},
 funding = {NSERC Discovery},
 location = {Osaka, Japan},
 pagerange = {529--540},
 pages = {529--540},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7476672},
 rate = {52/140 or 37%},
 role = {Co-author / supervisor},
 title = {Client-Side Energy Efficiency of HTTP/2 for Web and Mobile App Developers},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhury2016SANER-http2.pdf},
 venue = {23rd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2016)},
 year = {2016}
}

A contextual approach towards more accurate duplicate bug report detection and ranking

Abram Hindle, Anahita Alipour, Eleni Stroulia
Empirical Software Engineering,
2016 368--410
PDF

@article{hindle2016EMSE,
 accepted = {2015-06-28},
 author = {Abram Hindle and Anahita Alipour and Eleni Stroulia},
 authors = {Abram Hindle, Anahita Alipour, Eleni Stroulia},
 code = {hindle2016EMSE},
 funding = {NSERC Discovery},
 journal = {Empirical Software Engineering},
 pagerange = {368--410},
 pages = {368--410},
 publisher = {Springer},
 role = {Primary Supervisor},
 title = {A contextual approach towards more accurate duplicate bug report detection and ranking},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2016EMSE-bugdedup.pdf},
 venue = {Empirical Software Engineering},
 volume = {21(2)},
 year = {2016}
}

On the Naturalness of Software

Abram Hindle, Earl T. Barr, Zhendong Su, Premkumar T. Devanbu, and Mark Gabel
Communications of the ACM: Invited Research Highlights (CACM),
2016 122--131
PDF

Invited re-print, not peer reviewed

@article{hindle2016CACM,
 accepted = {2015-05-18},
 author = {Abram Hindle and Earl T. Barr and Zhendong Su and Premkumar T. Devanbu and Mark Gabel},
 authors = {Abram Hindle, Earl T. Barr, Zhendong Su, Premkumar T. Devanbu, and Mark Gabel},
 code = {hindle2016CACM},
 funding = {NSF 0964703 and NSF 0613949},
 issue = {59(5)},
 journal = {Communications of the ACM: Invited Research Highlights (CACM)},
 notes = {Invited re-print, not peer reviewed},
 pagerange = {122--131},
 pages = {122--131},
 role = { Researcher / Co-author},
 title = {On the Naturalness of Software},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2016CACM.pdf},
 venue = {Communications of the ACM: Invited Research Highlights (CACM)},
 year = {2016}
}

Leaders of Tomorrow on the Future of Software Engineering: A Roundtable

Felienne Hermans, Janet Siegmund, Thomas Fritz, Gabriele Bavota, Meiyappan Nagappan, Abram Hindle, Yasutaka Kamei, Ali Mesbah, Bram Adams
IEEE Software,
2016 99--104
PDF
Publisher Link
DOI:10.1109/MS.2016.55

Invited, not peer reviewed.

@article{felienne2016IEEESlototfosear,
 author = {Felienne Hermans and Janet Siegmund and Thomas Fritz and Gabriele Bavota and Meiyappan Nagappan and Abram Hindle and Yasutaka Kamei and Ali Mesbah and Bram Adams},
 authors = {Felienne Hermans, Janet Siegmund, Thomas Fritz, Gabriele Bavota, Meiyappan Nagappan, Abram Hindle, Yasutaka Kamei, Ali Mesbah, Bram Adams},
 doi = {10.1109/MS.2016.55},
 funding = {NSERC Discovery},
 journal = {IEEE Software},
 notes = {Invited, not peer reviewed.},
 pagerange = {99--104},
 pages = {99--104},
 payurl = {http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7420475},
 role = {Invited Opinion},
 title = {Leaders of Tomorrow on the Future of Software Engineering: A Roundtable},
 type = {article},
 url = {https://www.computer.org/csdl/mags/so/2016/02/mso2016020099.pdf},
 venue = {IEEE Software},
 volume = {33(2)},
 year = {2016}
}

The Perils of Energy Mining: Measure a Bunch, Compare just Once

Abram Hindle
Perspectives on Data Science for Software Engineering Software Data,
2016 97--101
PDF

@InBook{abram2016PDSSESDtpoemmabcjo,
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {Perspectives on Data Science for Software Engineering Software Data},
 editors = {Tim Menzies, Laurie Williams, Thomas Zimmermann},
 funding = {NSERC Discovery},
 pagerange = {97--101},
 pages = {97--101},
 publisher = {Morgan Kaufmann},
 role = {Author},
 srcurl = {https://github.com/ds4se/chapters/blob/master/abramhindle/energymining.md},
 title = {The Perils of Energy Mining: Measure a Bunch, Compare just Once},
 type = {InBook},
 url = {http://softwareprocess.ca/pubs/hindle2016D4SE-energymining.pdf},
 venue = {Perspectives on Data Science for Software Engineering Software Data},
 year = {2016}
}

Detecting duplicate bug reports with software engineering domain knowledge

Karan Aggarwal, Tanner Rutgers, Finbarr Timbers, Abram Hindle, Russ Greiner, Eleni Stroulia
22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015), Montreal, Canada
2015 211--220
Acceptance:46/144 or 32%
PDF
Publisher Link

@inproceedings{karan2015SANER2015ddbrwsedk,
 author = {Karan Aggarwal and Tanner Rutgers and Finbarr Timbers and Abram Hindle and Russ Greiner and Eleni Stroulia},
 authors = {Karan Aggarwal, Tanner Rutgers, Finbarr Timbers, Abram Hindle, Russ Greiner, Eleni Stroulia},
 booktitle = {22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015)},
 date = {2015-03-02},
 funding = {NSERC Discovery},
 location = {Montreal, Canada},
 pagerange = {211--220},
 pages = {211--220},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7081831},
 rate = {46/144 or 32%},
 region = {Quebec},
 role = {Co-author / supervisor},
 title = {Detecting duplicate bug reports with software engineering domain knowledge},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/aggarwal2015SANER-dedup.pdf},
 venue = {22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015)},
 year = {2015}
}

A system-call based model of software energy consumption without hardware instrumentation

Shaiful Alam Chowdhury, Luke N. Kumar, Md. Toukir Imam, Mohomed Shazan Mohomed Jabbar, Varun Sapra, Karan Aggarwal, Abram Hindle, Russell Greiner
Sixth International Green and Sustainable Computing Conference (IGSC 2015), Las Vegas, United States
2015 1--6
Acceptance:24/67 or 36%
PDF
Publisher Link
DOI:10.1109/IGCC.2015.7393719

@inproceedings{shaiful2015IGSC2015asbmosecwhi,
 accepted = {2015-08-06},
 author = {Shaiful Alam Chowdhury and Luke N. Kumar and Md. Toukir Imam and Mohomed Shazan Mohomed Jabbar and Varun Sapra and Karan Aggarwal and Abram Hindle and Russell Greiner},
 authors = {Shaiful Alam Chowdhury, Luke N. Kumar, Md. Toukir Imam, Mohomed Shazan Mohomed Jabbar, Varun Sapra, Karan Aggarwal, Abram Hindle, Russell Greiner},
 booktitle = {Sixth International Green and Sustainable Computing Conference (IGSC 2015)},
 date = {2015-12-14},
 doi = {10.1109/IGCC.2015.7393719},
 funding = {NSERC Discovery},
 location = {Las Vegas, United States},
 pagerange = {1--6},
 pages = {1--6},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7393719},
 rate = {24/67 or 36%},
 region = {Nevada},
 role = {Co-author / supervisor},
 title = {A system-call based model of software energy consumption without hardware instrumentation},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhury2015IGSC-systemcall.pdf},
 venue = {Sixth International Green and Sustainable Computing Conference (IGSC 2015)},
 year = {2015}
}

Hadoop branching: Architectural impacts on energy and performance

Ivanilton Polato, Denilson Barbosa, Abram Hindle, Fabio Kon
Sixth International Green and Sustainable Computing Conference WIP track (IGSC 2015), Las Vegas, United States
2015 406--411
Acceptance:33/67 or 59%
PDF
Publisher Link

@inproceedings{ivanilton2015IGSC2015hbaioeap,
 accepted = {2015-08-06},
 author = {Ivanilton Polato and Denilson Barbosa and Abram Hindle and Fabio Kon},
 authors = {Ivanilton Polato, Denilson Barbosa, Abram Hindle, Fabio Kon},
 booktitle = {Sixth International Green and Sustainable Computing Conference WIP track (IGSC 2015)},
 date = {2015-12-14},
 funding = {NSERC Discovery},
 location = {Las Vegas, United States},
 pagerange = {406--411},
 pages = {406--411},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7393709},
 rate = {33/67 or 59%},
 region = {Nevada},
 role = { Co-author},
 title = {Hadoop branching: Architectural impacts on energy and performance},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/polato2015IGSC-greenmapreduce-4pages.pdf},
 venue = {Sixth International Green and Sustainable Computing Conference WIP track (IGSC 2015)},
 year = {2015}
}

Crowdsourced bug triaging

Ali Sajedi Badashian, Abram Hindle, Eleni Stroulia
International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2015), Bremen, Germany
2015 506--510
Acceptance:40/210 or 19%
PDF
Publisher Link

@inproceedings{ali2015ICSMEERA2015cbt,
 accepted = {2015-07-24},
 author = {Ali Sajedi Badashian and Abram Hindle and Eleni Stroulia},
 authors = {Ali Sajedi Badashian, Abram Hindle, Eleni Stroulia},
 booktitle = {International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2015)},
 date = {2015-09-29},
 funding = {NSERC Discovery},
 location = {Bremen, Germany},
 pagerange = {506--510},
 pages = {506--510},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7332503},
 rate = {40/210 or 19%},
 role = { Co-author},
 title = {Crowdsourced bug triaging},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sajedi2015ICSME-ERA-crowd-sourced-bug-triage.pdf},
 venue = {International Conference on Software Maintenance and Evolution ERA-Track (ICSME-ERA 2015)},
 year = {2015}
}

GreenAdvisor: A Tool for Analyzing the Impact of Software Evolution on Energy Consumption

Karan Aggarwal, Abram Hindle and Eleni Stroulia
International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, Germany
2015 311--320
Acceptance:32/148 or 22%
PDF
Publisher Link

@inproceedings{karan2015ICSME2015gatfatioseoec,
 author = {Karan Aggarwal and Abram Hindle and Eleni Stroulia},
 authors = {Karan Aggarwal, Abram Hindle and Eleni Stroulia},
 booktitle = {International Conference on Software Maintenance and Evolution (ICSME 2015)},
 date = {2015-09-29},
 funding = {NSERC Discovery},
 location = {Bremen, Germany},
 pagerange = {311--320},
 pages = {311--320},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7332477},
 rate = {32/148 or 22% },
 role = { Co-author / supervisor.},
 title = {GreenAdvisor: A Tool for Analyzing the Impact of Software Evolution on Energy Consumption},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/aggarwal2015ICSME-greenadvisor.pdf},
 venue = {International Conference on Software Maintenance and Evolution (ICSME 2015)},
 year = {2015}
}

Orchestrating Your Cloud Orchestra

Abram Hindle
New Interfaces for Musical Expression (NIME 2015), Baton Rouge, United States
2015 1--4
Acceptance:12%
PDF

@inproceedings{abram2015NIME2015oyco,
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {New Interfaces for Musical Expression (NIME 2015)},
 date = {2015-05-31},
 funding = {NSERC Discovery},
 location = {Baton Rouge, United States},
 pagerange = {1--4},
 pages = {1--4},
 rate = {12%},
 region = {Louisiana},
 role = {Author},
 title = {Orchestrating Your Cloud Orchestra},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2015NIME-orchestrating.pdf},
 venue = {New Interfaces for Musical Expression (NIME 2015)},
 year = {2015}
}

An Empirical Study of End-user Programmers in the Computer Music Community

Gregory Burlet, Abram Hindle
Working Conference on Mining Software Repositories (MSR 2015), Florence, Italy
2015 292--302
Acceptance:32/106 or 30%
PDF
Publisher Link

@inproceedings{gregory2015MSR2015aesoepitcmc,
 author = {Gregory Burlet and Abram Hindle},
 authors = {Gregory Burlet, Abram Hindle},
 booktitle = {Working Conference on Mining Software Repositories (MSR 2015)},
 date = {2015-05-16},
 funding = {NSERC Discovery},
 location = {Florence, Italy},
 pagerange = {292--302},
 pages = {292--302},
 payurl = {https://dl.acm.org/citation.cfm?id=2820554&dl=ACM&coll=DL&CFID=798146404&CFTOKEN=89376601},
 rate = {32/106 or 30% },
 role = { Co-author / supervisor},
 title = {An Empirical Study of End-user Programmers in the Computer Music Community},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/burlet2015MSR-music-coders.pdf},
 venue = {Working Conference on Mining Software Repositories (MSR 2015)},
 year = {2015}
}

Mining StackOverflow to Filter out Off-topic IRC Discussion

Shaiful Alam Chowdhury and Abram Hindle
International Working Conference on Mining Software Repositories Challenge Track (MSR 2015), Florence, Italy
2015 422--425
Acceptance:14/21 or 66%
PDF
Publisher Link

Mining challenge award

@inproceedings{shaiful2015MSR2015mstfooid,
 author = {Shaiful Alam Chowdhury and Abram Hindle},
 authors = {Shaiful Alam Chowdhury and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2015)},
 date = {2015-05-16},
 funding = {NSERC Discovery},
 location = {Florence, Italy},
 notes = {Mining challenge award},
 pagerange = {422--425},
 pages = {422--425},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7180108},
 publisher = {IEEE},
 rate = {14/21 or 66%},
 role = {Co-author / supervisor},
 title = {Mining StackOverflow to Filter out Off-topic IRC Discussion},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/chowdhury2015MSR-IRC.pdf},
 venue = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2015)},
 year = {2015}
}

What do programmers know about the energy consumption of software?

Candy Pang, Abram Hindle, Bram Adams, Ahmed E. Hassan
IEEE Software,
2015 83--89
PDF
Publisher Link

@article{candy2015IEEESwdpkatecos,
 author = {Candy Pang and Abram Hindle and Bram Adams and Ahmed E. Hassan},
 authors = {Candy Pang, Abram Hindle, Bram Adams, Ahmed E. Hassan},
 funding = {NSERC Discovery},
 journal = {IEEE Software},
 pagerange = {83--89},
 pages = {83--89},
 payurl = {http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=7155416},
 role = {Co-author / supervisor},
 title = {What do programmers know about the energy consumption of software?},
 type = {article},
 url = {http://softwareprocess.ca/pubs/pang2015IEEESoftware.pdf},
 venue = {IEEE Software},
 year = {2015}
}

Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data

Joshua Charles Campbell, Abram Hindle, Eleni Stroulia
The Art and Science of Analyzing Software Data,
2015 139--160
PDF

@InBook{joshua2015TASASDldaetfsed,
 author = {Joshua Charles Campbell and Abram Hindle and Eleni Stroulia},
 authors = {Joshua Charles Campbell, Abram Hindle, Eleni Stroulia},
 booktitle = {The Art and Science of Analyzing Software Data},
 editors = {Christian Bird, Tim Menzies, Thomas Zimmermann},
 funding = {NSERC Discovery},
 pagerange = {139--160},
 pages = {139--160},
 publisher = {Morgan Kaufmann},
 role = { Co-author / supervisor},
 title = {Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data},
 type = {InBook},
 url = {http://softwareprocess.ca/pubs/campbell2015AASD-LDA.pdf},
 venue = {The Art and Science of Analyzing Software Data},
 year = {2015}
}

The Impact of User Choice on Energy Consumption

Zhang Chenlei, Abram Hindle, and Daniel M. German
IEEE Software,
2014 69--75
PDF
Publisher Link

@article{zhang2014IEEEStioucoec,
 author = {Zhang Chenlei and Abram Hindle and Daniel M. German},
 authors = {Zhang Chenlei, Abram Hindle, and Daniel M. German},
 date = {2014/03},
 funding = {NSERC Discovery},
 journal = {IEEE Software},
 pagerange = {69--75},
 pages = {69--75},
 payurl = {https://www.computer.org/csdl/mags/so/2014/03/mso2014030069-abs.html},
 role = {Co-author / supervisor},
 title = {The Impact of User Choice on Energy Consumption},
 type = {article},
 url = {http://softwareprocess.ca/pubs/zhang2014IEEESoftware-user-choice.pdf},
 venue = {IEEE Software},
 year = {2014}
}

The Power of System Call Traces: Predicting the Software Energy Consumption Impact of Changes

Karan Aggarwal, Zhang Chenlei, Joshua Campbell, Abram Hindle, and Eleni Stroulia
24th Annual Conference of the Center for Advanced Studies (CASCON 2014), Markham, Canada
2014 219--233
Acceptance:18/56 or 32.14%
PDF

@inproceedings{karan2014CASCON2014tposctptsecioc,
 author = {Karan Aggarwal and Zhang Chenlei and Joshua Campbell and Abram Hindle and Eleni Stroulia},
 authors = {Karan Aggarwal, Zhang Chenlei, Joshua Campbell, Abram Hindle, and Eleni Stroulia},
 booktitle = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 date = {2014-11-03},
 funding = {NSERC Discovery},
 location = {Markham, Canada},
 pagerange = {219--233},
 pages = {219--233},
 rate = {18/56 or 32.14% },
 region = {Ontario},
 role = { Co-author / supervisor},
 title = {The Power of System Call Traces: Predicting the Software Energy Consumption Impact of Changes},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/aggarwal2014CASCON-syscalls.pdf},
 venue = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 year = {2014}
}

Involvement, Contribution and Influence in Github and Stack Overflow

Ali Sajedi Badashian, Afsaneh Esteki, Ameneh Gholipour, Abram Hindle, and Eleni Stroulia
24th Annual Conference of the Center for Advanced Studies (CASCON 2014), Markham, Canada
2014 19--33
Acceptance:18/56 or 32.14%
PDF

@inproceedings{ali2014CASCON2014icaiigaso,
 author = {Ali Sajedi Badashian and Afsaneh Esteki and Ameneh Gholipour and Abram Hindle and Eleni Stroulia},
 authors = {Ali Sajedi Badashian, Afsaneh Esteki, Ameneh Gholipour, Abram Hindle, and Eleni Stroulia},
 booktitle = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 date = {2014-11-03},
 funding = {NSERC Discovery},
 location = {Markham, Canada},
 pagerange = {19--33},
 pages = {19--33},
 rate = {18/56 or 32.14% },
 region = {Ontario},
 role = { Co-author / course project},
 title = {Involvement, Contribution and Influence in Github and Stack Overflow},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/sajedi2014CASCON-AAA.pdf},
 venue = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 year = {2014}
}

On Improving Green Mining For Energy-Aware Software Analysis

Stephen Romansky and Abram Hindle
24th Annual Conference of the Center for Advanced Studies (CASCON 2014), Markham, Canada
2014 234--245
Acceptance:18/56 or 32.14%
PDF

@inproceedings{stephen2014CASCON2014oigmfesa,
 author = {Stephen Romansky and Abram Hindle},
 authors = {Stephen Romansky and Abram Hindle},
 booktitle = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 date = {2014-11-03},
 funding = {NSERC Discovery},
 location = {Markham, Canada},
 pagerange = {234--245},
 pages = {234--245},
 rate = {18/56 or 32.14% },
 region = {Ontario},
 role = { Co-author / course project},
 title = {On Improving Green Mining For Energy-Aware Software Analysis},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/romansky2014CASCON.pdf},
 venue = {24th Annual Conference of the Center for Advanced Studies (CASCON 2014)},
 year = {2014}
}

CloudOrch: A Portable SoundCard in the Cloud

Abram Hindle
New Interfaces for Musical Expression (NIME 2014), London, UK
2014 277--280
Acceptance:26/113 or 23.01%
PDF
Publisher Link

@inproceedings{abram2014NIME2014capsitc,
 accepted = {2014-05-08},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {New Interfaces for Musical Expression (NIME 2014)},
 date = {2014-06-30},
 funding = {NSERC Discovery},
 location = {London, UK},
 pagerange = {277--280},
 pages = {277--280},
 pagesr = {252-261},
 payurl = {http://www.nime.org/proceedings/2014/nime2014_541.pdf},
 published = {2014-05-31},
 rate = {26/113 or 23.01% },
 role = { Author},
 title = {CloudOrch: A Portable SoundCard in the Cloud},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2014NIME-cloudorch.pdf},
 venue = {New Interfaces for Musical Expression (NIME 2014)},
 volume = {11th},
 year = {2014}
}

Green mining: energy consumption of advertisement blocking methods

Kent Rasmussen, Alexander Wilson, and Abram Hindle
Proceedings of the 3rd International Workshop on Green and Sustainable Software (GREENS 2014),
2014 38--45
PDF

@inproceedings{kent2014GREENS2014gmecoabm,
 author = {Kent Rasmussen and Alexander Wilson and Abram Hindle},
 authors = {Kent Rasmussen, Alexander Wilson, and Abram Hindle},
 booktitle = {Proceedings of the 3rd International Workshop on Green and Sustainable Software (GREENS 2014)},
 date = {2014-06-01},
 funding = {NSERC Discovery},
 pagerange = {38--45},
 pages = {38--45},
 role = { Co-author / supervisor},
 title = {Green mining: energy consumption of advertisement blocking methods},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/rasmussen2014GREENS-adblock.pdf},
 venue = {Proceedings of the 3rd International Workshop on Green and Sustainable Software (GREENS 2014)},
 year = {2014}
}

Co-evolution of project documentation and popularity within github

Karan Aggarwal, Abram Hindle, and Eleni Stroulia
International Working Conference on Mining Software Repositories Challenge Track (MSR 2014), Hyderabad, India
2014 360--363
Acceptance:9/19 or 47.37%
PDF
Publisher Link

@inproceedings{karan2014MSR2014copdapwg,
 author = {Karan Aggarwal and Abram Hindle and Eleni Stroulia},
 authors = {Karan Aggarwal, Abram Hindle, and Eleni Stroulia},
 booktitle = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2014)},
 date = {2014-05-31},
 funding = {NSERC Discovery},
 location = {Hyderabad, India},
 pagerange = {360--363},
 pages = {360--363},
 payurl = {https://dl.acm.org/citation.cfm?id=2597120},
 rate = {9/19 or 47.37% },
 role = { Co-author / supervisor / course project},
 title = {Co-evolution of project documentation and popularity within github},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/aggarwal2014MSR-documentation.pdf},
 venue = {International Working Conference on Mining Software Repositories Challenge Track (MSR 2014)},
 year = {2014}
}

A green miner's dataset: mining the impact of software change on energy consumption

Zhang Chenlei and Abram Hindle
International Working Conference on Mining Software Data Track (MSR 2014), Hyderabad, India
2014 400--403
Acceptance:15/22 or 68.18%
PDF

@inproceedings{zhang2014MSR2014agmdmtioscoec,
 author = {Zhang Chenlei and Abram Hindle},
 authors = {Zhang Chenlei and Abram Hindle},
 booktitle = {International Working Conference on Mining Software Data Track (MSR 2014)},
 date = {2014-05-31},
 funding = {NSERC Discovery},
 location = {Hyderabad, India},
 pagerange = {400--403},
 pages = {400--403},
 rate = {15/22 or 68.18% },
 role = { Co-author / supervisor},
 title = {A green miner's dataset: mining the impact of software change on energy consumption},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/zhang2014MSR-green-data.pdf},
 venue = {International Working Conference on Mining Software Data Track (MSR 2014)},
 year = {2014}
}

Syntax Errors Just Aren't Natural: Improving Error Reporting with Language Models

Joshua Campbell, Abram Hindle, and J Nelson Amaral
Working Conference on Mining Software Repositories (MSR 2014), Hyderabad, India
2014 252--261
Acceptance:29/85 or 34.12%
PDF
Publisher Link
DOI:10.1145/2597073.2597102

@inproceedings{joshua2014MSR2014sejanierwlm,
 accepted = {2014-04-08},
 author = {Joshua Campbell and Abram Hindle and J Nelson Amaral},
 authors = {Joshua Campbell, Abram Hindle, and J Nelson Amaral},
 booktitle = {Working Conference on Mining Software Repositories (MSR 2014)},
 date = {2014-05-31},
 doi = {10.1145/2597073.2597102},
 funding = {NSERC Discovery},
 location = {Hyderabad, India},
 pagerange = {252--261},
 pages = {252--261},
 payurl = {http://dl.acm.org/citation.cfm?id=2597073&CFID=342892135&CFTOKEN=59881232},
 published = {2014-05-31},
 publisher = {ACM},
 rate = {29/85 or 34.12% },
 role = { Co-author / supervisor},
 title = {Syntax Errors Just Aren't Natural: Improving Error Reporting with Language Models},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/campbell2014MSR-syntax.pdf},
 venue = {Working Conference on Mining Software Repositories (MSR 2014)},
 year = {2014}
}

GreenMiner: a hardware based mining software repositories software energy consumption framework

Abram Hindle, Alexander Wilson, Kent Rasmussen, Eric Jed Barlow, Joshua Campbell, and Stephen Romansky
International Working Conference on Mining Software Repositories (MSR 2014), Hyderabad, India
2014 12--21
Acceptance:29/89 or 32.58%
PDF

@inproceedings{abram2014MSR2014gahbmsrsecf,
 author = {Abram Hindle and Alexander Wilson and Kent Rasmussen and Eric Jed Barlow and Joshua Campbell and Stephen Romansky},
 authors = {Abram Hindle, Alexander Wilson, Kent Rasmussen, Eric Jed Barlow, Joshua Campbell, and Stephen Romansky},
 booktitle = {International Working Conference on Mining Software Repositories (MSR 2014)},
 date = {2014-05-31},
 funding = {NSERC Discovery},
 location = {Hyderabad, India},
 pagerange = {12--21},
 pages = {12--21},
 rate = {29/89 or 32.58% },
 role = { Project Lead and Author},
 title = {GreenMiner: a hardware based mining software repositories software energy consumption framework},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2014MSR-greenminer.pdf},
 venue = {International Working Conference on Mining Software Repositories (MSR 2014)},
 year = {2014}
}

Do topics make sense to managers and developers?

Abram Hindle, Christian Bird, Thomas Zimmermann, and Nachiappan Nagappan
Journal of Empirical Software Engineering,
2014 479--515
PDF

@article{abram2014JESEdtmstmad,
 author = {Abram Hindle and Christian Bird and Thomas Zimmermann and Nachiappan Nagappan},
 authors = {Abram Hindle, Christian Bird, Thomas Zimmermann, and Nachiappan Nagappan},
 funding = {Microsoft Research},
 journal = {Journal of Empirical Software Engineering},
 pagerange = {479--515},
 pages = {479--515},
 payurl = {http://link.springer.com/article/10.1007%2Fs10664-014-9312-1},
 publisher = {Springer},
 role = {primary author},
 title = {Do topics make sense to managers and developers?},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2014EMSE-topics.pdf},
 venue = {Journal of Empirical Software Engineering},
 year = {2014}
}

A Multidimensional Empirical Study on Refactoring

Nikolaos Tsantalis, Victor Guana, Eleni Stroulia, and Abram Hindle
23rd Annual Conference of the Center for Advanced Studies (CASCON 2013), Markham, Canada
2013 132--146
Acceptance:25/70 or 35.71%
PDF

@inproceedings{nikolaos2013CASCON2013amesor,
 author = {Nikolaos Tsantalis and Victor Guana and Eleni Stroulia and Abram Hindle},
 authors = {Nikolaos Tsantalis, Victor Guana, Eleni Stroulia, and Abram Hindle},
 booktitle = {23rd Annual Conference of the Center for Advanced Studies (CASCON 2013)},
 date = {2013-11-18},
 funding = {NSERC Discovery},
 location = {Markham, Canada},
 pagerange = {132--146},
 pages = {132--146},
 rate = {25/70 or 35.71% },
 region = {Ontario},
 role = { Supervision and Criticism},
 title = {A Multidimensional Empirical Study on Refactoring},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/tsantalis2013CASCON-refactoring.pdf},
 venue = {23rd Annual Conference of the Center for Advanced Studies (CASCON 2013)},
 year = {2013}
}

On the Personality Traits of StackOverflow Users

Blerina Bazelli, Abram Hindle, Eleni Stroulia
International Conference on Software Maintenance (ICSM-2013 ERA Track), Eindhoven, The Netherlands
2013 460--463
Acceptance:30/70 or 42.86%
PDF

@inproceedings{blerina2013ICSM2013ERATrackotptosu,
 author = {Blerina Bazelli and Abram Hindle and Eleni Stroulia},
 authors = {Blerina Bazelli, Abram Hindle, Eleni Stroulia},
 booktitle = {International Conference on Software Maintenance (ICSM-2013 ERA Track)},
 date = {2013-09-22},
 funding = {NSERC Discovery},
 location = {Eindhoven, The Netherlands},
 pagerange = {460--463},
 pages = {460--463},
 pagesr = {460-463},
 rate = {30/70 or 42.86%},
 role = { Class project / supervisor},
 title = {On the Personality Traits of StackOverflow Users},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/bazelli2013ICSMERA-Personality.pdf},
 venue = {International Conference on Software Maintenance (ICSM-2013 ERA Track)},
 year = {2013}
}

SWARMED: Captive Portals, Mobile Devices, and Audience Participation in Multi-User Music Performance

Abram Hindle
New Interfaces for Musical Expression (NIME 2013), Daejeon and Seoul, Korea Republic
2013 174--179
PDF

@inproceedings{abram2013NIME2013scpmdaapimmp,
 accepted = {2013-04-01},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {New Interfaces for Musical Expression (NIME 2013)},
 date = {2013-05-27},
 funding = {NSERC Discovery},
 location = {Daejeon and Seoul, Korea Republic},
 pagerange = {174--179},
 pages = {174--179},
 published = {2013-05-27},
 publisher = {NIME},
 role = {Author},
 title = {SWARMED: Captive Portals, Mobile Devices, and Audience Participation in Multi-User Music Performance},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2013NIME-SWARMED.pdf},
 venue = {New Interfaces for Musical Expression (NIME 2013)},
 year = {2013}
}

A contextual approach towards more accurate duplicate bug report detection

Anahita Alipour, Abram Hindle, and Eleni Stroulia
Working Conference on Mining Software Repositories (MSR-2013), San Francisco, United States
2013 183--192
Acceptance:31/81 or 38.27%
PDF
Publisher Link

@inproceedings{anahita2013MSR2013acatmadbrd,
 accepted = {2013-03-15},
 author = {Anahita Alipour and Abram Hindle and Eleni Stroulia},
 authors = {Anahita Alipour, Abram Hindle, and Eleni Stroulia},
 booktitle = {Working Conference on Mining Software Repositories (MSR-2013)},
 date = {2013-05-18},
 funding = {NSERC Discovery},
 location = {San Francisco, United States},
 pagerange = {183--192},
 pages = {183--192},
 payurl = {http://dl.acm.org/citation.cfm?id=2487123},
 published = {2013-05-18},
 publisher = {IEEE},
 rate = {31/81 or 38.27%},
 region = {California},
 role = { supervisor, author},
 title = {A contextual approach towards more accurate duplicate bug report detection},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/alipour2013MSR-bugdedup.pdf},
 venue = {Working Conference on Mining Software Repositories (MSR-2013)},
 year = {2013}
}

Deficient documentation detection: a methodology to locate deficient project documentation using topic analysis

Joshua Campbell, Zhang Chenlei, Zhen Xu, Abram Hindle, and James Miller
Working Conference on Mining Software Repositories Challenge Track (MSR-2013), San Francisco, United States
2013 57--60
Acceptance:12/30 or 40%
PDF
Publisher Link

@inproceedings{joshua2013MSR2013dddamtldpduta,
 accepted = {2013-03-16},
 author = {Joshua Campbell and Zhang Chenlei and Zhen Xu and Abram Hindle and James Miller},
 authors = {Joshua Campbell, Zhang Chenlei, Zhen Xu, Abram Hindle, and James Miller},
 booktitle = {Working Conference on Mining Software Repositories Challenge Track (MSR-2013)},
 date = {2013-05-18},
 funding = {NSERC Discovery},
 location = {San Francisco, United States},
 pagerange = {57--60},
 pages = {57--60},
 payurl = {http://dl.acm.org/citation.cfm?id=2487099},
 published = {2013-05-18},
 publisher = {IEEE},
 rate = {12/30 or 40%},
 region = {California},
 role = { Course project / supervisor},
 title = {Deficient documentation detection: a methodology to locate deficient project documentation using topic analysis},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/campbell2013MSR-Deficient.pdf},
 venue = {Working Conference on Mining Software Repositories Challenge Track (MSR-2013)},
 year = {2013}
}

Green Mining: a Methodology of Relating Software Change and Configuration to Power Consumption

Abram Hindle
Journal of Empirical Software Engineering,
2013 374--409
PDF
Publisher Link
DOI:http://dx.doi.org/10.1007/s10664-013-9276-6

@article{abram2013JESEgmamorscactpc,
 author = {Abram Hindle},
 authors = {Abram Hindle},
 doi = {http://dx.doi.org/10.1007/s10664-013-9276-6},
 funding = {NSERC Discovery},
 journal = {Journal of Empirical Software Engineering},
 pagerange = {374--409},
 pages = {374--409},
 payurl = {http://link.springer.com/article/10.1007/s10664-013-9276-6},
 role = {Author.},
 title = {Green Mining: a Methodology of Relating Software Change and Configuration to Power Consumption},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2013EMSE-green-mining.pdf},
 venue = {Journal of Empirical Software Engineering},
 year = {2013}
}

Roundtable: What's Next in Software Analytics

Ahmed E Hassan, Abram Hindle, Per Runeson, Martin Shepperd, Prem Devanbu, and Sunghun Kim
IEEE Software,
2013 53--56
PDF
Publisher Link

Invited, not peer reviewed

@article{ahmed2013IEEESrwnisa,
 author = {Ahmed E Hassan and Abram Hindle and Per Runeson and Martin Shepperd and Prem Devanbu and Sunghun Kim},
 authors = {Ahmed E Hassan, Abram Hindle, Per Runeson, Martin Shepperd, Prem Devanbu, and Sunghun Kim},
 journal = {IEEE Software},
 notes = {Invited, not peer reviewed},
 pagerange = {53--56},
 pages = {53--56},
 payurl = {https://www.computer.org/csdl/mags/so/2013/04/mso2013040053-abs.html},
 role = {Invited Opinion},
 title = {Roundtable: What's Next in Software Analytics},
 type = {article},
 url = {http://dx.doi.org/10.1109/MS.2013.85},
 venue = {IEEE Software},
 year = {2013}
}

Automated Topic Naming Supporting Cross-project Analysis of Software Maintenance Activities

Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos
Journal of Empirical Software Engineering,
2013 1125--1155
PDF
Publisher Link

@article{abram2013JESEatnscaosma,
 author = {Abram Hindle and Neil A. Ernst and Michael W. Godfrey and John Mylopoulos},
 authors = {Abram Hindle, Neil A. Ernst, Michael W. Godfrey, John Mylopoulos},
 funding = {NSERC PGS-D and NSERC Discovery},
 journal = {Journal of Empirical Software Engineering},
 pagerange = {1125--1155},
 pages = {1125--1155},
 payurl = {http://link.springer.com/article/10.1007/s10664-012-9209-9},
 role = { Primary author},
 title = {Automated Topic Naming Supporting Cross-project Analysis of Software Maintenance Activities},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2011EMSE-automated-topic-naming.pdf},
 venue = {Journal of Empirical Software Engineering},
 volume = {18(6)},
 year = {2013}
}

Software Bertillonage: Determining the Provenance of Software Development Artifacts

Julius Davies, Daniel M. German, Michael W. Godfrey, Abram Hindle
Journal of Empirical Software Engineering,
2012 1195--1237
PDF
Publisher Link

@article{julius2012JESEsbdtposda,
 author = {Julius Davies and Daniel M. German and Michael W. Godfrey and Abram Hindle},
 authors = {Julius Davies, Daniel M. German, Michael W. Godfrey, Abram Hindle},
 date = {2012/05},
 funding = {NSERC PGS-D and NSERC Discovery},
 journal = {Journal of Empirical Software Engineering},
 pagerange = {1195--1237},
 pages = {1195--1237},
 payurl = {http://link.springer.com/article/10.1007/s10664-012-9199-7},
 role = {Supporting author, writing, case study},
 title = {Software Bertillonage: Determining the Provenance of Software Development Artifacts},
 type = {article},
 url = {http://softwareprocess.ca/pubs/davies2012ESME-Bertillonage.pdf},
 venue = {Journal of Empirical Software Engineering},
 year = {2012}
}

Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs

Dan Han, Zhang Chenlei, Xiachao Fan, Abram Hindle, Kenny Wong, and Eleni Stroulia
Working Conference on Reverse Engineering (WCRE-2012), Kingston, Canada
2012 83--92
Acceptance:43/138 or 31.16%
PDF
Publisher Link

@inproceedings{dan2012WCRE2012uafwtaovb,
 accepted = {2012-08-14},
 author = {Dan Han and Zhang Chenlei and Xiachao Fan and Abram Hindle and Kenny Wong and Eleni Stroulia},
 authors = {Dan Han, Zhang Chenlei, Xiachao Fan, Abram Hindle, Kenny Wong, and Eleni Stroulia},
 booktitle = {Working Conference on Reverse Engineering (WCRE-2012)},
 date = {2012-10-15},
 funding = {NSERC Discovery},
 location = {Kingston, Canada},
 pagerange = {83--92},
 pages = {83--92},
 payurl = {http://doi.ieeecomputersociety.org/10.1109/WCRE.2012.18},
 published = {2012-10-15},
 publisher = {IEEE},
 rate = {43/138 or 31.16%},
 region = {Ontario},
 role = { Course project / supervisor},
 title = {Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/han2012WCRE-Android-Fragmentation.pdf},
 venue = {Working Conference on Reverse Engineering (WCRE-2012)},
 year = {2012}
}

Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?

Abram Hindle, Christian Bird, Thomas Zimmermann, and Nachiappan Nagappan
International Conference on Software Maintenance (ICSM 2012), Riva Del Garda, Italy
2012 243--252
Acceptance:46/181 or 25.41%
PDF

@inproceedings{abram2012ICSM2012rrtivtadtefrmstmad,
 accepted = {2012-06-20},
 author = {Abram Hindle and Christian Bird and Thomas Zimmermann and Nachiappan Nagappan},
 authors = {Abram Hindle, Christian Bird, Thomas Zimmermann, and Nachiappan Nagappan},
 booktitle = {International Conference on Software Maintenance (ICSM 2012)},
 date = {2012-09-23},
 funding = {Microsoft Research},
 location = {Riva Del Garda, Italy},
 pagerange = {243--252},
 pages = {243--252},
 publisher = {IEEE},
 rate = {46/181 or 25.41%},
 role = { Primary Investigator},
 title = {Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2012ICSM-Relating-Requirements-Topics.pdf},
 venue = {International Conference on Software Maintenance (ICSM 2012)},
 year = {2012}
}

On the Naturalness of Software

Abram Hindle, Earl T. Barr, Zhendong Su, Premkumar T. Devanbu, and Mark Gabel
International Conference on Software Engineering (ICSE-2012), Zurich, Switzerland
2012 837--847
Acceptance:87/408 or 21.32%
PDF

@inproceedings{hindle12012ICSE,
 accepted = {2012-01-27},
 author = {Abram Hindle and Earl T. Barr and Zhendong Su and Premkumar T. Devanbu and Mark Gabel},
 authors = {Abram Hindle, Earl T. Barr, Zhendong Su, Premkumar T. Devanbu, and Mark Gabel},
 booktitle = {International Conference on Software Engineering (ICSE-2012)},
 code = {hindle12012ICSE},
 date = {2012-06-02},
 funding = {NSF 0964703 and NSF 0613949},
 location = {Zurich, Switzerland},
 pagerange = {837--847},
 pages = {837--847},
 published = {2012-06-02},
 publisher = {IEEE},
 rate = {87/408 or 21.32%},
 role = { Researcher / author},
 title = {On the Naturalness of Software},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2012ICSE.pdf},
 venue = {International Conference on Software Engineering (ICSE-2012)},
 year = {2012}
}

Green Mining: A Methodology of Relating Software Change to Power Consumption

Abram Hindle
Working Conference on Mining Software Repositories (MSR-2012), Zurich, Switzerland
2012 78--87
Acceptance:18/64 or 28.13%
PDF

MSR Distinguished/Best Paper Award

@inproceedings{abram2012MSR2012gmamorsctpc,
 accepted = {2012-03-16},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {Working Conference on Mining Software Repositories (MSR-2012)},
 date = {2012-06-02},
 funding = {NSERC Discovery},
 location = {Zurich, Switzerland},
 notes = {MSR Distinguished/Best Paper Award},
 oldurl = {http://softwareprocess.es/a/green-change-web.pdf},
 pagerange = {78--87},
 pages = {78--87},
 published = {2012-06-02},
 rate = {18/64 or 28.13%},
 role = { Author},
 title = {Green Mining: A Methodology of Relating Software Change to Power Consumption},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2012MSR-Green-Mining.pdf},
 venue = {Working Conference on Mining Software Repositories (MSR-2012)},
 year = {2012}
}

Green Mining: Investigating Power Consumption across Versions

Abram Hindle
International Conference on Software Engineering - NIER Track (ICSE-NIER 2012), Zurich, Switzerland
2012 1301--1304
Acceptance:26/147 or 17.69%
PDF

@inproceedings{abram2012ICSENIER2012gmipcav,
 accepted = {2012-01-27},
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {International Conference on Software Engineering - NIER Track (ICSE-NIER 2012)},
 date = {2012-06-02},
 funding = {NSERC Discovery},
 location = {Zurich, Switzerland},
 oldurl = {http://softwareprocess.es/a/green-nier-web.pdf},
 pagerange = {1301--1304},
 pages = {1301--1304},
 published = {2012-06-02},
 rate = {26/147 or 17.69%},
 role = { Author},
 title = {Green Mining: Investigating Power Consumption across Versions},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2012ICSENEIR-Green-Mining.pdf},
 venue = {International Conference on Software Engineering - NIER Track (ICSE-NIER 2012)},
 year = {2012}
}

Do the stars align? Multidimensional analysis of Android's Layered Architecture

Victor Guana, Fabio De Pinho Rocha, Abram Hindle, and Eleni Stroulia
Working Conference on Mining Software Repositories: Challenge Track (MSR-2012), Zurich, Switzerland
2012 124--127
Acceptance:6/17 or 35.29%
PDF
Publisher Link
DOI:10.1109/MSR.2012.6224269

Mining Challenge Award

@inproceedings{victor2012MSR2012dtsamaoala,
 accepted = {2012-03-12},
 author = {Victor Guana and Fabio De Pinho Rocha and Abram Hindle and Eleni Stroulia},
 authors = {Victor Guana, Fabio De Pinho Rocha, Abram Hindle, and Eleni Stroulia},
 booktitle = {Working Conference on Mining Software Repositories: Challenge Track (MSR-2012)},
 date = {2012-06-02},
 doi = {10.1109/MSR.2012.6224269},
 funding = {NSERC Discovery},
 location = {Zurich, Switzerland},
 notes = {Mining Challenge Award},
 pagerange = {124--127},
 pages = {124--127},
 payurl = {http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6224269&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F6220358%2F6224266%2F06224269.pdf%3Farnumber%3D6224269},
 published = {2012-06-02},
 publisher = {IEEE},
 rate = {6/17 or 35.29%},
 role = {Course project supervisor / author},
 title = {Do the stars align? Multidimensional analysis of Android's Layered Architecture},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/guana2012MSR-Stars.pdf},
 venue = {Working Conference on Mining Software Repositories: Challenge Track (MSR-2012)},
 year = {2012}
}

The Build Dependency Perspective of Android's Concrete Architecture

Wei Hu, Dan Han, Abram Hindle, and Kenny Wong
Working Conference on Mining Software Repositories Challenge Track (MSR 2012), Zurich, Switzerland
2012 128--131
Acceptance:6/17 or 35.29%
PDF
DOI:10.1109/MSR.2012.6224270

@inproceedings{wei2012MSR2012tbdpoaca,
 author = {Wei Hu and Dan Han and Abram Hindle and Kenny Wong},
 authors = {Wei Hu, Dan Han, Abram Hindle, and Kenny Wong},
 booktitle = {Working Conference on Mining Software Repositories Challenge Track (MSR 2012)},
 date = {2012-06-02},
 doi = {10.1109/MSR.2012.6224270},
 funding = {NSERC Discovery},
 location = {Zurich, Switzerland},
 pagerange = {128--131},
 pages = {128--131},
 publisher = {IEEE Computer Society},
 rate = {6/17 or 35.29%},
 role = {course project supervisor / author},
 title = {The Build Dependency Perspective of Android's Concrete Architecture},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hu2012MSR-builddeps.pdf},
 venue = {Working Conference on Mining Software Repositories Challenge Track (MSR 2012)},
 year = {2012}
}

Cohesive and Isolated Development with Branches

Earl T. Barr, Christian Bird, Peter C. Rigby, Abram Hindle, Daniel M. German, and Premkumar T. Devanbu
Fundamental Approaches to Software Engineering (FASE 2012), Tallinn, Estonia
2012 316--331
Acceptance:33/134 or 24.63%
PDF

@inproceedings{earl2012FASE2012caidwb,
 author = {Earl T. Barr and Christian Bird and Peter C. Rigby and Abram Hindle and Daniel M. German and Premkumar T. Devanbu},
 authors = {Earl T. Barr, Christian Bird, Peter C. Rigby, Abram Hindle, Daniel M. German, and Premkumar T. Devanbu},
 booktitle = {Fundamental Approaches to Software Engineering (FASE 2012)},
 date = {2012-03-24},
 funding = {NSF},
 location = {Tallinn, Estonia},
 pagerange = {316--331},
 pages = {316--331},
 rate = {33/134 or 24.63% },
 role = {Editing and some experiments},
 title = {Cohesive and Isolated Development with Branches},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/barr2012cid.pdf},
 venue = {Fundamental Approaches to Software Engineering (FASE 2012)},
 year = {2012}
}

Got Issues? Do New Features and Code Improvements Affect Defects?

Daryl Posnett, Abram Hindle, Premkumar Devanbu
Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11), Limerick, Ireland
2011 211--215
Acceptance:22+27/104 or 48%
PDF

@inproceedings{daryl2011WCRE11gidnfaciad,
 author = {Daryl Posnett and Abram Hindle and Premkumar Devanbu},
 authors = {Daryl Posnett, Abram Hindle, Premkumar Devanbu},
 booktitle = {Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11)},
 date = {2011-10-17},
 funding = {NSF},
 location = {Limerick, Ireland},
 pagerange = {211--215},
 pages = {211--215},
 rate = {22+27/104 or 48%},
 role = { Co-author},
 title = {Got Issues? Do New Features and Code Improvements Affect Defects?},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/posnett2011WCRE-Got-Issues.pdf},
 venue = {Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11)},
 year = {2011}
}

On the Effectiveness of Simhashing in Clone Detection on Large Scale Software System

Sharif Uddin, Chanchal K. Roy, Kevin A. Schneider and Abram Hindle
Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11), Limerick, Ireland
2011 13--22
Acceptance:22/104 or 21%
PDF

@inproceedings{sharif2011WCRE11oteosicdolsss,
 author = {Sharif Uddin and Chanchal K. Roy and Kevin A. Schneider and Abram Hindle},
 authors = {Sharif Uddin, Chanchal K. Roy, Kevin A. Schneider and Abram Hindle},
 booktitle = {Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11)},
 date = {2011-10-17},
 funding = {NSERC PGS-D and NSERC Discovery},
 location = {Limerick, Ireland},
 pagerange = {13--22},
 pages = {13--22},
 rate = {22/104 or 21%},
 role = {Initial idea, editing},
 title = {On the Effectiveness of Simhashing in Clone Detection on Large Scale Software System},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/uddin2011WCRE-simhash.pdf},
 venue = {Proc. of 2011 Working Conference on Reverse Engineering (WCRE-11)},
 year = {2011}
}

BugCache for Inspections: Hit or Miss?

Foyzur Rahman, Daryl Posnett, Abram Hindle, Earl Barr, Premkumar Devanbu
Proceedings of FSE 2011 (FSE-11), Szeged, Hungary
2011 322--331
Acceptance:34/203 or 16.7%
PDF

@inproceedings{foyzur2011FSE11bfi:hom,
 author = {Foyzur Rahman and Daryl Posnett and Abram Hindle and Earl Barr and Premkumar Devanbu},
 authors = {Foyzur Rahman, Daryl Posnett, Abram Hindle, Earl Barr, Premkumar Devanbu},
 booktitle = {Proceedings of FSE 2011 (FSE-11)},
 date = {2011-09-05},
 funding = {NSF},
 location = {Szeged, Hungary},
 pagerange = {322--331},
 pages = {322--331},
 rate = {34/203 or 16.7%},
 role = {Co-author, editing, some programming.},
 title = {BugCache for Inspections: Hit or Miss?},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/rahman2011FSE-bugcache.pdf},
 venue = {Proceedings of FSE 2011 (FSE-11)},
 year = {2011}
}

Determining the provenance of software artifacts

Michael Godfrey, Julius Davies, Daniel German and Abram Hindle
Fifth International Workshop on Software Clones, Waikiki, United States
2011 65--66
PDF

@inproceedings{michael2011FIWSCdtposa,
 author = {Michael Godfrey and Julius Davies and Daniel German and Abram Hindle},
 authors = {Michael Godfrey, Julius Davies, Daniel German and Abram Hindle},
 booktitle = {Fifth International Workshop on Software Clones},
 date = {2011-05-23},
 location = {Waikiki, United States},
 pagerange = {65--66},
 pages = {65--66},
 region = {Hawaii},
 role = {Co-author, Editing},
 title = {Determining the provenance of software artifacts},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/godfrey2011IWSC-provenance.pdf},
 venue = {Fifth International Workshop on Software Clones},
 year = {2011}
}

Automated topic naming to support cross-project analysis of software maintenance activities

Abram Hindle, Neil Ernst, Michael W. Godfrey, John Mylopoulos
Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11), Waikiki, United States
2011 163--172
Acceptance:20/61 or 33%
PDF

@inproceedings{abram2011MSR11atntscaosma,
 author = {Abram Hindle and Neil Ernst and Michael W. Godfrey and John Mylopoulos},
 authors = {Abram Hindle, Neil Ernst, Michael W. Godfrey, John Mylopoulos},
 booktitle = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 date = {2011-05-21},
 funding = {NSERC},
 location = {Waikiki, United States},
 pagerange = {163--172},
 pages = {163--172},
 rate = {20/61 or 33% },
 region = {Hawaii},
 role = { Co-author},
 title = {Automated topic naming to support cross-project analysis of software maintenance activities},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2011MSR-topicnaming.pdf},
 venue = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 year = {2011}
}

Software Bertillonage: Finding the provenance of an entity

Julius Davies, Michael Godfrey, Daniel German, and Abram Hindle
Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11), Waikiki, United States
2011 183--192
Acceptance:20/61 or 33%
PDF

@inproceedings{julius2011MSR11sbftpoae,
 author = {Julius Davies and Michael Godfrey and Daniel German and Abram Hindle},
 authors = {Julius Davies, Michael Godfrey, Daniel German, and Abram Hindle},
 booktitle = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 date = {2011-05-21},
 funding = {NSERC PGS-D},
 location = {Waikiki, United States},
 pagerange = {183--192},
 pages = {183--192},
 rate = {20/61 or 33% },
 region = {Hawaii},
 role = {Editing, Co-author},
 title = {Software Bertillonage: Finding the provenance of an entity},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/davies2011MSR-bertillonage.pdf},
 venue = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 year = {2011}
}

A Simpler Model of Software Readability

Daryl Posnett, Abram Hindle and Premkumar Devanbu
Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11), Waikiki, United States
2011 73--82
Acceptance:20/61 or 33%
PDF

@inproceedings{daryl2011MSR11asmosr,
 author = {Daryl Posnett and Abram Hindle and Premkumar Devanbu},
 authors = {Daryl Posnett, Abram Hindle and Premkumar Devanbu},
 booktitle = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 date = {2011-05-21},
 funding = {NSF},
 location = {Waikiki, United States},
 pagerange = {73--82},
 pages = {73--82},
 rate = {20/61 or 33%},
 region = {Hawaii},
 role = { Co-author},
 title = {A Simpler Model of Software Readability},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/posnett2011MSR-readability.pdf},
 venue = {Proc. of 2011 Working Conference on Mining Software Repositories (MSR-11)},
 year = {2011}
}

Multifractal Aspects of Software Development

Abram Hindle, Michael W. Godfrey, Richard C. Holt
33rd International Conference on Software Engineering ICSE Companion, ICSE-11 special track on New Ideas and Emerging Results (NIER), Waikiki, United States
2011 968--971
Acceptance:46/196 or 23%
PDF

@inproceedings{abram2011NIERmaosd,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {33rd International Conference on Software Engineering ICSE Companion, ICSE-11 special track on New Ideas and Emerging Results (NIER)},
 date = {2011-05-21},
 funding = {NSERC PGS-D},
 location = {Waikiki, United States},
 pagerange = {968--971},
 pages = {968--971},
 rate = {46/196 or 23%},
 region = {Hawaii},
 role = { Co-author},
 title = {Multifractal Aspects of Software Development},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2011ICSENIER-Multifractal.pdf},
 venue = {33rd International Conference on Software Engineering ICSE Companion, ICSE-11 special track on New Ideas and Emerging Results (NIER)},
 year = {2011}
}

Software Process Recovery: Recovering Process From Artifacts

Abram Hindle
Doctoral Symposium of the 17th Working Conference on Reverse Engineering 2010 (WCRE-10), Boston, United States
2010 305--308
PDF

@inproceedings{abram2010WCRE10sprrpfa,
 author = {Abram Hindle},
 authors = {Abram Hindle},
 booktitle = {Doctoral Symposium of the 17th Working Conference on Reverse Engineering 2010 (WCRE-10)},
 date = {2010-10-13},
 funding = {NSERC PGS-D},
 location = {Boston, United States},
 pagerange = {305--308},
 pages = {305--308},
 region = {Massachusetts},
 role = {Author},
 title = {Software Process Recovery: Recovering Process From Artifacts},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2010WCRE-SoftwareProcessRecovery.pdf},
 venue = {Doctoral Symposium of the 17th Working Conference on Reverse Engineering 2010 (WCRE-10)},
 year = {2010}
}

Software Process Recovery using Recovered Unified Process Views

Abram Hindle, Michael W. Godfrey, Richard C. Holt
Proc. of 2010 International Conference on Software Maintenance (ICSM-10), Timisoara, Romania
2010 1--10
Acceptance:36/133 or 26%
PDF

@inproceedings{abram2010ICSM10sprurupv,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {Proc. of 2010 International Conference on Software Maintenance (ICSM-10)},
 date = {2010-09-12},
 location = {Timisoara, Romania},
 pagerange = {1--10},
 pages = {1--10},
 rate = {36/133 or 26%},
 role = { Co-author},
 title = {Software Process Recovery using Recovered Unified Process Views},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2010ICSME-rupv.pdf},
 venue = {Proc. of 2010 International Conference on Software Maintenance (ICSM-10)},
 year = {2010}
}

Mining Challenge 2010: FreeBSD, GNOME Desktop and Debian/Ubuntu

Abram Hindle, Israel Herraiz, Emad Shihab, and Zhen Ming Jiang
Proc. of 2010 Working Conference on Mining Software Repositories (MSR-10), Cape Town, South Africa
2010 82--85
PDF

Un-refereed, as I was the challenge chair

@inproceedings{abram2010MSR10mc2fgdad,
 author = {Abram Hindle and Israel Herraiz and Emad Shihab and Zhen Ming Jiang},
 authors = {Abram Hindle, Israel Herraiz, Emad Shihab, and Zhen Ming Jiang},
 booktitle = {Proc. of 2010 Working Conference on Mining Software Repositories (MSR-10)},
 date = {2010-05-02},
 dateconf = {2010-05-02},
 funding = {NSERC PGS-D},
 location = {Cape Town, South Africa},
 notes = {Un-refereed, as I was the challenge chair},
 pagerange = {82--85},
 pages = {82--85},
 refereed = {No},
 role = {Challenge Track Chair},
 title = {Mining Challenge 2010: FreeBSD, GNOME Desktop and Debian/Ubuntu},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2010MSR-Challenge-Description.pdf},
 venue = {Proc. of 2010 Working Conference on Mining Software Repositories (MSR-10)},
 year = {2010}
}

What's Hot and What's Not: Windowing Developer Topic Analysis

Abram Hindle, Michael W. Godfrey, Richard C. Holt
Proc. of 2009 IEEE Conference on Software Maintenance (ICSM-09), Edmonton, Canada
2009 339--348
Acceptance:35/162 or 22%
PDF

@inproceedings{abram2009ICSM09whawnwdta,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {Proc. of 2009 IEEE Conference on Software Maintenance (ICSM-09)},
 date = {2009-09-20},
 dateconf = {2009-09-20},
 funding = {NSERC PGS-D},
 location = {Edmonton, Canada},
 mydate = {20--26 September 2009},
 pagerange = {339--348},
 pages = {339--348},
 rate = {35/162 or 22%},
 region = {Alberta},
 role = { Co-author},
 title = {What's Hot and What's Not: Windowing Developer Topic Analysis},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2009ICSM-whats-hot.pdf},
 venue = {Proc. of 2009 IEEE Conference on Software Maintenance (ICSM-09)},
 year = {2009}
}

Automatic Classification of Large Changes into Maintenance Categories

Abram Hindle, Daniel M. German, Michael W. Godfrey, and Richard C. Holt
Proc. of 2009 IEEE Intl. Conference on Program Comprehension (ICPC-09), Vancouver, Canada
2009 30--39
Acceptance:20/74 or 27%
PDF

@inproceedings{abram2009ICPC09acolcimc,
 author = {Abram Hindle and Daniel M. German and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Daniel M. German, Michael W. Godfrey, and Richard C. Holt},
 booktitle = {Proc. of 2009 IEEE Intl. Conference on Program Comprehension (ICPC-09)},
 date = {2009-05-19},
 dateconf = {2009-05-19},
 funding = {NSERC PGS-D},
 location = {Vancouver, Canada},
 pagerange = {30--39},
 pages = {30--39},
 rate = {20/74 or 27%},
 region = {British Columbia},
 role = { Co-author},
 title = {Automatic Classification of Large Changes into Maintenance Categories},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindleICPC2009-large-changes-classification.pdf},
 venue = {Proc. of 2009 IEEE Intl. Conference on Program Comprehension (ICPC-09)},
 year = {2009}
}

Mining Recurrent Activities: Fourier Analysis of Change Events

Abram Hindle, Michael W. Godfrey, Richard C. Holt
31st International Conference on Software Engineering ICSE Companion, ICSE-09 special track on New Ideas and Emerging Results (NIER), Vancouver, Canada
2009 295--298
Acceptance:21/118 or 15%
PDF

@inproceedings{abram2009NIERmrafaoce,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {31st International Conference on Software Engineering ICSE Companion, ICSE-09 special track on New Ideas and Emerging Results (NIER)},
 date = {2009-05-16},
 funding = {NSERC PGS-D},
 location = {Vancouver, Canada},
 pagerange = {295--298},
 pages = {295--298},
 rate = {21/118 or 15%},
 region = {British Columbia},
 role = { Co-author},
 title = {Mining Recurrent Activities: Fourier Analysis of Change Events},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2009ICSENIER-fourier-analysis.pdf},
 venue = {31st International Conference on Software Engineering ICSE Companion, ICSE-09 special track on New Ideas and Emerging Results (NIER)},
 year = {2009}
}

Reading beside the lines: Using indentation to rank revisions by complexity

Abram Hindle, Michael W. Godfrey, Richard C. Holt
Science of Computer Programming,
2009 414--429
PDF
Publisher Link

@article{abram2009SCPrbtluitrrbc,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 funding = {NSERC PGS-D},
 journal = {Science of Computer Programming},
 pagerange = {414--429},
 pages = {414--429},
 payurl = {http://www.sciencedirect.com/science/article/pii/S0167642309000379},
 role = { Primary author},
 title = {Reading beside the lines: Using indentation to rank revisions by complexity},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2009SCP-Reading-beside-the-lines.pdf},
 venue = {Science of Computer Programming},
 volume = {74(7)},
 year = {2009}
}

Reverse Engineering CAPTCHAs

Abram Hindle, Michael W. Godfrey, and Richard C. Holt
Proc. of the 2008 Working Conference on Reverse Engineering (WCRE-08), Antwerp, Belgium
2008 59--68
Acceptance:20/70 or 29%
PDF

@inproceedings{abram2008WCRE08rec,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, and Richard C. Holt},
 booktitle = {Proc. of the 2008 Working Conference on Reverse Engineering (WCRE-08)},
 date = {2008-10-15},
 funding = {NSERC PGS-D},
 location = {Antwerp, Belgium},
 pagerange = {59--68},
 pages = {59--68},
 rate = {20/70 or 29%},
 role = { Co-author},
 title = {Reverse Engineering CAPTCHAs},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2008WCRE-captcha.pdf},
 venue = {Proc. of the 2008 Working Conference on Reverse Engineering (WCRE-08)},
 year = {2008}
}

From Indentation Shapes to Code Structures

Abram Hindle, Michael W. Godfrey, and Richard C. Holt
8th IEEE Intl. Working Conference on Source Code Analysis and Manipulation (SCAM 2008), Beijing, China
2008 111--120
Acceptance:23/61 or 38%
PDF

@inproceedings{abram2008SCAM2008fistcs,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, and Richard C. Holt},
 booktitle = {8th IEEE Intl. Working Conference on Source Code Analysis and Manipulation (SCAM 2008)},
 date = {2008-09-28},
 dateconf = {2008-09-28},
 funding = {NSERC PGS-D},
 location = {Beijing, China},
 pagerange = {111--120},
 pages = {111--120},
 rate = {23/61 or 38%},
 role = { Co-author},
 title = {From Indentation Shapes to Code Structures},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2008SCAM-indentation-shape.pdf},
 venue = {8th IEEE Intl. Working Conference on Source Code Analysis and Manipulation (SCAM 2008)},
 year = {2008}
}

Reading Beside the Lines: Indentation as a Proxy for Complexity Metrics

Abram Hindle, Michael W. Godfrey, and Richard C. Holt
Proc. of 2008 IEEE Intl. Conference on Program Comprehension (ICPC-08), Amsterdam, The Netherlands
2008 133--142
Acceptance:38%
PDF

@inproceedings{abram2008ICPC08rbtliaapfcm,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, and Richard C. Holt},
 booktitle = {Proc. of 2008 IEEE Intl. Conference on Program Comprehension (ICPC-08)},
 date = {2008-06-10},
 funding = {NSERC PGS-D},
 location = {Amsterdam, The Netherlands},
 pagerange = {133--142},
 pages = {133--142},
 rate = {38%},
 role = { Co-author},
 title = {Reading Beside the Lines: Indentation as a Proxy for Complexity Metrics},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2008ICPC-reading-beside-the-lines.pdf},
 venue = {Proc. of 2008 IEEE Intl. Conference on Program Comprehension (ICPC-08)},
 year = {2008}
}

What do large commits tell us?: A taxonomical study of large commits

Abram Hindle, Daniel M. German, Richard C. Holt
Proc. of the 2008 Working Conference on Mining Software Repositories (MSR-08), Leipzig, Germany
2008 99--108
Acceptance:8/42 or 19%
PDF

@inproceedings{abram2008MSR08wdlctuatsolc,
 author = {Abram Hindle and Daniel M. German and Richard C. Holt},
 authors = {Abram Hindle, Daniel M. German, Richard C. Holt},
 booktitle = {Proc. of the 2008 Working Conference on Mining Software Repositories (MSR-08)},
 date = {2008-05-10},
 funding = {NSERC PGS-D},
 location = {Leipzig, Germany},
 pagerange = {99--108},
 pages = {99--108},
 rate = {8/42 or 19%},
 role = { Co-author},
 title = {What do large commits tell us?: A taxonomical study of large commits},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindleMSR2008-large-changes.pdf},
 venue = {Proc. of the 2008 Working Conference on Mining Software Repositories (MSR-08)},
 year = {2008}
}

Release Pattern Discovery via Partitioning: Methodology and Case Study

Abram Hindle, Michael W. Godfrey, Richard C. Holt
Proc. of 2007 Intl. Workshop on Mining Software Repositories (MSR-07), Minneapolis, United States
2007 1--9
Acceptance:15/39 or 38%
PDF

@inproceedings{abram2007MSR07rpdvpmacs,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {Proc. of 2007 Intl. Workshop on Mining Software Repositories (MSR-07)},
 date = {May 19--20, 2007},
 funding = {NSERC PGS-D},
 location = {Minneapolis, United States},
 pagerange = {1--9},
 pages = {1--9},
 rate = {15/39 or 38%},
 region = {Minnesota},
 role = { Co-author},
 title = {Release Pattern Discovery via Partitioning: Methodology and Case Study},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2007MSR-Release-Pattern-Discovery.pdf},
 venue = {Proc. of 2007 Intl. Workshop on Mining Software Repositories (MSR-07)},
 year = {2007}
}

Release Pattern Discovery: A Case Study of Database Systems

Abram Hindle, Michael W. Godfrey, Richard C. Holt
Proc. of the 2007 Intl. Conference on Software Maintenance (ICSM-07), Paris, France
2007 285--294
Acceptance:41/214 or 21%
PDF

@inproceedings{abram2007ICSM07rpdacsods,
 author = {Abram Hindle and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, Michael W. Godfrey, Richard C. Holt},
 booktitle = {Proc. of the 2007 Intl. Conference on Software Maintenance (ICSM-07)},
 date = {2007-10-02},
 funding = {NSERC PGS-D},
 location = {Paris, France},
 pagerange = {285--294},
 pages = {285--294},
 rate = {41/214 or 21%},
 role = { Co-author},
 title = {Release Pattern Discovery: A Case Study of Database Systems},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2007ICSM-release-pattern.pdf},
 venue = {Proc. of the 2007 Intl. Conference on Software Maintenance (ICSM-07)},
 year = {2007}
}

YARN: Animating Software Evolution

Abram Hindle, ZhenMing Jiang, Walid Koleilat, Michael W. Godfrey, and Richard C. Holt
Proc. of 2007 IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT-07), Banff, Alberta
2007 129--136
Acceptance:15/34 or 44%
PDF
DOI:10.1109/VISSOF.2007.4290711

@inproceedings{abram2007VISSOFT07yase,
 author = {Abram Hindle and ZhenMing Jiang and Walid Koleilat and Michael W. Godfrey and Richard C. Holt},
 authors = {Abram Hindle, ZhenMing Jiang, Walid Koleilat, Michael W. Godfrey, and Richard C. Holt},
 booktitle = {Proc. of 2007 IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT-07)},
 date = {2007-06-25},
 doi = {10.1109/VISSOF.2007.4290711},
 funding = {NSERC PGS-D},
 location = {Banff, Alberta},
 pagerange = {129--136},
 pages = {129--136},
 rate = {15/34 or 44%},
 role = { Co-author},
 title = {YARN: Animating Software Evolution},
 type = {inproceedings},
 url = {http://softwareprocess.ca/pubs/hindle2007VISSOFT-YARN.pdf},
 venue = {Proc. of 2007 IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT-07)},
 year = {2007}
}

Publications and Collaborations of Abram Hindle