
WhiteSpace

Ranking Changes by their Indentation is Analogous to Ranking Changes by their Complexity

Abstract: Maintainers often face the daunting task of wading through a collection of both new and old revisions, trying to ferret out those that warrant detailed inspection. Perhaps the most obvious way to rank revisions is by lines of code (LOC); this technique has the advantage of being both simple and fast. However, most revisions are quite small, and so we would like a way of distinguishing between simple and complex changes of equal size. Classical complexity metrics, such as Halstead’s and McCabe’s, could be used but they are hard to apply to code fragments of different programming languages. We propose a language-independent approach to ranking revisions based on the indentation of their code fragments. We use the statistical moments of indentation as a lightweight and revision/diff-friendly metric to proxy classical complexity metrics. We found that ranking revisions by the variance and summation of indentation was very similar to ranking revisions by traditional complexity measures since these measures correlate with both Halstead and McCabe complexity; this was evaluated against the CVS histories of 278 active and popular SourceForge projects. Thus, we conclude that measuring indentation alone can serve as a cheap and accurate proxy for computing the code complexity of revisions.
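
The idea in the abstract, summarising a revision by the statistical moments of its indentation, can be illustrated with a small Python sketch. This is not the authors' tool. The function names, the choice to count only lines added by a unified diff, the 8-column tab width, and the use of population variance are all assumptions made for illustration.

```python
import statistics

TAB_WIDTH = 8  # assumption: one tab counts as 8 columns


def indentation(line: str, tab_width: int = TAB_WIDTH) -> int:
    """Return the number of leading whitespace columns, expanding tabs."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


def indentation_moments(diff_text: str) -> dict:
    """Summarise the indentation of lines added by a unified diff.

    Hypothetical helper: only '+' lines are measured, and blank lines
    are ignored. An added line whose content itself starts with '++'
    would be mistaken for a file header and skipped.
    """
    depths = []
    for line in diff_text.splitlines():
        # Keep added lines only; skip the '+++' file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        if body.strip():
            depths.append(indentation(body))
    if not depths:
        return {"lines": 0, "sum": 0, "mean": 0.0, "variance": 0.0}
    return {
        "lines": len(depths),
        "sum": sum(depths),
        "mean": statistics.fmean(depths),
        "variance": statistics.pvariance(depths),  # population variance
    }
```

Under these assumptions, a set of revisions could be ordered by the "sum" or "variance" value of each revision's diff, which needs no parser for the language being changed.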

Papers:

Software:

Data:

 @article{hindle09sciprog,
 title = "Reading beside the lines: Using indentation to rank revisions by complexity",
 journal = "Science of Computer Programming",
 volume = "74",
 number = "7",
 pages = "414--429",
 year = "2009",
 note = "Special Issue on Program Comprehension (ICPC 2008)",
 issn = "0167-6423",
 doi = "10.1016/j.scico.2009.02.005",
 url = "http://www.sciencedirect.com/science/article/B6V17-4VT14CM-1/2/e0e0ddda7661dc0b291216e2025cc9e4",
 author = "Abram Hindle and Michael W. Godfrey and Richard C. Holt",
 keywords = "Indentation, Complexity, McCabe, Halstead, Metrics",
 abstract = "
 Maintainers often face the daunting task of wading through a collection of both new and old revisions, trying to ferret out those that warrant detailed inspection. Perhaps the most obvious way to rank revisions is by lines of code (LOC); this technique has the advantage of being both simple and fast. However, most revisions are quite small, and so we would like a way of distinguishing between simple and complex changes of equal size. Classical complexity metrics, such as Halstead's and McCabe's, could be used but they are hard to apply to code fragments of different programming languages. We propose a language-independent approach to ranking revisions based on the indentation of their code fragments. We use the statistical moments of indentation as a lightweight and revision/diff-friendly metric to proxy classical complexity metrics. We found that ranking revisions by the variance and summation of indentation was very similar to ranking revisions by traditional complexity measures since these measures correlate with both Halstead and McCabe complexity; this was evaluated against the CVS histories of 278 active and popular SourceForge projects. Thus, we conclude that measuring indentation alone can serve as a cheap and accurate proxy for computing the code complexity of revisions."
 }