Abram Hindle's Blog

Software is Hard

Software Science and Empirical Software Engineering

An introduction via my MSR 2018 Most Impactful Paper Talk

I recently gave a talk at MSR 2018 about a paper I wrote 10 years ago that was not immediately actionable to practitioners but potentially actionable to researchers.

Here’s a practice video of the talk, but I think I was tired and you can audibly hear yawning.

https://www.youtube.com/watch?v=XMEIJTPf_zo

My concerns were that we as a community:

  • were rejecting papers that could be used to build a body of knowledge because they weren’t immediately actionable;
  • were writing reviews saying “isn’t this obvious” about findings that were only anecdotal before;
  • were paying lip service to replication yet not accepting actual replication papers;
  • were grappling with the idea that engineering has to solve a problem;
  • were facing randomness in the review process because of “so what”, “not useful”, “irrelevant to practitioners” reviews that might not be accurate;
  • were not sure how to deal with papers that would add to our body of knowledge yet would not be applied immediately.

In the talk I was concerned that as a community we are too quick to reject work that is not immediately actionable to the stakeholders: developers, managers, software companies. We use our value judgment to evaluate the motivation of the authors, and if we don’t agree we argue the paper should be rejected because it isn’t useful, isn’t actionable, or is poorly motivated. The point is that a technically sound paper can be easily rejected because people do not like its argument. It’s not that you’re wrong or technically incorrect or inaccurate (though you could be); it’s that you might not have solved a problem right now, and you might not have convinced reviewers of the utility of the work. My focus in the talk was on rejecting work that wasn’t immediately actionable to software engineers and related stakeholders. So I brought up a term that Jim Cordy raised with me in our discussions: “software science”.

Software Science

In the talk I referenced “software science” and I tried to make a distinction between it and Empirical Software Engineering. In my view engineering is solving a problem under economic constraints. Engineering is usually distinguished from science by its focus on application and practicality, whereas science is far more general. My view of science is that it is meant to systematically study something. Some people restrict it to the study of the natural world; some argue science is the process, others the knowledge produced by it. In general I think what we’re supposed to get from science is more knowledge, and that knowledge might not be immediately usable.

Software science already exists in many different ways. Much of empirical software engineering is not actionable: it is pure measurement and statistical tests that try to reflect what the data we mine shows. Halstead wrote a book, “Software Science”, where he argued for measurement and came up with measures. The validity and usefulness of Halstead’s metrics are argued about to this day, but people in 2018 are still using them; I’ve implemented and used Halstead metrics myself. There are many departments and divisions of Software Science at various universities that study software and publish empirical software engineering work. I’m not trying to claim the term or overshadow the other uses of “software science”; I’m just asking for an understanding of research that has value but is not immediately actionable.
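
As an aside, here is a minimal Python sketch of the classic Halstead measures (vocabulary, length, volume, difficulty, effort) as they are usually stated. The token lists are assumed to come from some language-specific tokenizer; deciding what actually counts as an operator or an operand is exactly the part people still argue about.

    import math
    from collections import Counter

    def halstead_metrics(operators, operands):
        # operators/operands: token streams already classified by some
        # language-specific tokenizer (the contested part of "software science").
        op_counts, od_counts = Counter(operators), Counter(operands)
        n1, n2 = len(op_counts), len(od_counts)                     # distinct operators, operands
        N1, N2 = sum(op_counts.values()), sum(od_counts.values())   # total occurrences
        vocabulary, length = n1 + n2, N1 + N2
        volume = length * math.log2(vocabulary) if vocabulary else 0.0
        difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
        return {"vocabulary": vocabulary, "length": length, "volume": volume,
                "difficulty": difficulty, "effort": difficulty * volume}

    # Toy example for the statement `x = x + 1`:
    print(halstead_metrics(operators=["=", "+"], operands=["x", "x", "1"]))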

Software science in this context is a broad term meant to encompass research that seeks to apply scientific principles to the study of software and software development, not necessarily within the constraints of an engineering discipline. Is software science a superset or subset of empirical software engineering? No, not to me at least. Many works belong in both categories, but there are works that fit software science more than empirical software engineering. For instance, Israel Herraiz’s work on statistically characterizing the distributions of version control data seen in repositories: it might be useful to a tool builder, but its direct application is not apparent. Yet that body of knowledge, of functions that can generate these distributions, is important for theory building and understanding.
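
To make the flavour of that kind of work concrete, here is a minimal sketch, with made-up data, of fitting one candidate generating function to per-file change counts mined from a repository. The lognormal here is only a stand-in for illustration, not the distribution that work actually settled on.

    import numpy as np
    from scipy import stats

    # Hypothetical data: number of commits touching each file in some repository.
    changes_per_file = np.array([1, 1, 1, 2, 2, 3, 3, 5, 8, 13, 21, 55, 120])

    # Fit a candidate distribution and test how well it describes the data.
    shape, loc, scale = stats.lognorm.fit(changes_per_file, floc=0)
    ks_stat, p_value = stats.kstest(changes_per_file, "lognorm",
                                    args=(shape, loc, scale))
    print(f"lognorm fit: shape={shape:.2f}, scale={scale:.2f}, KS p={p_value:.3f}")

A catalogue of which functions fit which mined distributions is hard to act on today, but it is exactly the kind of result that later theory building can stand on.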

Regardless, until we mature as a discipline and find a way to aggregate knowledge, we will be harming ourselves, because in our quest for novelty and immediate actionability we are actively preventing the build-up of knowledge. Novelty and actionability are important, and we all recognize them as important. Yet as a community we cannot mature until we:

  • solidify knowledge with empirical studies;
  • accept that “this is obvious” is not a valid reply from reviewers;
  • accept that collecting and building knowledge is important for science;

– paraphrased points from Daniel German

Thus I argue we should be able to signal that our work is not necessarily engineering but more on the scientific side. That doesn’t mean we give a free pass to anyone who does something. No, if you claim your work is “software science” then you’d better back that claim up.

What should be in a software science paper?

So if one is to submit a software science paper, where is the bar? What is required?

In my opinion one big problem in our field is overclaim. I’m guilty of it. You’re probably guilty of it. Overclaim in a paper can kill follow-up papers, because the onus of replication is heavy, replication is difficult, and acceptance of straight replication papers is low. Furthermore, during review your paper will always be in the shadow of the original that laid claim to the space. We need to allow others into the same space, multiple times. We’ll never get meta-analysis like medicine has without it.

Another sin is poor future work. How many times have you read a future work section that says “we plan to apply this to more projects and different programming languages”? Is it not obvious enough that your work had limitations? Please, can we have better future work sections where the authors (myself included) use their imaginations and think about where this will go. If you cannot envision the future use of your labour, the reviewers are going to have a hard time too. Because future work is literally science fiction (I stole that joke), you can always make something up. It just has to be plausible.

Furthermore, in the interests of science and of not claiming too much space, you should make your assumptions and what you did not investigate clear to the reader. These might not be apparent from your claims alone.

Finally, do you want to have an impact? Do you want people to cite you? Well, let them follow up on your work by using your data and analysis scripts. At the very least share the analysis scripts; unfortunately there is a whole raft of good excuses not to share data.

You should tell the reader:

  • Where this work could be used. What do you envision this work could be aggregated with?
  • What is the scope and context of your work? Don’t overclaim or overgeneralize. There will be a need for other niches to be studied. If you investigate TDD projects written in Java on GitHub, then you shouldn’t generalize to all TDD projects in all languages in all repositories. If you investigate the performance of an Android software development pattern, should you claim performance results for all mobile platforms?
  • What isn’t covered and what assumptions you made.
  • Where your data is, and preferably where your analysis scripts are.

Thus you should have:

  • Future work sections that are more detailed than “we want to test on more projects”. How about the hard stuff you did not get to, or papers you envision that could include this work?
  • The scope of the work clearly documented in the title, abstract, intro, and conclusions.
  • A description of what you did not explore and what assumptions you had to make.
  • Hyperlinks to your data and analysis scripts (please!)

How should I review a software science paper?

Review it like you did before, but:

  • Make sure the authors aren’t overclaiming in the title, abstract, introduction, or conclusions.
  • Make sure the paper is properly scoped.
  • Make sure the paper is replicable.
  • Check if the future work is good enough.
  • Check if the authors have come up with a rationale for why this work might be used in the future.
  • If you’re going to reject based on motivation, please consider how the authors have argued for the future use of this work: could a measurement from this paper be used as a parameter or as advice for another developer?
  • Avoid saying “isn’t it obvious” unless you have prior work to back it up; anecdotes are not enough.
  • If you are reviewing a paper in a context you’re not familiar with, try to brush up on that area. We do a grave disservice to younger researchers, who are often closer to the cutting edge of development and to practical SE trends that you might be unfamiliar with.
  • Check if data or analysis scripts are available.

What should relevant conferences and journals do?

Conferences and journals that wish to be open to this kind of work need to help authors and reviewers understand where they stand and what the bar is.

  • Allow lightweight shepherding so that overclaim can be dealt with in the camera-ready version in a safe way.
  • Do more than pay lip service to accepting replications: provide clear guidelines of what is expected of authors and reviewers of replication papers.
  • Do more than pay lip service to software science: provide clear guidelines of what is expected from authors and reviewers.
  • Allow shorter papers and emphasize to reviewers that one cannot do everything under the sun in a shorter paper.
  • Provide signals of where software science and empirical software engineering works should go.
  • Provide examples of “the bar” and guidelines for meeting the bar. I like checklists.

What I’m not saying

I’m not saying accept everything. My goal is more consistent reviewing, and less work getting rejected because of the reviewers’ lack of experience or field knowledge. My goal is also to ease the empirical software engineering dilemma whereby much is expected of a work yet actionability is out of reach, even though the work could later be built upon by other authors.

I’m not saying we have to let boring things into conferences, although we do need a discussion about where we send dry results, negative results, or replications.

In conclusion

In other sciences it took the slow build and acquisition of knowledge before many leaps in understanding and practicality could be made. Software should be no different; we are too young to be discriminating so harshly in an area we so poorly understand. How many times have you heard people argue for the abolishment of the term software engineering because “we can’t even approximate engineering”? This is a signal that our field is in its infancy and we should use this chance to build up our body of knowledge.

If we focus so heavily on the immediately actionable we will hobble our future.

To push the idea of “software science” or science in software engineering we need to have buy in from many stakeholders:

  • Authors need to provide a clear signal in their papers about the limits of their work, the scope of their work, and where their work can be used in the future.
  • Reviewers need to understand that we can’t do everything in one paper; that the obvious is not obvious until we can cite it as obvious; and that there is value in building up knowledge from results that might not be immediately actionable.
  • Journals and Conferences, as the gatekeepers of SE knowledge, need to make it clear how to build articles that can be accepted. Make it clear to reviewers and authors what you seek and if it is software science what you expect from it.

Thus I argued that we need to recognize that we have different kinds of research, and that we might have to more clearly identify works that fall under the software science umbrella of not-immediately-actionable results so that we can evaluate them fairly. If we don’t, we risk slowing our progress with a bias towards novelty and immediate application rather than truly improving the field from a knowledge-oriented perspective.

If you do wish to engage with me on this

Please do it via email and/or blog. abram.hindle@ this domain will work fine.

Most of what is written here is not mine. It’s a mix of Jim Cordy, Daniel German, Ric Holt, Michael Godfrey, Ahmed Hassan, Prem Devanbu, Wikipedia, the SE community, and western philosophy.