Visible Metrics with Corey Haines

We've been discussing metrics in the Software Craftsmanship group, and I posted some notes on the topic yesterday. Today, I had an opportunity to watch Corey's "Road Thoughts" on Visible Metrics.

Road Thoughts - Visible Metrics from Corey Haines on Vimeo.

And I like where Corey is headed with this. The measurements discussed are related to code quality. They apply to a code base written by a single developer or by a team of hundreds. These metrics can and should be used for good. Developers can monitor their improvement or ensure their quality does not slide.

Corey takes this concept to a new level: comparative metrics. These are metrics gathered about your code base, but devoid of any specific intellectual property. They are then shared, perhaps publicly or perhaps anonymously, for collective comparison. So not only can you monitor your own quality, but you can compare it against others'. How do you measure up against the industry as a whole? Against companies of similar size in your region? Against individuals around the world working in your language of choice?
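To make the idea concrete, here is a minimal sketch of what an IP-free, shareable snapshot might look like. The metric names, helper functions, and benchmark figures are all invented for illustration; nothing here is Corey's actual proposal or any real tool.

```python
# Hypothetical sketch: reduce a code base to summary numbers that carry
# no intellectual property, so they are safe to share for comparison.

def anonymized_snapshot(codebase_stats):
    """Reduce raw counts to IP-free summary metrics (no code, no names)."""
    return {
        "avg_cyclomatic_complexity":
            codebase_stats["total_complexity"] / codebase_stats["method_count"],
        "test_coverage_pct":
            codebase_stats["covered_lines"] * 100.0 / codebase_stats["total_lines"],
    }

def compare(mine, community_median):
    """Report where each metric stands relative to a shared benchmark."""
    return {k: "above" if mine[k] > community_median[k] else "at_or_below"
            for k in mine}

snapshot = anonymized_snapshot({
    "total_complexity": 420, "method_count": 120,
    "covered_lines": 8500, "total_lines": 10000,
})
# snapshot contains only numbers -- nothing proprietary leaves the building,
# yet it can be compared against an industry-wide or regional median.
```

The point of the sketch is the shape of the data, not the specific metrics: anything that survives the reduction to a handful of numbers can be pooled and compared without exposing the code itself.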

With courage, what can we as a community do to help each other learn and grow?

Metrics in Software Development

Today I posted an entry on the Google Software Craftsmanship Board. In the posting, I asked if anyone had experience with individual velocity metrics on an Agile project. I posted a similar (but briefer) inquiry on Twitter as well.

The responses were varied and the discussion was good. But I was a bit surprised by the overall quality of the responses.

First of all, I had to rephrase the question more than once before anyone answered it. I received many vague and prescriptive responses stating that it should not be done, but most folks never indicated whether they had personal experience with it. "That's not the agile way" seemed to be considered a satisfactory argument. No additional support required.

Only one respondent directly stated they had experience with using individual velocity metrics. They also described a disastrous outcome: a complete breakdown of trust, quality, and throughput, all of which was attributed to the collection of a single data point. It seems to me that a project which crashes and burns that hard has issues beyond the collection of a metric. Perhaps the collection (or use) of the metric was an indicator of a greater problem, but I doubt it was THE problem. In other words, the team was likely to suffer the same fate whether or not personal velocity data was gathered. More likely, the motivation behind the collection and use of the data was the real issue.

As the discussion progressed, a dangerous general consensus seemed to emerge: metrics that apply to individuals are bad, and metrics that apply to groups are good. While the discussion started with velocity, it took a winding path through all forms of data points: code coverage, cyclomatic complexity, estimation accuracy, etc. The danger here is categorizing a metric as good or bad based solely on whether it applies to an individual or a group.

Good points were also made. Many spoke of the need to understand the purpose of the metric. As one individual put it:

(1) you need to be aware of what question you are asking and what situation you are trying to change;
(2) you need to understand what the relationship between a given metric and a given property is;
(3) you need to [be] clear under what circumstances the metric will cease to be relevant.

Anthony Broad-Crawford said it very well:

As a manager myself the primary goal of metrics is to simply create a "check engine light". Meaning, when the light comes on, it doesn't necessarily mean anything is wrong. What it means, is you should "pop open the hood" and start asking some questions to figure out if something really is in fact wrong. For example, if you did track individual velocity metrics, and someone did report a very low number for the iteration, a check engine light should come on. It doesn't mean the developer did anything wrong or sub-par. It could be for any number of good things (pair programming, design sessions, training, etc). However, if a goose egg is thrown, and they didn't participate in anything that would account for the goose egg (extensive pairing, design sessions, training, etc) something may be wrong and you as the manager should find out.

I do agree that if management simply looks at the spreadsheet and makes assumptions (good or bad) based on some threshold, it will be problematic. Metrics, much like everything else, are not black and white. A good manager (IMO) will realize that metrics are shades of gray and act accordingly.
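Anthony's analogy boils down to a simple rule: low output triggers questions only when nothing accounts for it. A minimal sketch of that rule, with hypothetical names and a hypothetical threshold (none of this is from his actual process):

```python
# "Check engine light" sketch: the light signals a conversation,
# never a conclusion. Names and threshold are invented for illustration.

def check_engine_light(velocity, mitigating_activities, threshold=0):
    """Return True when low velocity is NOT explained by known activities
    (pairing, design sessions, training) and so warrants a follow-up."""
    if velocity > threshold:
        return False  # nothing unusual; the light stays off
    # A "goose egg" with an explanation is fine; without one, ask questions.
    return len(mitigating_activities) == 0

# The light stays off when pairing explains a zero:
check_engine_light(0, ["pair programming"])  # -> False
# It comes on when nothing accounts for it:
check_engine_light(0, [])                    # -> True
```

Note that even a True result only means "pop open the hood": the function flags where to look, not what is wrong.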

The discussion was primarily about the need for individual velocity metrics. And the general consensus (on Google Groups and on Twitter) was that individual metrics are bad while group metrics are good.

But metrics, whether individual or collective, can be used for good or evil. They can result in improved or hindered performance. They can tell a valuable story or they can completely mislead.
To categorically dismiss all metrics related to individuals is almost as dangerous as categorically accepting all metrics that apply to groups.
We need to recognize that metrics without a clear and expressed purpose are noise at best and quite likely dangerous. And metrics are merely indicators; they don't tell the complete story. I like the "check engine light" analogy. Metrics let you know when you need to look into something further.