Every legitimate software company I know of uses metrics to evaluate the performance of its engineering and development teams. And I would even go a step further and say that most engineering leaders like metrics. We like metrics because they are a lot like data, which most of us are obsessed with. We want to make data-driven decisions, and it’s natural to extend that tendency to how we evaluate our software engineering teams.
But there’s a complication: Engineers are people, and arbitrarily applied algorithms don’t tell the whole story about a team’s performance or overall health.
If a company is going to survive and thrive amidst the Great Resignation, the leadership must realize that output is only one data point of interest. Great companies are only great because of the talented people who work there. If a company’s employees are highly productive but quickly burn out in a miserable environment, the company won’t be successful for long.
Engineering leaders are justified in their affinity for metrics, but we must be judicious about which metrics we focus on if we are to truly understand how our people are doing instead of just tracking how much they are doing. We have found only one reliable metric for measuring the health of our engineering teams: predictability.
But before we discuss predictability, let’s start with an introduction to the most common metrics used to evaluate engineering teams.
Velocity
Velocity is how much work, measured in story points, a delivery team completes per iteration, and it is one of the most common metrics used for software development. The primary benefit of tracking velocity is that it’s easy to measure: most Agile and project management tools track it.
On the downside, it can be a volatile indicator to track. Fluctuations in staffing, leave time for employees, personal emergencies, and other external events can cause velocity to vary substantially from iteration to iteration. While it might be true that fewer story points were completed for a specific iteration, the reasons why that happened can be obscured and potentially misinterpreted.
Another strike against relying solely on velocity as a performance metric is that it can easily be gamed. If a team wants to appear more productive than it is, it can simply inflate its story-point estimates until the numbers reflect the output it wants to report.
We track velocity at Stoplight, but we use a rolling average over several sprints as part of our process for evaluating team health. We expect the slope of the rolling average to be flat or increasing. When it declines over several consecutive sprints, we take a closer look.
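As a rough illustration, here is a minimal Python sketch of that kind of rolling-average check. The sprint numbers and the four-sprint window are made up for the example, not Stoplight’s actual data or process.

```python
# A minimal sketch of a rolling-average velocity check; the sprint
# velocities and the four-sprint window are illustrative only.

def rolling_average(velocities, window=4):
    """Rolling mean of completed story points over the last `window` sprints."""
    return [sum(velocities[max(0, i + 1 - window):i + 1]) / min(i + 1, window)
            for i in range(len(velocities))]

sprint_velocities = [21, 24, 19, 23, 18, 16, 14]  # story points per sprint (made up)
averages = rolling_average(sprint_velocities)

# Count how many consecutive sprints the rolling average has been falling;
# a sustained decline, not one noisy sprint, is the signal to dig in.
streak = 0
for prev, curr in zip(averages, averages[1:]):
    streak = streak + 1 if curr < prev else 0

if streak >= 3:
    print(f"Rolling average down {streak} sprints in a row; worth a closer look.")
```

The point of the window is exactly the volatility problem above: one short-staffed sprint barely moves the average, but a real trend shows up as a sustained downward slope.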
Most importantly, we recognize velocity for what it is not. Velocity is not the supreme metric, nor is it even the best single metric out there.
Cycle Time
Cycle time is the amount of time it takes for work to reach production once it has begun. It has some genuine diagnostic utility and can help you find high-level bottlenecks in your company’s workflows. For example, it could lead you to ask why a given activity such as code review is taking so long compared to other activities, or how long code reviews typically take at your company.
But, like velocity, cycle time has a major blind spot: complexity. The more complicated a task is, the longer it will take to complete, yet the metric implicitly treats every task as equally complicated, which can distort reporting.
Also, outliers can drive up the measures for all activities and undermine the credibility of the reporting. If one or two unique items among dozens of other activities took longer than expected, does that necessarily mean there’s a systemic or widespread problem?
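To make the outlier problem concrete, here is a hypothetical sketch; the durations are invented, and nothing here comes from a real tracker’s API.

```python
# Hypothetical cycle times in days (production deploy minus work start),
# with one outlier that sat in review for weeks.
durations = [2.5, 3.0, 1.5, 2.0, 4.0, 3.5, 2.0, 1.0, 2.5, 3.0, 30.0]

mean = sum(durations) / len(durations)
median = sorted(durations)[len(durations) // 2]

print(f"mean cycle time:   {mean:.1f} days")    # 5.0 days, dragged up by one item
print(f"median cycle time: {median:.1f} days")  # 2.5 days, close to the typical item
```

One slow item doubles the mean while the median barely moves, which is how a single outlier can make an otherwise healthy workflow look systemically broken.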
Cycle time is still useful for monitoring and improving processes, but it should by no means be one of your company’s most important metrics.
Lead Time
Lead time is essentially cycle time plus the time that elapses between when an issue is formally described and when work on it begins. It’s a popular metric, even the primary one at some software companies, but we don’t track it at Stoplight because it doesn’t account for the other (likely more important) work that was a higher priority at the time.
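For clarity, here is an illustrative sketch of the difference between the two measures; the dates and field names are hypothetical, not from any particular issue tracker.

```python
from datetime import datetime

# Hypothetical lifecycle of a single issue.
issue = {
    "created":      datetime(2022, 3, 1),   # issue formally described and ticketed
    "work_started": datetime(2022, 3, 15),  # team picked it up
    "deployed":     datetime(2022, 3, 18),  # shipped to production
}

cycle_time = issue["deployed"] - issue["work_started"]  # 3 days
lead_time  = issue["deployed"] - issue["created"]       # 17 days
queue_time = issue["work_started"] - issue["created"]   # 14 days of waiting

print(f"cycle time: {cycle_time.days} days, lead time: {lead_time.days} days")
# Those 14 days of queue time are where lead time misleads: the team may
# have been heads-down on higher-priority work the metric never sees.
```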
For an item to be prioritized, it must be given attention over other competing priorities. Lead time doesn’t account for those priorities and can give the impression that a team was doing nothing — when they were actually doing more important work.
When companies rely too heavily on lead time and ignore the context and competing priorities that affected it, they risk penalizing good teams making good decisions who simply focused on the most important work in their queue.
It’s also worth mentioning that apparent lead time problems often point to bigger issues at higher levels within the organization, such as project qualification and resource planning. If a team appears to be lagging on lead time, who is piling too many low-priority projects on them?
Formal Code Metrics
Code metrics include data such as lines of code, code coverage (how much of the code is covered by tests), code smells, duplication, and cyclomatic complexity. These metrics can tell you something about the health of your codebase, but they don’t actually indicate anything about the health of your engineering teams.
And to be honest, healthy organizations tend to have at least a little messiness in their codebase, because they’re more focused on delivering value to their customers than on maintaining a pristine codebase. So again, these are one-dimensional metrics with limited utility.
Pull Requests Closed
Atlassian defines a pull request as a “mechanism for a developer to notify team members that they have completed a feature.” On the surface, this seems like another good metric to track because closed pull requests should indicate that value is being delivered to customers, right? Think again. This is another metric that is easily gamed.
Focusing too much on pull requests incentivizes teams to complete teeny-tiny pull requests that make it appear they are delivering lots of new changes and features. But tracking the quantity alone doesn’t reflect how much actual value is being delivered to customers.
While none of these metrics are necessarily wrong, when they’re taken separately they don’t tell the whole story of your engineering team’s health and productivity.
So what should we be tracking?
Predictability.
Predictability is the most important metric we use at Stoplight to evaluate the health and productivity of our engineering teams, and I’m here to make the case for your company to adopt it, too.
Predictability
We define predictability as the percentage of what a delivery team thought they could complete that they actually completed per iteration: (velocity / estimate) × 100. Predictability is hands-down the most important metric for software engineering teams and the broader organization.
A healthy team is 100% (± 15%) predictable — they consistently deliver 85% to 115% of what they estimate. When an engineering or development team is predictable, the organization can rely on its metrics and data to realistically predict delivery dates, which has a systemic effect on the entire organization.
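Here is a minimal sketch of that calculation, with made-up sprint numbers to show the healthy band in action.

```python
# A minimal sketch of the predictability calculation; the sprint numbers
# are invented for illustration.

def predictability(completed_points, estimated_points):
    """Percentage of the sprint estimate the team actually delivered."""
    return completed_points / estimated_points * 100

sprints = [(23, 25), (27, 26), (22, 24), (15, 25)]  # (completed, estimated)

for completed, estimated in sprints:
    pct = predictability(completed, estimated)
    status = "healthy" if 85 <= pct <= 115 else "investigate"  # the 100% +/- 15% band
    print(f"{pct:5.1f}%  {status}")
```

The first three sprints land inside the band; the last one (60%) is the kind of miss that warrants a conversation, not a punishment.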
For developers, this means consistently hitting achievable goals without creating an unsustainable burnout culture. Outside the development team, the ability to reliably forecast delivery dates builds credibility and prevents other teams and functional groups from being subjected to constant crises caused by poor planning or forecasting.
When the engineering team consistently delivers on time, other critical partners such as marketing and sales can accurately plan their associated activities related to the product, which builds stability and trust across the organization while preventing burnout due to constant deadline stress.
Predictability is the most important metric I track at Stoplight because it gives me the most reliable read on how our engineering teams are doing while also helping us set and meet sound expectations for our customers.
Change the Dialogue
The software industry doesn’t have to be such a meat grinder. Engineering work is hard enough, and it becomes unbearable when clunky, arbitrary metrics are forced on teams. Many of these metrics don’t really tell you that much — even worse, they can stress people out over factors they don’t control. It’s an ugly cycle, and it leads to the high levels of turnover that have been associated with our industry for decades.
If you’re a tech executive or engineering leader, I encourage you to take a hard look at how you’re managing your organization and tracking its performance. Are you relying too much on any of the one-dimensional metrics mentioned above? If so, consider factoring predictability into your evaluation processes.
And if you’re an engineer or developer who’s tired of being evaluated by flawed metrics, stop by Stoplight’s careers page. We would love to hear from you.
I’m the VP of Engineering at Stoplight, and I’m committed to changing the dialogue and culture of software engineering to the degree that Stoplight is impacting the dialogue and culture around APIs.