Any team runs into bottlenecks and sources of defects throughout the software development lifecycle. Often, fixing one bottleneck causes a new one to spring up somewhere else. You will need different metrics to diagnose and improve different areas at each step of your company's software journey and achieve the highest possible quality. But here is the tricky part.
Each stakeholder has a different view of "quality." So, you will need not one but a set of software quality metrics to support productive conversations focused on the interests of each stakeholder.
Below, I feature several proven software quality metrics. Find out their purpose and how to apply them in the context of software development.
These metrics, often referred to as the DORA metrics, are popular for a reason: they are highly correlated with high-performing software delivery teams. (A small sketch after the list shows how they can be computed.)
DF (Deployment Frequency). Deploying quickly and frequently typically means bugs can be resolved promptly in the next release. And since each delivery is small, any bugs it introduces tend to be small and easy to isolate.
LT (Lead Time for Changes). This one measures how long it takes for code to go from being written to being in production, including the CI/CD pipeline and processes like code reviews. It's helpful to track how long a feature takes to go from ideation to production, from grooming to production, and from coding to production. Each segment carries useful information; if one segment is much faster than the others, a step may be getting skipped, which causes communication issues later.
CFR (Change Failure Rate). CFR shows what percentage of releases cause an issue in production.
TRS (Time to Restore Service). This one measures how long it takes to recover from an outage. It's also good to measure the window of lost data and the time needed to recover that data.
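To make the four metrics concrete, here is a minimal sketch of how they could be computed from deployment and incident records. The data structures and field names are assumptions for illustration, not any standard tool's API; in practice, this data would come from your CI/CD system and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: when the change was committed, when it
# reached production, and whether the deployment caused a failure.
deployments = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 2, 14), "failed": False},
    {"committed": datetime(2024, 5, 3, 10), "deployed": datetime(2024, 5, 3, 16), "failed": True},
    {"committed": datetime(2024, 5, 6, 11), "deployed": datetime(2024, 5, 7, 9), "failed": False},
]
# Hypothetical incidents with outage-start and service-restored timestamps.
incidents = [
    {"started": datetime(2024, 5, 3, 17), "restored": datetime(2024, 5, 3, 19)},
]

period_days = 7

# DF: deployments per day over the observed period.
df = len(deployments) / period_days

# LT: average time from commit to production.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
lt = sum(lead_times, timedelta()) / len(lead_times)

# CFR: share of deployments that caused a failure.
cfr = sum(d["failed"] for d in deployments) / len(deployments)

# TRS: average time from outage to recovery.
restore_times = [i["restored"] - i["started"] for i in incidents]
trs = sum(restore_times, timedelta()) / len(restore_times)

print(f"DF: {df:.2f} deploys/day, LT: {lt}, CFR: {cfr:.0%}, TRS: {trs}")
```

The same structure works at any granularity: feed it a week, a month, or a quarter of records, and watch the trend rather than the absolute numbers.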
The Flow framework helps measure how software flows from ideation to production; a sketch after the list illustrates the calculations.
Velocity. Measures how much is delivered in a given time period.
Efficiency. Helps find out how much is delivered per resource.
Time. (From start to finish.) Reveals how long it takes for the average feature to get to production.
Load. Shows how many features are in flight in a given period of time. Tip: if you want to improve quality, lower the load.
Value. Shows how much value is created per time period or per feature.
Cost. Helps find out how much each feature costs.
Quality. Reveals the ratio of bug fixes to features delivered per time period.
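As a rough illustration, the flow metrics above can be derived from a simple list of work items with start and finish dates. The field names, value points, and team size below are made-up assumptions, not a prescribed schema.

```python
from datetime import date

# Hypothetical work items; "value" and "cost" are in whatever units you use.
features = [
    {"started": date(2024, 5, 1), "finished": date(2024, 5, 10), "value": 8, "cost": 3},
    {"started": date(2024, 5, 2), "finished": date(2024, 5, 20), "value": 5, "cost": 4},
    {"started": date(2024, 5, 15), "finished": None, "value": 3, "cost": 2},  # in flight
]
team_size = 4
bug_fixes = 2  # bug-fix tickets closed in the same period

done = [f for f in features if f["finished"] is not None]

velocity = len(done)                              # delivered in the period
efficiency = velocity / team_size                 # delivered per resource
flow_time = sum((f["finished"] - f["started"]).days for f in done) / len(done)
load = sum(1 for f in features if f["finished"] is None)  # features in flight
value = sum(f["value"] for f in done)             # value created in the period
cost = sum(f["cost"] for f in done) / len(done)   # average cost per feature
quality = bug_fixes / velocity                    # bug fixes per feature delivered

print(f"Velocity {velocity}, Efficiency {efficiency}, Time {flow_time}d, "
      f"Load {load}, Value {value}, Cost {cost}, Quality {quality}")
```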
Happiness. Happy engineers are typically a sign of good developer experience and good quality.
Customer Churn. If software underperforms and is buggy, customers leave. If it is good and solves their problem, they generally stay. Measuring customer churn will help you catch a signal that something is wrong and fix the issue.
CAC (Customer Acquisition Cost). If it is higher than the industry average, the software is most likely difficult to use. If it is lower, it's often a sign of strong word of mouth.
Uptime. Helps track how available your systems are and, conversely, how often they are down (a simple uptime and churn calculation is sketched after this list).
Frequency of Alerts. Alert fatigue will cause teams to ignore alerts. Measuring their frequency helps keep alerts at a manageable level and address them more effectively.
Speed of Delivery. This may seem counterintuitive, but high quality leads to faster development over the long term, while poor quality slows things down. This contradicts the common misconception that you can have either speed or quality, but not both.
Fear of Releasing on Friday. This is a subjective metric. If a team is afraid to release on Friday, quality isn't where it should be; if they aren't, quality is likely in good shape.
User Satisfaction Surveys. An outside perspective on your solution is crucial.
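Churn and uptime in particular reduce to simple arithmetic. Here is a sketch with made-up monthly numbers:

```python
# Hypothetical monthly figures; all numbers are illustrative.
customers_at_start = 400
customers_lost = 12

downtime_minutes = 43            # total downtime over the month
minutes_in_month = 30 * 24 * 60

churn_rate = customers_lost / customers_at_start
uptime = 1 - downtime_minutes / minutes_in_month

print(f"Monthly churn: {churn_rate:.1%}")  # Monthly churn: 3.0%
print(f"Uptime: {uptime:.3%}")             # Uptime: 99.900%
```

The hard part isn't the math; it's collecting honest downtime and churn data consistently, period after period.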
Code Coverage. This is a good metric for knowing the state of the tests. Coverage measures the share of functions called, statements and branches executed, boolean sub-expressions evaluated, and lines of source code exercised by tests.
Mutation Coverage. Mutation testing makes small random changes to the codebase and checks whether the test suite catches the resulting change in behavior. It measures the quality of your code coverage: coverage can look extensive, yet the 5% that isn't covered may be exactly what matters.
Bug Density. Measures how many issues there are per code component.
Latency. Describes how long the software takes to run. This is good to test at different levels of scale to see what breaks at which level.
Linting & Static Analysis. Tools can scan code to make sure it's formatted correctly and follows shop standards. More advanced analyses can also surface security vulnerabilities and code smells.
Security Vulnerabilities. These can manifest both in the code itself and in packages used by the code.
Cyclomatic Complexity. This metric counts how many independent paths there are through a function; a rough way to compute it is sketched after this list. If the resulting number is over 7, the code becomes very difficult for a human to understand. The same rule of thumb applies to the number of variables in a function: over 7 is a sign of excess complexity.
Tech Debt List. Use this metric to measure how much tech debt is tracked.
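To make cyclomatic complexity concrete, here is a minimal sketch that approximates it for Python functions by counting decision points in the AST. It is a simplified take on McCabe's metric, not a full implementation; mature tools (radon, for example) handle many more cases.

```python
import ast

# Decision points that add an independent path through the code.
# This list is a simplification; real tools count more constructs.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(func: ast.FunctionDef) -> int:
    """Rough estimate: 1 + the number of decision points in the function."""
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))

source = """
def triage(ticket):
    if ticket.priority == "high" and ticket.open:
        return "now"
    for tag in ticket.tags:
        if tag == "security":
            return "now"
    return "later"
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        score = cyclomatic_complexity(node)
        flag = "  <- consider refactoring" if score > 7 else ""
        print(f"{node.name}: {score}{flag}")  # prints "triage: 5"
```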
Software is ultimately about getting 1s and 0s to make a machine solve a problem. This requires detailed work by software engineers, including many things beyond writing code: peer reviews, infrastructure, test automation, security, business understanding, flexibility, and so on. The best way to manage a group of people in getting technology to solve a problem efficiently is to choose the right metrics.
Understanding the project goals will help determine the right mix of metrics, as the ones chosen will drive behaviors. For instance, a modernization project will have different metrics from a greenfield project. A startup will have different metrics from an enterprise-level company or a regulated entity.
Below, I explain the critical factors that should influence the choice of software quality metrics.
Most metrics are two-sided: they nudge things in one direction, but when overdone, they push too far.
For instance, Lines of Code (LOC) may push engineers to write more code per hour. However, this metric is known in the industry to produce lots of code that does very little, in an inflexible and complex way. LOC is not a good metric.
Another example is velocity. It may be a good metric for a startup seeking speed of feature delivery. But, if unchecked, velocity metrics often result in the growth of tech debt that ultimately slows the team down after 6-12 months.
Overly strict defect metrics can dampen innovation and increase the cost of software delivery by creating too many gates.
Summing up, the best approach is a mix of metrics, chosen to nudge behaviors in the right direction given the current situation.
There are two main philosophies for quality. One focuses on preventing defects, and the associated metrics are about reducing the occurrence of bugs. The other philosophy is centered on increasing the ability to respond to defects and reducing their impact.
Both make good goals, but often they are at odds. Preventing defects takes a rigorous development process and usually has many steps to build quality into the process. Responding to defects is more about the rigor of the production environment — one that has flexibility, observability, resilience, and self-healing.
Quality is often the result of simplicity, and metrics that track simplicity and decoupling tend to produce quality.
Simple is hard to accomplish, but not impossible. The step-by-step guide in the visual below will help your team achieve simple and clear metrics.
First of all, the number of metrics matters. Pick three. Using only one metric is likely to have unintended consequences, while using more than three leads to a loss of focus and impact. After the first three start to work, you can drill down and add or extend metrics.
Next, keep a dashboard and review it weekly or bi-weekly. If the behavior isn't being observed, it won't change.
Almost all engineering metrics are easy to game, so it's generally a bad idea to link metrics to pay or bonuses. It's better to link them to behaviors.
In some cases, look at metrics without sharing them, to establish a baseline.
Review metrics at the start of retrospectives. This way, the team can brainstorm how to improve them and take ownership of the results.
Be careful not to believe the hype. Metrics are often chosen to tell a story that may not be real. Keep the metrics used to manage a team internal and grounded in reality. Use judgment when the metrics don't match what customers are seeing.
Lastly, compare against the team’s past performance to continually improve. If teams start to act overconfident, compare against industry benchmarks.
How do you choose and adjust metrics when there is limited or no planning at all? Say there are no deadlines, and the sprint cadence and the scope are the only more or less fixed values.
Vitaliy Avramchuk, QA Team Lead at INSART, shares his strategy for cases like this.
“When partnering with an offshore software development provider, a company mostly measures two things: the quality of the work delivered and its cost. You don’t want to micromanage everything. It’s an advantage when the team can set the right metrics, or take the existing ones and perform up to the highest standard.”
Below are some ideas from Vitaliy’s experience on what the team can measure to improve software quality in the limited development planning scenario.
“Metrics do not exist for their own sake. Metrics exist for assessing processes. You should have a clear understanding of the process, potential problems in it, and what you want to measure. This is where you start to come to the right metrics eventually.”
The visual below shows an example of the results your team can achieve by following this strategy.
This case comes from my own experience. For the company I was working for at the time, the main pain point was the slow development process, so I started figuring out the reason for the issue.
I began by measuring the size of releases and how long it took for something to go from entering the sprint pipeline to coming out of it. I found that the team put a lot of effort into each sprint, but it took almost forever for things to be delivered. Interestingly, some items took over 50 business days to go from grooming to production.
With that in mind, I started building a solution to the challenge, and it was not only about metrics (as Vitaliy said, metrics don't exist in a vacuum).
We managed to deconstruct the big, blurry picture by measuring the percentage of each sprint interrupted by low-priority requests, the number of bug tickets addressed versus those that were not, and the number of broken releases. Using these values, we could adjust and restructure the scope and work more effectively toward the main goal.
One thing to keep in mind is that the effect of such work is not immediate. The analysis and the change of strategy led to a temporary slowdown, which was nonetheless necessary for the eventual speedup. The measures I mentioned above helped us reduce the share of broken releases from 60% to 20% and cut the time for stories to go from grooming to production from 50 days to 10.
The right metrics also differ depending on the size of your company. At the very beginning, startups may have no metrics at all or get by with the ones their founders have experience with. However, familiarity does not always mean suitability. To find out which metrics can boost software development at your company, get in touch with INSART’s experts for a complimentary consultation.