Any team runs into bottlenecks and sources of defects throughout the software development lifecycle. Often, fixing one bottleneck causes a new one to spring up somewhere else. You will need different metrics to diagnose and improve different areas at each step of your company's software journey and achieve the highest possible quality. But here is the tricky part.
Each stakeholder has a different view of "quality." So, you will need not one but a set of software quality metrics to support productive conversations focused on the interests of each stakeholder.
Below, I feature several proven software quality metrics. Find out their purpose and how to apply them in the context of software development.
These metrics, often referred to as the DORA metrics, are popular for a reason: they are highly correlated with high-performing software delivery teams. (A small sketch after the list shows how they can be computed.)
DF (Deployment Frequency). Deploying quickly and frequently typically means bugs can be resolved promptly in the next release. And since each delivery is small, any bugs it introduces tend to be small and easy to isolate.
LT (Lead Time for Changes). This one measures how long it takes for code to go from being written to being in production, including the CI/CD pipeline and processes like code reviews. It's helpful to track how long a feature takes to go from ideation to production, from grooming to production, and from coding to production. Each segment carries useful information; if one segment is much faster than the others, a step may be getting skipped, which causes communication issues later.
CFR (Change Failure Rate). CFR shows what percentage of releases cause an issue in production.
TRS (Time to Restore Service). This one measures how long it takes to recover from an outage. It's also good to measure the window of lost data and the time needed to recover that data.
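To make the four metrics concrete, here is a minimal sketch of how they could be computed from deployment and incident records. The data structures and field names are assumptions for illustration, not any standard tool's API; in practice, this data would come from your CI/CD system and incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: when the change was committed, when it
# reached production, and whether the deployment caused a failure.
deployments = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 2, 14), "failed": False},
    {"committed": datetime(2024, 5, 3, 10), "deployed": datetime(2024, 5, 3, 16), "failed": True},
    {"committed": datetime(2024, 5, 6, 11), "deployed": datetime(2024, 5, 7, 9), "failed": False},
]
# Hypothetical incidents with outage-start and service-restored timestamps.
incidents = [
    {"started": datetime(2024, 5, 3, 17), "restored": datetime(2024, 5, 3, 19)},
]

period_days = 7

# DF: deployments per day over the observed period.
df = len(deployments) / period_days

# LT: average time from commit to production.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
lt = sum(lead_times, timedelta()) / len(lead_times)

# CFR: share of deployments that caused a failure.
cfr = sum(d["failed"] for d in deployments) / len(deployments)

# TRS: average time from outage to recovery.
restore_times = [i["restored"] - i["started"] for i in incidents]
trs = sum(restore_times, timedelta()) / len(restore_times)

print(f"DF: {df:.2f} deploys/day, LT: {lt}, CFR: {cfr:.0%}, TRS: {trs}")
```

The same structure works at any granularity: feed it a week, a month, or a quarter of records, and watch the trend rather than the absolute numbers.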
The Flow framework helps measure how software flows from ideation to production; a sketch after the list illustrates the calculations.
Velocity. Measures how much is delivered in a given time period.
Efficiency. Helps find out how much is delivered per resource.
Time. (From start to finish.) Reveals how long it takes for the average feature to get to production.
Load. Shows how many features are in flight in a given period of time. Tip: if you want to improve quality, lower the load.
Value. Shows how much value is created per time period or per feature.
Cost. Helps find out how much each feature costs.
Quality. Reveals the ratio of bug fixes to features delivered per time period.
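As a rough illustration, the flow metrics above can be derived from a simple list of work items with start and finish dates. The field names, value points, and team size below are made-up assumptions, not a prescribed schema.

```python
from datetime import date

# Hypothetical work items; "value" and "cost" are in whatever units you use.
features = [
    {"started": date(2024, 5, 1), "finished": date(2024, 5, 10), "value": 8, "cost": 3},
    {"started": date(2024, 5, 2), "finished": date(2024, 5, 20), "value": 5, "cost": 4},
    {"started": date(2024, 5, 15), "finished": None, "value": 3, "cost": 2},  # in flight
]
team_size = 4
bug_fixes = 2  # bug-fix tickets closed in the same period

done = [f for f in features if f["finished"] is not None]

velocity = len(done)                              # delivered in the period
efficiency = velocity / team_size                 # delivered per resource
flow_time = sum((f["finished"] - f["started"]).days for f in done) / len(done)
load = sum(1 for f in features if f["finished"] is None)  # features in flight
value = sum(f["value"] for f in done)             # value created in the period
cost = sum(f["cost"] for f in done) / len(done)   # average cost per feature
quality = bug_fixes / velocity                    # bug fixes per feature delivered

print(f"Velocity {velocity}, Efficiency {efficiency}, Time {flow_time}d, "
      f"Load {load}, Value {value}, Cost {cost}, Quality {quality}")
```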
Happiness. Happy engineers are typically a sign of good developer experience and good quality.
Customer Churn. If software underperforms and is buggy, customers leave. If it is good and solves their problem, they generally stay. Measuring customer churn will help you catch a signal that something is wrong and fix the issue.
CAC (Customer Acquisition Cost). If it is higher than the industry average, the software is most likely difficult to use. If it is lower, it's often a sign of strong word of mouth.
Uptime. Helps track how available your systems are and, conversely, how often they are down (a simple uptime and churn calculation is sketched after this list).
Frequency of Alerts. Alert fatigue will cause teams to ignore alerts. Measuring their frequency helps keep alerts at a manageable level and address them more effectively.
Speed of Delivery. This may seem counterintuitive, but high quality leads to faster development over the long term, while poor quality slows things down. This contradicts the common misconception that you can have either speed or quality, but not both.
Fear of Releasing on Friday. This is a subjective metric. If a team is afraid to release on Friday, quality isn't where it should be; if they aren't, quality is likely in good shape.
User Satisfaction Surveys. An outside perspective on your solution is crucial.
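Churn and uptime in particular reduce to simple arithmetic. Here is a sketch with made-up monthly numbers:

```python
# Hypothetical monthly figures; all numbers are illustrative.
customers_at_start = 400
customers_lost = 12

downtime_minutes = 43            # total downtime over the month
minutes_in_month = 30 * 24 * 60

churn_rate = customers_lost / customers_at_start
uptime = 1 - downtime_minutes / minutes_in_month

print(f"Monthly churn: {churn_rate:.1%}")  # Monthly churn: 3.0%
print(f"Uptime: {uptime:.3%}")             # Uptime: 99.900%
```

The hard part isn't the math; it's collecting honest downtime and churn data consistently, period after period.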
Code Coverage. This is a good metric for knowing the state of the tests. Coverage measures the share of functions called, statements and branches executed, boolean sub-expressions evaluated, and lines of source code exercised by tests.
Mutation Coverage. Mutation testing makes small random changes to the codebase and checks whether the test suite catches the resulting change in behavior. It measures the quality of your code coverage: coverage can look extensive, yet the 5% that isn't covered may be exactly what matters.
Bug Density. Measures how many issues there are per code component.
Latency. Describes how long the software takes to run. This is good to test at different levels of scale to see what breaks at which level.
Linting & Static Analysis. Tools can scan code to make sure it's formatted correctly and follows shop standards. More advanced analyses can also surface security vulnerabilities and code smells.
Security Vulnerabilities. These can manifest both in the code itself and in packages used by the code.
Cyclomatic Complexity. This metric counts how many independent paths there are through a function; a rough way to compute it is sketched after this list. If the resulting number is over 7, the code becomes very difficult for a human to understand. The same rule of thumb applies to the number of variables in a function: over 7 is a sign of excess complexity.
Tech Debt List. Use this metric to measure how much tech debt is tracked.
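To make cyclomatic complexity concrete, here is a minimal sketch that approximates it for Python functions by counting decision points in the AST. It is a simplified take on McCabe's metric, not a full implementation; mature tools (radon, for example) handle many more cases.

```python
import ast

# Decision points that add an independent path through the code.
# This list is a simplification; real tools count more constructs.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp, ast.comprehension)

def cyclomatic_complexity(func: ast.FunctionDef) -> int:
    """Rough estimate: 1 + the number of decision points in the function."""
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))

source = """
def triage(ticket):
    if ticket.priority == "high" and ticket.open:
        return "now"
    for tag in ticket.tags:
        if tag == "security":
            return "now"
    return "later"
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        score = cyclomatic_complexity(node)
        flag = "  <- consider refactoring" if score > 7 else ""
        print(f"{node.name}: {score}{flag}")  # prints "triage: 5"
```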
Software is ultimately about getting 1s and 0s to make a machine solve a problem. This requires detailed work by software engineers, including many things beyond writing code: peer reviews, infrastructure, test automation, security, business understanding, flexibility, and so on. The best way to manage a group of people in getting technology to solve a problem efficiently is to choose the right metrics.
Understanding the project goals will help determine the right mix of metrics, as the ones chosen will drive behaviors. For instance, a modernization project will have different metrics from a greenfield project. A startup will have different metrics from an enterprise-level company or a regulated entity.
Below, I explain the critical factors that should influence the choice of software quality metrics.
Most metrics are two-sided: they nudge things in one direction, but when overdone, they push too far.
For instance, Lines of Code (LOC) may push engineers to write more code per hour. However, this metric is known in the industry to produce lots of code that does very little, in an inflexible and complex way. LOC is not a good metric.
Another example is velocity. It may be a good metric for a startup seeking speed of feature delivery. But, if unchecked, velocity metrics often result in the growth of tech debt that ultimately slows the team down after 6-12 months.
Overly strict defect metrics can dampen innovation and increase the cost of software delivery by creating too many gates.
Summing up, the best approach is a mix of metrics, chosen to nudge behaviors in the right direction given the current situation.
There are two main philosophies for quality. One focuses on preventing defects, and the associated metrics are about reducing the occurrence of bugs. The other philosophy is centered on increasing the ability to respond to defects and reducing their impact.
Both make good goals, but often they are at odds. Preventing defects takes a rigorous development process and usually has many steps to build quality into the process. Responding to defects is more about the rigor of the production environment — one that has flexibility, observability, resilience, and self-healing.
Quality is often the result of simplicity, and metrics that track simplicity and decoupling tend to produce quality.
Simple is hard to accomplish, but not impossible. The step-by-step guide in the visual below will help your team achieve simple and clear metrics.
First of all, the number of metrics matters. Pick three. Using only one metric is likely to have unintended consequences, while using more than three leads to a loss of focus and impact. After the first three start to work, you can drill down and add or extend metrics.
Next, keep a dashboard and review it weekly or bi-weekly. If the behavior isn't being observed, it won't change.
Almost all engineering metrics are easy to game, so it's generally a bad idea to link metrics to pay or bonuses. It's better to link them to behaviors.
In some cases, look at metrics without sharing them, to establish a baseline.
Review metrics at the start of retrospectives. This way, the team can brainstorm how to improve them and take ownership of the results.
Be careful not to believe the hype. Metrics are often chosen to tell a story that may not be real. Keep the metrics used to manage a team internal and grounded in reality. Use judgment when the metrics don't match what customers are seeing.
Lastly, compare against the team’s past performance to continually improve. If teams start to act overconfident, compare against industry benchmarks.
How do you choose and adjust metrics when there is limited or no planning at all? Say there are no deadlines, and the sprint cadence and the scope are the only more or less fixed values.
Vitaliy Avramchuk, QA Team Lead at INSART, shares his strategy for cases like this.
“When partnering with an offshore software development provider, a company mostly measures two things: the quality of the work delivered and its cost. You don’t want to micromanage everything. It’s an advantage when the team can set the right metrics, or take the existing ones and perform up to the highest standard.”
Below are some ideas from Vitaliy’s experience on what the team can measure to improve software quality in the limited development planning scenario.
“Metrics do not exist for their own sake. Metrics exist for assessing processes. You should have a clear understanding of the process, potential problems in it, and what you want to measure. This is where you start to come to the right metrics eventually.”
The visual below shows an example of the results your team can achieve by following this strategy.
This case comes from my own experience. For the company I was working for at the time, the main pain point was the slow development process, so I started figuring out the reason for the issue.
I began by measuring the size of releases and how long it took for something to go from entering the sprint pipeline to coming out of it. I found that the team put a lot of effort into each sprint, but it took almost forever for things to be delivered. Interestingly, some items took over 50 business days to go from grooming to production.
With that in mind, I started building a solution to the challenge, and it was not only about metrics (as Vitaliy said, metrics don't exist in a vacuum).
We managed to deconstruct the big, blurry picture by measuring the percentage of each sprint interrupted by low-priority requests, the number of bug tickets addressed versus those that were not, and the number of broken releases. Using these values, we could adjust and restructure the scope and work more effectively toward the main goal.
One thing to keep in mind is that the effect of such work is not immediate. The analysis and the change of strategy led to a temporary slowdown, which was nonetheless necessary for the eventual speedup. The measures I mentioned above helped us reduce the share of broken releases from 60% to 20% and cut the time for stories to go from grooming to production from 50 days to 10.
The right metrics also differ depending on the size of your company. At the very beginning, startups may have no metrics at all or get by with the ones their founders have experience with. However, familiarity does not always mean suitability. To find out which metrics can boost software development at your company, get in touch with INSART’s experts for a complimentary consultation.