How to go about benchmarking code quality, and why it's important
There are many tools out there that can help you gain insight into the quality of your code. When these tools measure a code quality metric, they usually display what we call ‘atomic’ data. Let’s take cyclomatic complexity as an example. Cyclomatic complexity is expressed as the number of linearly independent paths through a function. If we have a file containing a single function with 5 such paths, most tools will state: “You have an average cyclomatic complexity of 5 in this file”.
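To make this concrete, here is a minimal sketch in Python of what a complexity-5 function can look like; the function itself is hypothetical, chosen only to show one straight-through path plus four decision points.

```python
# Hypothetical example: a function with cyclomatic complexity 5,
# i.e. four decision points plus one (4 + 1 = 5).
def shipping_cost(weight_kg: float, express: bool) -> float:
    if weight_kg <= 0:      # decision point 1
        raise ValueError("weight must be positive")
    if weight_kg < 1:       # decision point 2
        cost = 5.0
    elif weight_kg < 10:    # decision point 3
        cost = 12.5
    else:
        cost = 30.0
    if express:             # decision point 4
        cost *= 2
    return cost
```

If this were the only function in the file, a typical tool would report an average cyclomatic complexity of 5 for the file: exactly the kind of atomic number we are talking about.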
Atomic data is useful, but it leaves a lot of room for interpretation: two engineers in the same organization might have very different thoughts about a specific metric. One might consider an average complexity of 5 to be good, while another might consider it to be bad. And that is just one code quality metric; a company should measure a whole set of metrics to get an accurate picture of its code quality. If every engineer decides for themselves what counts as good or bad complexity, duplication, coverage, and so on, it becomes very hard to implement a company-wide code quality strategy.
How can we solve this? By benchmarking code quality against other industrial projects. Instead of endless discussions on whether an average complexity of 5 is good or not, we can shift the discussion to whether having an average complexity that is 83% higher than the industry average is desirable. We now have a relative score, which tells us a lot more than an absolute one. Relative scores are also easier for managers to interpret: they quickly lose the overview when presented with a long list of code quality metrics, but it is easy to understand that a project ranks in the top 10% of the industry in terms of code quality.
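A short, hypothetical calculation shows how such a relative score comes about; the industry figures below are invented purely for illustration.

```python
# Hypothetical numbers: turn an atomic score into a relative one.
project_avg_complexity = 5.0
industry_avg_complexity = 2.73  # invented benchmark value

relative = (project_avg_complexity / industry_avg_complexity - 1) * 100
print(f"{relative:+.0f}% vs. industry average")  # -> +83%

# A percentile rank against peer projects reads the same way:
peer_averages = [1.8, 2.1, 2.6, 2.9, 3.4, 4.0, 4.7, 4.9, 6.2, 7.9]  # invented
rank = sum(p <= project_avg_complexity for p in peer_averages) / len(peer_averages)
print(f"higher complexity than {rank:.0%} of peer projects")  # -> 80%
```

The single relative number (“83% above average”, “top 10%”) carries the judgment that the raw value of 5 leaves implicit.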
This raises the question: why do most tools not offer any form of code quality benchmarking? The answer is simple: they can’t. These tools usually run in a protected environment inside an organization, and the data that could be used to benchmark never leaves the company’s environment. Our TiCS Framework is an exception in this case, as our users share anonymized benchmarking data with each other (you can find out more about our framework here).
Benchmarking code quality benefits engineers by giving them more guidance in determining whether the quality of their code is good or bad. However, there are also benefits for engineering managers, especially those without an extensive software engineering background themselves, to whom metrics like code coverage or cyclomatic complexity might be foreign concepts. While these metrics can be understood with some research, for someone who does not work with them on a regular basis it is even harder to determine whether a certain value of coverage or complexity is good or bad. By benchmarking code quality, we give engineering managers the opportunity to steer a code base toward better quality, something that would be much harder if they had to rely solely on atomic data.
There is, however, one major criticism of benchmarking: if the rest of the industry also has bad code quality, then what use is a benchmark? Your code quality might be in the top 10% of your industry, but if the average code quality in your industry is terrible, this can give you a false sense of security. This is why it is important not only to benchmark code quality, but also to qualify your code using a qualification scheme. A qualification scheme sets independent standards for what is considered good and bad code quality. We will write a blog about this in the future as well, but in the meantime, you can take a look at the qualification scheme we have built using 20+ years of industrial data: the TIOBE Quality Indicator.
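To illustrate how a qualification scheme differs from a benchmark, here is a minimal sketch; the thresholds and labels are hypothetical and are not the actual TIOBE Quality Indicator definitions.

```python
# Hypothetical qualification scheme: fixed, benchmark-independent thresholds
# that map an average cyclomatic complexity to a quality label.
COMPLEXITY_LEVELS = [
    (2.0, "A"),   # average complexity <= 2.0
    (4.0, "B"),
    (6.0, "C"),
    (10.0, "D"),
]

def qualify_complexity(avg_complexity: float) -> str:
    """Return a quality label that does not depend on how peers score."""
    for threshold, label in COMPLEXITY_LEVELS:
        if avg_complexity <= threshold:
            return label
    return "F"

# A project can rank in the industry's top 10% and still qualify poorly:
print(qualify_complexity(5.0))  # -> "C" under these made-up thresholds
```

Because the thresholds are fixed, the verdict does not improve just because the rest of the industry scores worse.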