Bad tests are WORSE than no tests
Automated Testing is NOT always better than Manual Testing. Many companies do Automated Testing in an ineffective way - with low ROI - making it worse than staying with Manual QA.
Companies believe that Manual QA & Debugging is “rock bottom”. They believe that ANY automated testing is better than having NO automated testing.
This is false.
Bad automated testing is worse than manual testing. What!?
A story familiar to fitness trainers: An unfit & overweight person wants to get fit. They know that exercise is good for us.
So, they jump onto YouTube, watch some videos, buy courses, and practice the exercises at home - without any diagnostics, without seeing a doctor, and without consulting a fitness trainer.
They’re proud of their progress - they seem to be losing weight!
Then, after 1-2 months, they end up with a major back & knee injury.
But how could this have happened? Isn’t exercise good for us?
No, exercise can actually be dangerous when you’re out of shape. That’s what this person discovers when they later (finally) go to a fitness trainer. After a short diagnostic test, the trainer tells them NOT to do the exercises they saw on YouTube (the time isn’t right), but instead to do other lower-intensity exercises, combined with the trainer’s continuous monitoring of progress. Whilst the weight loss isn’t as drastic now, the good news is that the person is able to recover from their injury, because the trainer guided them on how to exercise in a SAFE way.
Unfortunately, many companies practice Bad Automated Testing.
But Bad Automated Testing is worse than Manual Testing.
Let’s find out why…
How to measure the ROI of testing?
The Value of Testing: To what extent do our tests protect us from bugs (both regression bugs and new bugs)?
The Cost of Testing: What’s the cost to execute & maintain the tests?
The ROI of Testing = [(Value - Cost) / Cost] x 100
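To make the formula concrete, here’s a minimal sketch in Python (the numbers are purely hypothetical, just for illustration):

```python
def testing_roi(value: float, cost: float) -> float:
    """ROI of Testing = [(Value - Cost) / Cost] x 100, as a percentage."""
    return (value - cost) / cost * 100

# Hypothetical suite A: prevents bugs worth 150k/year, costs 50k/year to run & maintain
print(testing_roi(value=150_000, cost=50_000))  # 200.0 -> high ROI

# Hypothetical suite B: prevents almost nothing (10k) but still costs 50k to maintain
print(testing_roi(value=10_000, cost=50_000))   # -80.0 -> negative ROI
```

The same suite can slide from the first case to the second purely through rising maintenance costs - which, as we’ll see, is exactly what bad automated tests do.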
1. What’s the value of tests?
Good Manual Testing protects us against regression bugs.
Good Manual QA can be very effective in protecting us against regression bugs as well as discovering new bugs.
(Sure, we know that Manual QA is inefficient because it’s time-consuming, needs lots of resources… but let’s not confuse effectiveness & efficiency).
Story: There’s a team that was responsible for maintaining a large financial system, where quality was at stake. Even though they didn’t have automated testing, they had great Manual QA Engineers *and* a very thorough UAT stage - the Manual QA Engineers would do very thorough testing, developers would do the fixing, and then this cycle would repeat. This meant they had to hire a lot of Manual QA Engineers and their release cycle was very slow… but the Manual Testing process was highly effective, because zero or almost zero bugs reached the customer.
Bad Automated Testing does NOT protect us against regression bugs.
Vladimir Khorikov (Unit Testing Principles, Practices, and Patterns) shared the story of a team who wrote unit tests that did NOT protect against regression bugs (by writing tests with no adequate assertions):
“A group of developers had gone to a conference where many talks were devoted to unit testing. After returning, they decided to put their new knowledge into practice. Upper management supported them, and the great conversion to better programming techniques began. Internal presentations were given. New tools were installed. And, more importantly, a new company-wide rule was imposed: all development teams had to focus on writing tests exclusively until they reached the 100% code coverage mark.
As you might guess, this didn’t play out well… developers started to seek ways to game the system. Naturally, many of them came to the same realization: if you wrap all tests with try/catch blocks and don’t introduce any assertions in them, those tests are guaranteed to pass. People started to mindlessly create tests for the sake of meeting the mandatory 100% coverage requirement. Needless to say, those tests didn’t add any value to the projects. Moreover, they damaged the projects because of all the effort and time they steered away from productive activities, and because of the upkeep costs required to maintain the tests moving forward.”
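To make this anti-pattern concrete, here’s a minimal sketch in Python with pytest (the apply_discount function is hypothetical; only the try/catch-without-assertions pattern comes from the story):

```python
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical production code under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

# The anti-pattern from the story: the test executes the code (so coverage goes up),
# swallows every failure, and asserts nothing - it is guaranteed to pass.
def test_apply_discount_gamed():
    try:
        apply_discount(100.0, 20.0)
        apply_discount(100.0, 150.0)
    except Exception:
        pass  # no assertions -> always green, zero regression protection

# A real test: it pins down the expected behavior, so a regression turns it red.
def test_apply_discount_real():
    assert apply_discount(100.0, 20.0) == 80.0
    with pytest.raises(ValueError):
        apply_discount(100.0, 150.0)
```

Both tests produce identical coverage numbers; only the second one can ever catch a bug.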
Dave Farley (Modern Software Engineering) shared a similar story, where tests were written that didn’t test anything at all (and hence offered no protection against regression bugs, because they had no adequate assertions):
“… At one of my clients, they decided that they could improve the quality of their code by increasing the level of test coverage. So, they began a project to institute the measurement, collected the data, and adopted a policy to encourage improved test coverage. They set a target of ‘80 percent test coverage’. Then they used that measurement to incentivize their development teams; bonuses were tied to hitting targets in test coverage.
… Some time later, they analyzed the tests that they had and found more than 25 percent of their tests had no assertions in them at all. So they had paid people on development teams, via bonuses, to write tests that tested nothing at all.”
The stories shared by Vladimir & Dave match my own personal experience of auditing test suites.
It’s important to note that the problem above doesn’t just apply to Unit Tests, but to the higher levels too (Component Tests & Acceptance Tests). Developers can write tests that exercise the system but don’t test anything at all, due to poor assertions. Other problems include poor scenario analysis and poor boundary value analysis, all of which make the tests poor at protecting us against regression bugs.
2. What’s the cost of tests?
Manual Testing is costly - it’s very time-consuming to execute.
We all know that Manual Testing is expensive.
It’s expensive for QA Engineers to execute repetitive time-consuming manual tests. Many QA Engineers need to be hired.
It’s expensive for Developers to fix bugs that are detected late in the development cycle. The cost of rework is higher.
But Bad Automated Testing can be even costlier - expensive to maintain due to structural coupling.
Vladimir Khorikov (Unit Testing Principles, Practices, and Patterns) shared his story of tests that were low value AND expensive to maintain:
“… those tests didn’t add any value to the projects. Moreover, they damaged the projects because of all the effort and time they steered away from productive activities, and because of the upkeep costs required to maintain the tests moving forward.”
Continuing with Software Engineering at Google: Lessons Learned from Programming Over Time (Chapter 12, Unit Testing - “The Importance of Maintainability”), which explains the impact of high-maintenance tests:
“Imagine this scenario: Mary wants to add a simple new feature to the product and is able to implement it quickly, perhaps requiring only a couple dozen lines of code. But when she goes to check in her change, she gets a screen full of errors back from the automated testing system. She spends the rest of the day going through these failures one by one. In each case, the change introduced no actual bug, but broke some of the assumptions that the test made about the internal structure of the code, requiring those tests to be updated. Often, she has difficulty figuring out what the tests were trying to do in the first place, and the hacks she adds to fix them make those tests even more difficult to understand in the future. Ultimately, what should have been a quick job ends up taking hours or even days of busywork, killing Mary’s productivity and sapping her morale.”
Kent Beck tweeted:
Tests should be coupled to the behavior of code and decoupled from the structure of code. Seeing tests that fail on both counts.
He further clarified it with the Test Desiderata, which lists properties that affect the ROI of tests:
Behavioral — tests should be sensitive to changes in the behavior of the code under test. If the behavior changes, the test result should change.
Structure-insensitive — tests should not change their result if the structure of the code changes.
So how does this impact maintenance costs?
Behavioral coupling is “good”. Tests should be coupled to behavior, because tests are executable specifications. So it’s natural that when the behavior changes, the expected results change and the tests have to be updated. This is an “essential” maintenance cost of automated tests.
Structural coupling is “bad”. The problem comes with structural coupling. Good tests are structure-insensitive (not coupled to structure), whereas bad tests are structure-sensitive (coupled to structure). This is the same problem as in the Google story above, where the developer had to spend far more effort fixing tests than writing code, because the tests were coupled to structure: “In each case, the change introduced no actual bug, but broke some of the assumptions that the test made about the internal structure of the code, requiring those tests to be updated.”
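To see the difference in code, here’s a minimal sketch in Python (the greet/normalize functions are hypothetical, not taken from any of the sources above):

```python
from unittest.mock import patch

def normalize(name: str) -> str:
    """Internal helper - part of the structure, free to be refactored away."""
    return name.strip().lower()

def greet(name: str) -> str:
    """The observable behavior under test."""
    return f"Hello, {normalize(name)}!"

# Structure-sensitive (bad): asserts HOW greet works internally, via a mock.
# Inlining normalize() - a pure refactoring, no behavior change - breaks this test.
def test_greet_structural():
    with patch(f"{__name__}.normalize", return_value="alice") as mock:
        assert greet("  Alice ") == "Hello, alice!"
        mock.assert_called_once_with("  Alice ")

# Behavior-sensitive (good): asserts only WHAT greet does (input in, output out).
# Any refactoring that preserves behavior keeps this test green.
def test_greet_behavioral():
    assert greet("  Alice ") == "Hello, alice!"
```

Inline normalize() into greet() and the first test fails with no actual bug introduced - exactly Mary’s situation in the Google story.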
This problem exists across all levels. The examples above relate to Unit Testing, where the unit tests were coupled to the structure of the code - which made them expensive to maintain. A similar problem also exists in system-level tests - for example, Dave Farley criticizes many E2E Tests for being coupled to the system structure (the UI), and the same issue is equally visible in Acceptance Tests where the Four Layer Model is not applied.
So let’s provide categories of structurally-coupled (bad) tests:
Bad Unit Tests are coupled to the structure of the code
Bad Component Tests are coupled to the component structure (coupled to the component API)
Bad Acceptance Tests are coupled to the system structure (coupled to the UI or API)
Bad Automated Tests have low ROI
To summarize why bad Automated Tests have low ROI:
Low Value = the tests don’t protect us against regression bugs (in the worst case, they offer zero protection) and just give us a false sense of security.
High Cost = the tests are structure-sensitive, so whenever we make a structural change, fixing the tests takes more effort than implementing the code change itself.
Impact: Poor Software Delivery:
Unsafe Delivery = due to poor bug protection, we release software with lots of bugs to the end users
Slow Delivery = due to high maintenance costs, total development time is increased, so releases are slower
What’s the impact of this low ROI?
Business executives lose trust in IT. The initiative is marked as failed.
There is loss of morale in the team. They are unwilling to try automated testing in the future (and would need to do a lot of unlearning).
What should we do?
We’ve invested in tests, developers wrote many tests, the developers believe that the tests are good… but most likely they’re not. The stories I read from Vladimir, Dave & Google… those stories match exactly what I’ve seen in my technical audits!
I have seen many senior developers who were senior in coding but junior in testing. Most of them are unaware that there is even a problem.
Due to the prevalence of the problem, even if your developers tell you everything is great, I recommend getting an audit done to discover whether there’s a problem. My experience: Engineering Managers were very proud of their team’s testing (citing high code coverage), the teams themselves were also proud… and then, when I came in to do a technical audit, I discovered that the tests didn’t even provide regression bug protection, and had to point that out to the developers.
After recognizing the problem, the solution is:
Implement effective automated tests internally at a small scale (tests which protect us against regression bugs and are decoupled from structure). Consider an external review before scaling out.
Execute a testing transformation across the organization, but do it incrementally. As you’re doing it, I recommend external reviews at various checkpoints during the transformation, and measure DORA metrics as you go along.