
Exactly! Bad automated testing is worse than sticking with manual QA testing!


Yes! Manual QA Testing is not the worst thing. After all, it's better than Poor Automated Testing!


What I find most difficult in calculating the ROI of tests is estimating the value of a test. My intuition is that it equals the cost of the bugs that did not happen thanks to the test. How do you estimate this?


I see the following sources of value in tests:

1. Reducing salary wasted on delayed bug fixing - with effective automated testing (compared to manual testing) we save the time that developers and manual QA would otherwise waste. Based on my calculations, for a small company (25 developers) this avoids roughly 600,000 EUR of waste; see my calculations here https://docs.google.com/spreadsheets/d/1f2jHxQAHPS-pycmH4qdrmUQRsGCajyztLvtfS6b6uws/edit?gid=0#gid=0

2. Reducing revenue loss due to bugs - with effective automated testing we can greatly reduce the number of bugs shipped to customers (Zero Defect Software), which helps retain our existing customers. With manual testing this isn't sustainable: more bugs reach production, customers leave for competitors, and we lose revenue. To estimate this one, I'd use figures on product price and how many customers we lost, preferably tracking the reason each customer left.

Point (1) is easier to estimate with public figures (a rough sketch of that calculation is included below).

Point (2) requires private company data.
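To make point (1) concrete, here is a minimal back-of-the-envelope sketch of that kind of calculation. All inputs (hourly cost, bug counts, hours lost per bug) are hypothetical placeholders, not the figures from the spreadsheet above; plug in your own company's numbers.

```java
// Rough estimate of salary wasted on delayed bug fixing, with and without
// effective automated testing. Every number below is a hypothetical placeholder.
public class TestRoiSketch {

    public static void main(String[] args) {
        int developers = 25;                 // team size
        double hourlyCost = 50.0;            // fully loaded cost per developer hour, EUR
        double bugsPerDevPerMonth = 4.0;     // bugs introduced per developer per month
        double hoursLostPerLateBug = 8.0;    // reproduce, debug, fix, re-test, redeploy
        double hoursLostPerEarlyBug = 1.0;   // caught by an automated test right after the change

        double bugsPerYear = developers * bugsPerDevPerMonth * 12;
        double lateCost  = bugsPerYear * hoursLostPerLateBug  * hourlyCost;
        double earlyCost = bugsPerYear * hoursLostPerEarlyBug * hourlyCost;

        System.out.printf("Yearly cost, bugs caught late:  %.0f EUR%n", lateCost);
        System.out.printf("Yearly cost, bugs caught early: %.0f EUR%n", earlyCost);
        System.out.printf("Estimated waste avoided:        %.0f EUR%n", lateCost - earlyCost);
    }
}
```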

I'm interested to hear any thoughts you have on estimation, and whether you've looked at any calculations online or tried to build your own calculator?

P.S. The calculator I shared in (1) reflects some basic research I did online & rough estimates, but I'm open to hearing how you would approach it too.


Wow, Valentina. Thank you for a beautiful post packed from start to finish with truth. I can definitely relate to the opening analogy to fitness. When my father of blessed memory retired, he decided to start exercising, so he started doing some exercises that he remembered from the Army. Within days he had torn both rotator cuffs.

As Arlo Belshee says, "a test is a spec." Until a team absorbs this principle, automated tests will have little value.


Thanks Daniel!

"As Arlo Belshee says, "a test is a spec." Until a team absorbs this principle, automated tests will have little value." - > Exactly! What's your approach in getting teams towards this?


I focus on characterization tests of legacy code with some katas sprinkled in. I find the Gilded Rose Kata particularly effective, as the requirements easily map to tests.
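As an illustration, a characterization test for the kata might look roughly like this. This is a minimal sketch assuming the standard Java layout of the kata (an `Item` class with public fields and a `GildedRose` class with `updateQuality()`); the expected values simply pin down whatever the current code produces.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Characterization test: it records what the existing code does today,
// so we can refactor safely without (yet) judging whether that behavior is right.
class GildedRoseCharacterizationTest {

    @Test
    void agedBrieGainsQualityAsItAges() {
        Item[] items = { new Item("Aged Brie", 2, 0) };
        GildedRose app = new GildedRose(items);

        app.updateQuality();

        // These expected values come from running the legacy code once
        // and pasting its actual output into the assertions.
        assertEquals(1, items[0].sellIn);
        assertEquals(1, items[0].quality);
    }
}
```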


What kind of characterization tests do you write - do you jump straight to characterization tests at the unit level, or acceptance tests first?


Great question, and one suited to the generic consultant's response of "it depends."

In general, characterization testing, as introduced by Feathers, is a prelude to refactoring. Therefore, I characterize the unit of code that is being refactored. Whether these are called "unit tests" or "integration tests" is a matter of semantics.

If, however, the priority is to build up a regression suite, then characterization at the API level is ideal. These tests present a great opportunity for introducing Given-When-Then:

- GIVEN system state + request arguments

- WHEN request

- THEN response + adjusted system state

Note that setting up an API-level test suite may take a while, as infrastructural dependencies need to be stubbed and persistence layers need to be localized.
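As an illustration of that Given-When-Then shape at the API level, here is a minimal, self-contained sketch. `OrderApi` and its in-memory stock are hypothetical stand-ins; in a real harness they would wrap the legacy service with its infrastructural dependencies stubbed out, and the expected values would be whatever the system does today.

```java
import java.util.HashMap;
import java.util.Map;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical API-level characterization test in Given-When-Then form.
class PlaceOrderCharacterizationTest {

    // Minimal stand-ins so the sketch is self-contained.
    static class InMemoryStock {
        private final Map<String, Integer> levels = new HashMap<>();
        void add(String sku, int quantity) { levels.merge(sku, quantity, Integer::sum); }
        int level(String sku) { return levels.getOrDefault(sku, 0); }
    }

    static class OrderApi {
        private final InMemoryStock stock;
        OrderApi(InMemoryStock stock) { this.stock = stock; }
        String placeOrder(String sku, int quantity) {
            if (stock.level(sku) < quantity) return "REJECTED";
            stock.add(sku, -quantity);
            return "CONFIRMED";
        }
    }

    @Test
    void placingAnOrderReservesStockAndConfirms() {
        // GIVEN: system state + request arguments
        InMemoryStock stock = new InMemoryStock();
        stock.add("SKU-123", 10);
        OrderApi api = new OrderApi(stock);

        // WHEN: the request
        String status = api.placeOrder("SKU-123", 3);

        // THEN: response + adjusted system state, pinned to current behavior
        assertEquals("CONFIRMED", status);
        assertEquals(7, stock.level("SKU-123"));
    }
}
```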


Thanks, that's how I interpret characterization tests - they can exist at any level. So we can write characterization acceptance tests, characterization component tests, characterization unit tests...

Ok, I see you start at high level too.

Given an application that has a frontend and several microservices, at what level would your characterization tests be? Would they span the whole system (frontend + backend), or would you write separate characterization tests for the frontend and for each microservice (at their API levels)?


Good article and certainly something I will share with my teams. I would add some observations I've made over the last ten years or so: although most developers are ready to admit that bad tests are bad, and even go as far as naming the coupling of tests to structure rather than behavior as the culprit, they have a lot of difficulty identifying it, and even more difficulty using the patterns that help with such decoupling. Instead, I often see heavy use of mock objects (please use a ports & adapters architecture!), global state (do you run your tests in random order or always in the same order?), or assertions that only check an error code (assert(myFunction() == true)). They start using TDD, but write the code and then modify the test to fit the code. Or they start writing Gherkin statements like "when variable is true then result must be false" (a real example, and it wasn't for a negation function!).
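As a small, hypothetical illustration of that "only check the return code" style versus asserting on the behavior the caller actually depends on (the class and names below are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

// Contrast between an assert(myFunction() == true) style test and one that
// pins down the observable behavior.
class ShoppingCartTest {

    // Minimal stand-in for the code under test.
    static class ShoppingCart {
        private final List<String> items = new ArrayList<>();
        boolean add(String sku) { return items.add(sku); }
        List<String> items() { return items; }
    }

    @Test
    void weakAssertion_onlyChecksTheReturnCode() {
        ShoppingCart cart = new ShoppingCart();
        assertTrue(cart.add("SKU-123")); // passes even if the item is stored wrongly
    }

    @Test
    void strongerAssertion_checksTheObservableBehavior() {
        ShoppingCart cart = new ShoppingCart();
        cart.add("SKU-123");
        assertEquals(List.of("SKU-123"), cart.items()); // pins what callers rely on
    }
}
```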

Briefly, there seems to be a cliff, even among senior developers, between knowing the techniques and understanding them.

Resources like yours help a lot in alleviating that problem: you always provide good explanations of why and how to use each technique. But I'm afraid that with the popularity of AI-assisted coding, that cliff will only get wider.

Overall, thanks a lot for your work in popularizing and explaining good practices like TDD; we just need more good communicators like you.


Thanks Fabio, I really appreciate you sharing this article!

"they have a lot of difficulty to identify them and even more to use patterns smartly that helps such decoupling" -> yes, that's the problem. Most developers don't even know what's structural coupling in tests, hence don't know there is a problem, don't know how to solve them.

"Instead, I often see high usage of mock objects (please use ports/drivers architecture!), global states (do you run your tests randomly or always in the same order?) or just looking at the error code (assert(my function()==true))."

-> That corresponds to what I see during my technical audits too!

What's your approach with teams?


Good question! Especially given that there is always something else to do, we can't just stop the work (and anyway, training should be continuous, not staggered). I'm also a CTO/founder, so my interventions always depended on what my role was at that point and who was available to second me. But let's focus on a senior engineering-related role for simplicity.

So, first principle: always start where the team is, and use what the team needs now as the motivator. That's just another reason why each intervention is unique to the situation.

From there, and specifically for teaching what coupling is and how to address it, I mostly look for two things:

- Large commits (or PRs, if they're not doing trunk-based development). If the scope is reasonable, coupling is almost always the culprit behind a large PR. So I sit down with the team to see how we could have reduced the size of the change and discuss different approaches they can use next time.

- Feature increments. A feature increment is a good opportunity to reduce coupling. I do it by splitting the feature into two parts: one part is the increment itself, and the other is refactoring the code to make the first part as small as possible. The refactoring part needs to be done first, and most of the teaching happens during refinement. It opens people's eyes to how important planning can be for reducing future work, and also to refactoring being a continuous activity rather than something you do as an afterthought if there is any time left (there is never any time left).


I like how you pointed out the need to switch to incrementalism, and that you separate the increment into two parts - the behavioral increment and the refactoring increment. I use that strategy too.

I'm interested: at what point do you write tests, and which test types?


Overall, I let the team decide on that; I've seen too many bad tests of the kind you described to force people to write them. Especially in the game industry, where a lot of code is either thin integration layers or real-time fuzzy content logic (i.e. the right behavior is better described as "it feels right" than "it is doing X").

We still have a couple of tests, and most of the time the team tried to use TDD. But what I found is that, most of the time, they wrote tests with the code in mind, wrote a test that was not able to run, or modified the tests after they wrote the code so they would pass, resulting in unnecessary coupling.

Second observation: automation that is too heavy. For example, test fixtures that take 5-30 minutes to set up, with no option to keep them around after the tests. So every test run takes a long time, encouraging developers to find other ways to test their code while developing.

Final one, and this one I'm still trying to understand more clearly: the prod environment is treated as secondary instead of primary. By this I mean that if a bug happens in prod, they barely look at it and instead try to reproduce it in a dev environment. That includes running the code on a PC rather than on mobile. The result is code that is poorly instrumented and therefore hard to get proper observability on.

Fortunately, no team has all of those problems, and most teams have only a few of them, but how to approach it really depends on the situation: the maturity of the product, of the team, and which phase of development you are in.


"I let the team decides on that; I've seen too many bad tests like you described to force people to write those"

What kind of approach do you use:

1. Do you leave it entirely up to developers to write tests however they want (i.e. it's up to each team to learn; they aren't required to write effective tests and have full freedom)?

2. Or do you provide internal guidance, e.g. you or someone else internally reviews the tests and guides the team, or a team that is strong in a skill set reviews another team's code?

3. Or do you hire someone external to review the test code and provide the team with guidance?


Oh, it really depends on the situation. As CTO and founder, I was able to take on various roles across the company, depending on each project's and each team's needs. And each project had a different team.

I had one team drop their tests: the tests were ineffective and were eating about 80% of their time, mostly on false positives.

I forced another team to write some automated tests, since their QAs (there were two of them) could barely complete a regression pass for a release in a week (with some work on both automation and the test plan, we got a full regression down to less than a day... which was still too much, but more understandable given the coverage).

I also once overrode one of my managers who had forbidden his team from writing any tests, but then I also had to ask his developers to speed up their pipeline, which was taking over an hour to run and using way too many resources. The manager's concerns were right; it was just his intervention that was wrong.

Other initiatives included requesting specific SLA metrics for image compression quality (forcing the integration of proper tools and instrumentation, including synthetic testing), running courses, asking for process documentation, supporting internal training, sharing articles, etc.

Note: none of the above was for similar products: a SaaS-like service, a full-stack SDK suite with multi-platform integrations, another SaaS-like service, a client integration project... All of them required different approaches, and more importantly, all of them were done by different people, each with their own strengths and weaknesses. Also, as CTO, I tried not to be too involved and delegated most of my direction to my seniors, according to their abilities and strengths. Only recently have I had the opportunity to do more hands-on, direct interventions.


Been repeating this for years. It's far more important to know the refactoring techniques for legacy systems that do not require tests. That way, whatever tests you do start writing have a better chance of being GOOD. Kevlin Henney calls these GUTs (good unit tests) for the SUT (system under test).


"Been repeating this for years." - great to hear this!

What's your approach to legacy system transformation, where do you start, and what's your process?


Extracting functional cores in the business-heavy parts of the system. The best advice I can give is to use a good IDE and learn its refactoring tools. All the IntelliJ ones are great.
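For readers new to the idea, here is a minimal, hypothetical sketch of what extracting a functional core can look like: the pure pricing decision is pulled out of the I/O-heavy checkout code so it can be unit tested with no mocks at all. All names are made up for the example.

```java
import java.math.BigDecimal;

// Collaborators that do I/O stay behind interfaces in the imperative shell.
interface PaymentGateway { void charge(String customerId, BigDecimal amount); }
interface CustomerDirectory { int loyaltyYears(String customerId); }

// Functional core: pure, deterministic, trivially testable.
final class DiscountPolicy {
    static BigDecimal discountedTotal(BigDecimal total, int loyaltyYears) {
        BigDecimal rate = loyaltyYears >= 5 ? new BigDecimal("0.10") : BigDecimal.ZERO;
        return total.subtract(total.multiply(rate));
    }
}

// Imperative shell: keeps the side effects, delegates the decision to the core.
class CheckoutService {
    private final PaymentGateway gateway;
    private final CustomerDirectory customers;

    CheckoutService(PaymentGateway gateway, CustomerDirectory customers) {
        this.gateway = gateway;
        this.customers = customers;
    }

    void checkout(String customerId, BigDecimal total) {
        int years = customers.loyaltyYears(customerId);                            // I/O
        gateway.charge(customerId, DiscountPolicy.discountedTotal(total, years));  // I/O
    }
}
```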

99% of the problem is a lack of visibility into where the code has eroded. So you want to make it scream out as loudly as possible.

My approach was different 10 years ago. Nowadays the automated refactoring tools, along with analysis tools like CodeScene or Sonar, are strong enough to get started immediately.


Prior to the extraction, do you write Acceptance Tests, or do you jump straight into the refactoring?

Great that you use CodeScene! In what way has it helped you with legacy projects?


It depends; usually it's a mix of both. Acceptance tests end up being written one way or another; the question is whether it is better to tackle the code with immediate refactoring first.

Keep in mind that in a legacy codebase with quality issues, the compiler alone can act as a very good starting point for acceptance tests. For JVM, C#, and JavaScript-based languages, the compilers are strong enough to provide an early harness for conservative refactoring.

Refactoring that creates the potential to make the acceptance tests higher quality (or to shunt in unit-tested code).

I shunt, and train shunting, a lot. Just because the tests cannot be kept without invasive change doesn't mean the code cannot be tested first using that approach, though this requires a mobbing team to hold the history of it for the short term it takes the team to transition to a safer harness (like ATDD).

Normally I advise against creating acceptance tests that give a false sense of security. If I had to rush testing in order to refactor more aggressively, I'd start with characterisation tests on the use case.


1. What kind of refactoring would you do without Acceptance Tests, versus what kind of refactoring would you do given Acceptance Tests?

2. "JVM, C# and javascript based languages the compilers are strong enough to provide an early harness for conservative refactoring." - I assume here you're referring to simple refactorings, like rename variable/method?

3. "Refactoring which give potential to make the acceptance tests of higher quality (or potential shunt in unit tested code)." - In what way does code refactoring improve acceptance tests?

4. "I shunt and train shunting a lot. Just because the tests cannot be kept without invasive change, doesn't mean the code cannot be tested first using that approach" - What's your approach there?

5. When we speak of Acceptance Tests, how do you define an Acceptance Test?

6. "Normally I advice against creating acceptance tests that create a false sense of security. If I had to rush testing in order to refactor more aggressively I'd start with characterisation tests on the use case." -> When you say characterization tests on the use case, at what level - is it acceptance test level, component level, unit test level, something else?


To condense 2-3 into a single answer: work up from the team's current skill level, to isolate and train the skills they lack while making progress with the skills they have.

For example: they may be good at manual testing and debugging while implementing features, but lack foresight on architectural issues, testable design, and functional design. This is the most common combination I see in the field.

4: I refer to a technique which combines Alistair Cockburn's Self-Shunting Test with Bill Wake's Isolate-Improve-Inline. I saw Beck mention this operation under the name 'The Saff Squeeze' but I've yet to see someone use that term organically outside of discussions like these.

1+4+6: I focus my coaching on stream-aligned teams, so we tend to categorise goals like "We need X tests here" as process waste unless we can connect them to the most critical business objective, in the form of "We will introduce X tests here in order to build feature Y faster". The aim is to bring the focus to modernization and better habits during ongoing feature development rather than a rewrite or extraction exercise. Let me know if the explanation is not making sense; we could probably discuss this more easily in a live format.
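For context on point 4, here is a minimal sketch of the self-shunt half of that technique (the test class itself stands in for the collaborator, so legacy code can be exercised without a mocking library). The isolate-improve-inline part, where the code under test is progressively inlined into the test and pruned, is not shown, and all names are hypothetical rather than taken from the commenter's work.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Collaborator the legacy code calls out to.
interface Notifier { void notify(String message); }

// Hypothetical legacy code under test.
class ReportGenerator {
    private final Notifier notifier;
    ReportGenerator(Notifier notifier) { this.notifier = notifier; }

    void generate(int rows) {
        if (rows == 0) {
            notifier.notify("empty report");
        }
        // ... the rest of the legacy logic would live here ...
    }
}

// Self-shunting test (Cockburn): the test object is passed in as the collaborator.
class ReportGeneratorSelfShuntTest implements Notifier {
    private String lastMessage;

    @Override
    public void notify(String message) { this.lastMessage = message; }

    @Test
    void emptyReportTriggersANotification() {
        new ReportGenerator(this).generate(0); // the test shunts itself in
        assertEquals("empty report", lastMessage);
    }
}
```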


Hey Denis, thanks for sharing your experience. How do you know you're not changing behavior if automated tests aren't written before the refactor? Do you just trust the tools?


This is a good question. In my view, if we do not have tests, then we can't know, i.e. refactoring is then unsafe. That's why we need to write tests at least one level above the code being refactored.
