Critique #5 Unit Testing Class Design?
You might often hear the phrase “TDD helps me with design”. For many people, this means improving the design of their classes, driving their UML class design through TDD. But there's a BIG problem!
You might often hear the phrase “TDD helps me with design”.
Design of what? At what level?
Is TDD helping you design your class interfaces? Are you using TDD to help you design every class, its methods, and its relationships to other classes?
Or is TDD helping you design your module interface? A module exposes some behavior; a module may be implemented as one or more classes.
If you read most tutorials and courses about TDD, you will likely come across the first answer: TDD helps us design better classes in OOP.
At first, it’s logical; through the test, you’re designing the class's interface, and your test is your first consumer of the class.
But some questions arise:
Can you understand the expected external behavior by reading those tests? No, you can’t; you can only see the class behavior. You’d need to write additional “higher-level” tests to see the expected externally visible behavior.
What happens when the UML class diagram evolves? If your tests break every time you change the UML class diagram (without changing externally visible behavior), does that feel wasteful?
What happens when, after spending so much time fixing tests broken by changes to the UML class diagram, you decide to leave the diagram as-is? You stop improving it, and your UML design becomes “frozen”. It starts to stagnate, to rot…
Maintenance cost of tests
The maintenance cost of a software system encompasses not only the effort to maintain the source code but also the effort to maintain the test code. To optimize the maintenance of tests, we need to look at three costs:
1. What is the cost of writing a test? (initial cost - one-off)
2. What is the cost of reading a test? (maintenance cost - recurring)
3. What is the cost of changing a test? (maintenance cost - recurring)
The first cost, writing a test, is the smallest; it’s a one-off cost. The second and third costs, reading and changing a test, are recurring; they come from maintaining the tests and are the highest contributors to the overall cost of the test suite.
Now, for the tests to provide clear economic value, the maintenance cost of the tests must be significantly lower than the cost of not having tests at all. This means the maintenance cost of the test suite should be significantly lower than the cost of performing manual regression testing, which is what we would be doing if the tests didn’t exist.
So if we want to optimize the BIGGEST costs, the cost of maintenance:
The cost of reading tests - we’d want to optimize for the readability of the test, so that the reader spends the least amount of time reading it and working out its intent: what expectations is the test asserting? More specifically, is the expected behavior clear from reading the test?
The cost of changing tests - we’d want to optimize for the stability of tests, so that tests change only when a change is warranted. It’s understandable that tests change in response to changes in specifications - tests are executable specifications, so it’s natural that a change in expected behavior causes us to change the test. However, if there’s no change in externally expected behavior - if a code change is just a refactoring or structural change - there should be zero impact on the test.
Fragile tests are the biggest enemy of test suite maintenance. The fragile test problem is manifested through tests that break due to refactoring, i.e. tests which break in response to change in structure without a change in behavior.
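To make the fragile test problem concrete, here is a minimal sketch in Python. Every name in it (`Checkout`, `PriceCalculator`, `RefactoredCheckout`, and the two test helpers) is hypothetical and invented for illustration; it is not taken from any particular codebase. The sketch assumes a refactoring that inlines a helper class without changing the externally visible behavior.

```python
from unittest.mock import Mock


class PriceCalculator:
    """Hypothetical internal helper: one possible structure, not part of the API."""

    def subtotal(self, items):
        return sum(price * qty for price, qty in items)


class Checkout:
    """Hypothetical module API: total(items) is the externally visible behavior."""

    def __init__(self, calculator=None):
        self._calculator = PriceCalculator() if calculator is None else calculator

    def total(self, items):
        return self._calculator.subtotal(items)


class RefactoredCheckout:
    """Same behavior after a refactoring that inlined the helper class."""

    def __init__(self, calculator=None):
        # Still accepted so callers keep working, but no longer used.
        self._calculator = calculator

    def total(self, items):
        return sum(price * qty for price, qty in items)


def behavioral_test(checkout_cls):
    # Coupled to behavior: only inputs and outputs are checked.
    checkout = checkout_cls()
    return checkout.total([(10, 2), (5, 1)]) == 25


def structural_test(checkout_cls):
    # Coupled to structure: checks that a specific collaborator was called.
    calculator = Mock()
    calculator.subtotal.return_value = 25
    checkout = checkout_cls(calculator)
    checkout.total([(10, 2), (5, 1)])
    return calculator.subtotal.called
```

Under these assumptions, `behavioral_test` holds for both `Checkout` and `RefactoredCheckout`, while `structural_test` holds only for the original `Checkout`. The behavior did not change, yet the structural test breaks.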
One Behavior - Multiple Structural Options
The required behavior of the system dictates the API we’re exposing. There isn’t too much creativity involved here. Our API is supposed to be very closely derived from functional requirements of behavior. What are the inputs? What are the outputs?
But where there’s more “creativity” is in the structural solution. Whether you’re doing up-front design or incremental design, you will see different structural solutions emerge.
We could implement the behavior in one class
We could implement the behavior by decomposing it across classes
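As a rough illustration of these two options, the sketch below implements one behavior twice in Python. The pricing rule, the class names, and `check_pricing_behavior` are all assumptions made up for this example; the point is only that both structures expose the same inputs and outputs.

```python
class OrderPricingSingleClass:
    """Option 1 (hypothetical): the whole behavior lives in one class."""

    def price(self, unit_price, quantity, discount_percent):
        gross = unit_price * quantity
        return gross - gross * discount_percent / 100


class GrossCalculator:
    """Hypothetical collaborator used only by the decomposed option."""

    def gross(self, unit_price, quantity):
        return unit_price * quantity


class DiscountPolicy:
    """Hypothetical collaborator used only by the decomposed option."""

    def apply(self, gross, discount_percent):
        return gross - gross * discount_percent / 100


class OrderPricingDecomposed:
    """Option 2 (hypothetical): the same behavior, decomposed across classes."""

    def __init__(self):
        self._gross = GrossCalculator()
        self._discount = DiscountPolicy()

    def price(self, unit_price, quantity, discount_percent):
        gross = self._gross.gross(unit_price, quantity)
        return self._discount.apply(gross, discount_percent)


def check_pricing_behavior(pricing):
    # One behavioral expectation, written against the API only:
    # 3 units at 20 each with a 10% discount should cost 54.
    return pricing.price(unit_price=20, quantity=3, discount_percent=10) == 54
```

The same `check_pricing_behavior` can be run against either option, because it knows nothing about `GrossCalculator` or `DiscountPolicy`.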
Which of those UML class diagrams is the “best”? Well, that’s a source of debate among developers.
Let’s say we’re implementing a module that is exposing specific behavior.
Will we couple our tests to the module API?
Or will we couple our tests to the individual classes?
Unit Testing can be Wasteful and Harmful!
Previously, we analyzed the possible downside of misapplying Unit Testing - when unit tests become expensive and harmful. We answered the question: Is Unit Testing Harmful? The answer is: yes, when Unit Tests are coupled to structure (rather than behavior), then indeed, we end up with the following problems:
Yes, Unit Tests can be wasteful with high maintenance costs - low ROI!
Yes, Unit Tests can hinder refactoring - contradictory to the TDD cycle!
Yes, Unit Tests can harm architecture - worsening our designs!
So, now that we covered the theory last time, let’s see what this means in practice.
I’ll draw upon Kent Beck's example in the canonical book TDD by Example, where he shows how to write good tests, coupled to behavior. I’ll show the reverse - how people write structural tests (instead of behavioral tests) and how it leads to DISASTER. Specifically, I will compare testing behavioral outcomes with coupling tests to UML class diagrams. Why am I writing the “wrong” way? Because that’s currently the most common way unit testing is shown in tutorials and courses and practiced in companies (and that’s why teams give up on TDD and Unit Testing).
This will be yet another long article (4,000+ words); we will show the HARMFUL effects of doing unit testing the wrong way (which is typical for many teams) using actual CODE samples.