Mutation testing assesses the quality of your test suite. It works by making a small change to your code, running the test suite, and seeing whether any of your unit tests fail. If something can be changed without a test failing, a hole in your test suite has been revealed and gets reported.
You’ve just shipped your latest build and your coverage report has come back with the green seal of approval. 100% coverage ✅. You admire your reflection in the glossy sheen of your monitor. Who’s a good developer? Yes. You are.
Bad news though: by itself, that number does nothing to instil confidence that any of those tests are effective. 😱
Let’s say that you want to write some functionality to add two numbers. You also want to assert that the functionality works, so you start by writing a test…
See if you can spot the mistake.
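Here's a sketch of what that first attempt might look like. The `expect(...).toBe(...)` style is borrowed from Jest, so the snippet includes a tiny stand-in for it (an assumption, purely so the example runs without a test runner), and the names `add` and `testAddsTwoNumbers` are illustrative.

```javascript
// Minimal stand-in for Jest's expect(...).toBe(...), so this runs with plain Node.
function expect(actual) {
  return {
    toBe(expected) {
      if (!Object.is(actual, expected)) {
        throw new Error(`Expected ${expected} but received ${actual}`);
      }
    },
  };
}

// The functionality: add two numbers.
function add(a, b) {
  return a + b;
}

// The test for it.
function testAddsTwoNumbers() {
  add(2, 3);
  expect(2 + 3).toBe(5);
}

testAddsTwoNumbers(); // does not throw, so the test is green
```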
If you missed it, don’t worry, some of the most experienced engineers out there make this kind of mistake time and time again.
Instead of asserting on what the function returns, you've tested that (regardless of what the function is actually doing) 2 plus 3 is, and always will be, 5. You could just as well have written `expect(5).toBe(5)` or even `expect(true).toBe(true)`.
Therefore, what you actually wind up testing is entirely disconnected from the function itself.
This is a terrible test, but the real snake in the grass here is that the function was called. The coverage reporter is going to have no idea if you are asserting on the outcome, so when it runs it will check off those lines in its report. The test suite will report back a pass. Meanwhile, the function could be missing its entire body and no one would be any the wiser.
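To make that last point concrete, here is a small sketch in which the function really is missing its body. It assumes a hand-rolled stand-in for Jest's `expect`, and the names `flawedTest`, `correctedTest` and `passes` are hypothetical helpers for illustration. The flawed test stays green, while a test that asserts on the return value goes red.

```javascript
// Minimal stand-in for Jest's expect(...).toBe(...), so this runs with plain Node.
function expect(actual) {
  return {
    toBe(expected) {
      if (!Object.is(actual, expected)) {
        throw new Error(`Expected ${expected} but received ${actual}`);
      }
    },
  };
}

// The function with its entire body removed: it now returns undefined.
function add(a, b) {}

// Flawed: calls the function (so coverage counts it) but asserts on unrelated arithmetic.
function flawedTest() {
  add(2, 3);
  expect(2 + 3).toBe(5);
}

// Corrected: asserts on what the function actually returns.
function correctedTest() {
  expect(add(2, 3)).toBe(5);
}

// Helper: true when the test runs without throwing.
function passes(test) {
  try {
    test();
    return true;
  } catch (error) {
    return false;
  }
}

console.log(passes(flawedTest));    // true: the broken function goes unnoticed
console.log(passes(correctedTest)); // false: the assertion catches it
```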
We've established this is bad, but how can we detect and correct it? How can we test our tests?
One answer that isn’t manually sifting through a whole code base with a magnifying glass and a mission is mutation testing.
The idea is to:
- ✅ Run the test suite and validate that everything is returning green.
- 🔧 Make a change to one thing inside the source code.
- 🔄 Run the test suite again.
- 🚦 Did at least one test fail? Yay! Our assertions are aware of this part of the codebase.
- 👽 Did everything stay green? Ohhh! Might be something to investigate here…
- ♻️ Repeat, with a different change to the source code.
Mutations can be pretty much anything, from changing a hard-coded string to emptying out an entire function block. If the code still builds and passes all tests, that's a pretty clear indicator that our tests are missing something (or that we potentially have some redundant code we can delete for free).
Each mutation is tested alone, so you never have to worry about a compounding effect of two different changes.
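The loop above can be sketched as a toy in a few lines. This is not how any real tool is implemented: real mutation testers generate mutants from your source automatically, whereas here the mutants are written by hand, and every name (`runMutationTest`, `weakSuite`, `strongSuite`) is hypothetical. A suite is modelled as a function that takes an implementation and returns `true` when green.

```javascript
// The original implementation and two hand-written mutants of it.
const original = (a, b) => a + b;
const mutants = [
  { name: "replace + with -", fn: (a, b) => a - b },
  { name: "empty function body", fn: () => undefined },
];

// A weak suite: calls the function but never checks its result.
const weakSuite = (add) => {
  add(2, 3);
  return 2 + 3 === 5;
};

// A stronger suite: asserts on the function's return value.
const strongSuite = (add) => add(2, 3) === 5;

// Runs the suite against each mutant in isolation and returns the survivors.
function runMutationTest(suite, originalFn, mutantList) {
  // Step 1: everything must be green before we start mutating.
  if (!suite(originalFn)) {
    throw new Error("Suite must pass on the original code first");
  }
  const survived = [];
  for (const mutant of mutantList) {
    // One mutant at a time, so changes never compound.
    if (suite(mutant.fn)) {
      survived.push(mutant.name); // still green: nothing noticed the change
    }
  }
  return survived;
}

const weakSurvivors = runMutationTest(weakSuite, original, mutants);
const strongSurvivors = runMutationTest(strongSuite, original, mutants);

console.log(weakSurvivors);   // both mutants survive the weak suite
console.log(strongSurvivors); // no mutant survives the strong suite
```

Surviving mutants are the interesting output: each one names a change to the code that no test noticed.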
For a fantastic example of a codebase with 100% coverage but holes in the effectiveness of its tests, check out Stryker’s RoboBar Example.
If you’re using C# or JavaScript, I’d even skip the example above and go straight to setting up Stryker. It’s a breeze to set up and you should soon be getting some good feedback. Depending on the size of your codebase, you might want to let this run over lunch.
Mutation testing does take time, but the results are incredibly insightful.
At WORTH we prefer to run mutation tests in nightly builds as they can take quite a while to finish!
If Stryker isn’t for you, have no fear! Mutation testing isn’t new and there are a ton of options out there for you to try. To get started, just give “mutation testing in <your language>” a search.