One of the more difficult challenges in coding is capturing intent within your code. Code, naturally, is great at the "what", but it takes real effort to also capture some of the "why". There are plenty of tricks that can help - good variable names, breaking things into methods, modules and packages, domain-driven design, and when all else fails - we might add a comment. Still, it's all too common to read a piece of code written by someone else (or by a past you) and wonder "why on earth is the code like this?" When writing tests, and especially system tests, this is even more important.
I got a reminder of that recently. While testing a specific feature of malware detection, some of the test files started to fail. After some investigation we found that most of them were blocked because they contained a file that was malicious in a different way than the one the test expected, and another file was no longer considered malicious after a global configuration change.
The easiest way to fix this is to just remove those offending files and forget about it, but then come the questions -
- Why are those files there?
- Do the different files cover different aspects that are relevant to the tested feature?
- Why are we using multiple files to test what looks like a simple feature? Are there complexities we're unaware of?
- If I want to replace the test files - what properties should they have?
- How do we prevent this from happening to the rest of the files in the future?
Naturally, all of this happened at the most inconvenient time - there was a pressing deadline blocked by this test, the person who wrote the test was on vacation, and everyone was wondering what was going on.
Looking back on the situation, I can see several mistakes we made when writing this test. Some of them we talked about during the review and mistakenly dismissed; others we missed altogether:
- The test was actually testing more than one thing. Due to a careless choice of test data, we picked files that participated in flows other than the one we intended to test, so when the configuration around those flows changed, our test sent us false failure signals.
- We failed to control our environment - we have a known limitation (which, for the time being, we accept): some global configuration can be updated outside of our team's control. We ignored the impact it might have on our test.
- We didn't do a deep enough analysis: we had several files, each of which exposed a different kind of bug during development, but instead of understanding the root cause and what was actually different about those files, we just lumped everything together.
- We were not intentional in our testing - instead of understanding the feature and crafting input data to challenge the different parts of our model, we just took some "real" data and threw it at our system. Besides now not knowing which files would be a suitable replacement, we also have no idea how complete or incomplete our testing is.
- Our files are not labeled in a way that conveys intent - they are just called "file 1", "file 2" and so on (the actual names also mention the feature's name, but that's about all the extra information they carry).
- Finally, our assertion messages proved to be less helpful than they should have been - in some cases they didn't even mention the name of the file used (to save time, we decided to run multiple files in the same test, which we normally avoid). See the sketch after this list for what intent-revealing names and messages could look like.
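To make the last two points concrete, here's a minimal sketch of what intent-revealing test data and assertion messages could look like. It uses Python and pytest purely for illustration; the file names, the scan_file helper and the verdict object are all made up and not part of our actual system:

```python
import pytest

# Each sample is paired with the single property it is meant to exercise,
# so a failure (or a request to replace a file) explains why the file is there.
SAMPLES = [
    ("archive_with_encrypted_payload.zip", "payload hidden behind encryption is detected"),
    ("doc_with_embedded_macro.docm", "malicious macro in an office document is detected"),
    ("renamed_executable.jpg", "detection is based on content, not file extension"),
]


def scan_file(file_name):
    """Placeholder for the real detection call - replace with the actual client."""
    raise NotImplementedError


@pytest.mark.parametrize(
    "file_name, intent",
    SAMPLES,
    ids=[file_name for file_name, _ in SAMPLES],
)
def test_malware_detection_feature(file_name, intent):
    verdict = scan_file(file_name)
    # The assertion message names both the file and the intent behind it,
    # so the failure report is self-explanatory even when several files share a test.
    assert verdict.is_malicious, (
        f"{file_name} was expected to be flagged ({intent}), got verdict: {verdict!r}"
    )
```

The important part isn't the framework - it's that every file states the one property it exists to exercise, so "why is this file here?" and "what should a replacement look like?" answer themselves.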
So, we have some cleaning up to do now, but it's a good reminder to take more care to express our intent in code.