If you write tests close to the code, your tests start shaping it. They expose tangled dependencies. They signal when a function is doing too much. They make you think about the interface before you commit to internals. This is the design feedback TDD takes credit for. You get most of it from any test written close to the code — not just from tests written first.
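To make that concrete, here's a rough sketch (Python, every name invented for illustration): the arrange step of the test is where the design feedback shows up, whether the test was written first or last.

```python
import pytest
from dataclasses import dataclass

# Everything below is hypothetical -- a sketch of design feedback, not a real codebase.

@dataclass
class Invoice:
    customer: str
    total: float

class FakeDb:
    def save(self, record):
        self.saved = record

class FakeMailer:
    def send(self, to, body):
        self.sent = (to, body)

class FakeTaxService:
    def rate_for(self, country):
        return 0.20

def create_invoice(db, mailer, tax, customer, amount, country="GB"):
    # Calculates, persists and notifies in one place.
    total = amount * (1 + tax.rate_for(country))
    invoice = Invoice(customer, total)
    db.save(invoice)
    mailer.send(customer, f"Invoice for {total}")
    return invoice

def test_create_invoice_needs_three_fakes():
    # Three fakes just to check one number: the test is telling you the
    # function wants splitting.
    invoice = create_invoice(FakeDb(), FakeMailer(), FakeTaxService(),
                             customer="acme", amount=100)
    assert invoice.total == pytest.approx(120)
```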

There's a broader version of the argument: writing tests first enforces design upfront. It doesn't. Design happens in your head before you write a line of anything. Tests are one way to commit a design, not a prerequisite for having one.

There are many ways to test software — manual, end-to-end, contract, integration, property-based, load, exploratory. TDD evangelism lives in exactly one of them: unit tests. The rest of the surface area is somebody else's problem.

TDD looks great on toy examples. Hey, let's TDD a stack. Push, pop, peek — three methods, no I/O, no dependencies, no surrounding system. That's not reality. Reality is a flaky third-party API, a payment gateway that times out, a schema that changed last week...
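For contrast, here's roughly what the toy version looks like (a minimal Python sketch, names illustrative): three methods, no I/O, nothing to fake, so of course the red-green loop feels effortless.

```python
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]

# The "tests first" part is trivially easy when the unit has no edges.
def test_push_then_peek_returns_last_item():
    s = Stack()
    s.push(1)
    s.push(2)
    assert s.peek() == 2

def test_pop_removes_the_top_item():
    s = Stack()
    s.push("a")
    s.push("b")
    assert s.pop() == "b"
    assert s.peek() == "a"
```

None of the messy parts above fit into a sketch this small, which is rather the point.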

What actually happens

Someone announces they "do TDD." They pause. They repeat the announcement to anyone who walks past their desk. Then they describe, at length, the colours of their feedback loop. Red. Green. Refactor. Like the alphabet, in case you forgot.

Bob

Not that long ago, I had an interview with a TDD evangelist. Let's call him Bob — to spare him the exposure. Bob wears a TDD hat the way some people wear tin foil — proud, tight, slightly off. Bob told me he writes tests first because — and I'm quoting — "I know what part of the system was tested." Two things went through my head:

  • How does someone like this actually get employed?
  • I'm talking to a badly trained model.

Frankly, I still don't fully understand what Bob meant by it. I suppose it was his winning argument. I disengaged with "I've never had your problems, so I don't know what you're going on about, mate." Not the most diplomatic interview move. An interview is a two-way process though — I'd rather wait for the right opportunity.

The only honest record of what's been tested is the code. Bob's argument collapses on first contact with a team. Code is shipped by groups of people. If "what was tested" lives only in one engineer's head, then the reviewer, the on-call engineer and the new hire six months from now all get nothing. That's not engineering discipline. It's personal note-keeping wearing a methodology hat.

The screenshot trophy

Then there's the show-and-tell variant: a screenshot of green checkmarks running down a test list, posted to social media or a Slack channel for everyone to see. Some people share photos of their food; others share passing tests. Posting a passing test as a milestone is the tell of someone new, trying to impress everyone with the bare minimum.

Testing is standard engineering practice. It's part of the job, not a milestone.

Tests are not magic

Every test is more code. And every line of code is a maintenance liability. The more code in the repository — written, copy-pasted, AI-generated — the higher the probability of bugs. Tests don't escape that math. They're code too.

Tests reduce bugs. They aren't a magic safety net. A test existing doesn't mean the code under it is bug-free; it just means it was checked. And every test you write is another surface to maintain, another file that has to be updated when the underlying behaviour changes, another thing that can be wrong. A bad test will lie about what the code does. A flaky test will train the team to ignore failures. A test that exists only to bump coverage is dead weight.

A test suite is only as good as the engineer who wrote it. If the engineer doesn't understand what should be true about the system, the tests will codify the misunderstanding — and now the misunderstanding has a green checkmark next to it.
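A hypothetical sketch of how that goes: the code and its test share the same wrong belief about the business rule, so the suite stays green while the bug ships.

```python
# Invented example: the engineer believes free shipping starts *above* 50;
# the business rule is "50 and over". Code and test encode the same
# misunderstanding, so both pass.

FREE_SHIPPING_THRESHOLD = 50
STANDARD_SHIPPING = 5

def shipping_cost(order_total):
    return 0 if order_total > FREE_SHIPPING_THRESHOLD else STANDARD_SHIPPING

def test_shipping_is_free_for_large_orders():
    # Asserts the engineer's belief, not the business rule.
    # The boundary that matters (exactly 50) is never checked.
    assert shipping_cost(51) == 0
    assert shipping_cost(20) == STANDARD_SHIPPING
```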

Not all code earns a test

Plenty of code doesn't need a test:

  • Language features
  • Standard library functions
  • The framework's own well-tested behaviour

Writing tests for 1 + 1 because the coverage gate says branch coverage dropped is busywork with a green checkmark on it.
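What that busywork tends to look like, roughly: tests that exercise the interpreter and the standard library rather than anything the team owns.

```python
from dataclasses import dataclass

def test_addition_still_works():
    assert 1 + 1 == 2                          # testing the interpreter

def test_sorted_sorts():
    assert sorted([3, 1, 2]) == [1, 2, 3]      # testing the standard library

def test_dataclass_sets_fields():
    @dataclass
    class Point:
        x: int
        y: int

    assert Point(1, 2).x == 1                  # testing the framework's own well-tested behaviour
```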

Behavioural testing is what matters — what the function does for the system, not what each line evaluates to. The arithmetic is the language's job. The edge cases aren't — floating-point rounding, integer overflow, the boundaries where the runtime starts doing unexpected things. Those earn their tests.
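A small sketch of the kind of boundary that does earn a test (illustrative Python): the addition itself is the language's job; what happens to money at the float boundary, and to fixed-width values at the overflow boundary, is the team's.

```python
import struct
from decimal import Decimal

import pytest

def add_prices(a: Decimal, b: Decimal) -> Decimal:
    return a + b

def test_money_survives_the_float_boundary():
    assert 0.1 + 0.2 != 0.3                    # the runtime's surprise
    assert add_prices(Decimal("0.10"), Decimal("0.20")) == Decimal("0.30")

def test_fixed_width_overflow_boundary():
    # Where the runtime starts doing unexpected things: a signed byte stops at 127.
    with pytest.raises(struct.error):
        struct.pack("b", 128)
```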

A unit isn't a function — it's a unit of behaviour that can be one function or a whole module. The size doesn't matter. The boundary does.
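For illustration (invented names): one behaviour, several functions, and the tests sit at the boundary the rest of the system actually calls.

```python
import pytest

def _normalise(email: str) -> str:
    return email.strip().lower()

def _is_valid(email: str) -> bool:
    return "@" in email and " " not in email

def register(email: str) -> str:
    # The behaviour the rest of the system depends on.
    cleaned = _normalise(email)
    if not _is_valid(cleaned):
        raise ValueError("invalid email")
    return cleaned

def test_registration_accepts_messy_but_valid_addresses():
    # One unit of behaviour, three functions, tested at the boundary.
    assert register("  Alice@Example.COM ") == "alice@example.com"

def test_registration_rejects_garbage():
    with pytest.raises(ValueError):
        register("not an email")
```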

What gets tested is judgment — the thing in production with stakes, the business rule that has to hold, the boundary condition the language won't catch. Coverage requirements treat every test as equal. That's the bug.

Monkey tests

The moment coverage becomes a CI gate, behaviour changes. The team stops asking "does this work?" and starts asking "does this branch get visited?" The result: code that walks the runtime through every line, asserts nothing meaningful and ticks the coverage percentage up.
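Roughly what that looks like (hypothetical names): both tests below push the coverage number up; only one of them can fail for a reason anyone cares about.

```python
def apply_discount(price, pct):
    if pct < 0 or pct > 100:
        raise ValueError("pct out of range")
    return price * (100 - pct) / 100

def test_monkey():
    # Visits every branch, asserts nothing about the results.
    apply_discount(100, 10)
    try:
        apply_discount(100, 200)
    except ValueError:
        pass

def test_behaviour():
    # Can actually fail if the business rule breaks.
    assert apply_discount(100, 10) == 90
    assert apply_discount(80, 0) == 80
```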

Coverage doesn't tell you what was tested. It tells you what was hit. A 95% coverage number means the runtime walked over 95% of the lines. It says nothing about whether any of those lines did the right thing under any of the inputs they'll actually see.

A strict coverage threshold trains the team to write monkey tests. The number goes up, the bug count stays the same and the suite gets heavier every sprint with code that asserts nothing.

Where tests pay off

People like to say a well-tested codebase gives you fearless refactoring. True — to a degree. Rename a function, restructure a module, swap an internal implementation — and the suite tells you what broke. It's a double-edged sword though — passing tests don't mean nothing broke, only that the tests still pass. Test counts, coverage numbers, methodology stickers on a laptop — none of that ships better software.

The size of the test suite is influenced by the language choice. Statically typed languages catch the dumb mistakes at compile time: wrong type, missing field, function signature mismatch. The type checker is built in, not bolted on. Dynamically typed languages don't have that backstop, so they bolt one on: TypeScript on top of JavaScript, mypy on top of Python. Different language, different overhead. What gets tested is what the tooling won't catch.
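A sketch of the trade, with invented names: with annotations in place, a checker such as mypy rejects the bad call before anything runs; without that backstop, the same mistake only shows up at runtime, so it has to be caught by a test.

```python
import pytest

def charge_cents(amount_cents: int, fee_cents: int) -> int:
    return amount_cents + fee_cents

# With the annotations, mypy reports an arg-type error for a call like
# charge_cents("19.99", 50) before the code ever runs, so no test is needed
# for that mistake.

def test_rejects_non_integer_amounts():
    # The dynamically typed path: the same mistake is a runtime TypeError,
    # which only a test (or production traffic) will surface.
    # (mypy would flag this call too, which is rather the point.)
    with pytest.raises(TypeError):
        charge_cents("19.99", 50)
```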

The ceremony doesn't matter. Balancing a pogo stick on a beach ball doesn't matter. What matters is the end result: software that's reliable, performant and secure.

Ceremonies like this are the engineering version of insisting that tea must be stirred clockwise and that stirring it anti-clockwise spoils it. The tea doesn't care. The ritual is for the person performing it; they might as well sacrifice a chicken on the last day of the month while they're at it.