Get Claude Code to write tests you'll keep

By the end of this you'll have a setup and a prompt that make Claude Code write tests that fail for the right reasons, so you keep them instead of deleting them. The trick is order. Ask for the tests before the implementation, confirm they fail, commit them, then tell Claude to make them pass without touching a single assertion.

You need two things first: Claude Code installed in a project with a test runner already wired up, and about 10 minutes. A working CLAUDE.md helps, and we add one rule to it below.

Why Claude Code's default tests pass and prove nothing

Ask Claude Code to "add tests for this function" and it reads the function first. It then writes assertions that match what the code already returns. The tests go green on the first run, which feels like a win and is the opposite of one.

A test born from the implementation cannot catch a regression in that implementation. It asserts the current behavior, bug included. If the function returns undefined where it should throw, the generated test asserts undefined and stays green forever. You shipped a test that locks in the bug.
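
Here is a small sketch of that failure mode in TypeScript. The toSeconds function and its bug are made up for illustration and are not part of the parseDuration example used later. The first check was written by reading the code, the second by reading the spec.

```typescript
// Hypothetical buggy implementation: the spec says unknown units should
// throw, but this falls through and returns undefined instead.
function toSeconds(value: number, unit: string): number | undefined {
  if (unit === "h") return value * 3600;
  if (unit === "m") return value * 60;
  return undefined; // bug: should throw
}

// Returns true when calling fn throws, false otherwise.
function throws(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch {
    return true;
  }
}

// Derived from the code: asserts what the function returns today.
const implementationDerivedPasses = toSeconds(5, "x") === undefined;

// Derived from the spec: asserts what the function should do.
const specDerivedPasses = throws(() => toSeconds(5, "x"));

console.log({ implementationDerivedPasses, specDerivedPasses });
// { implementationDerivedPasses: true, specDerivedPasses: false }
```

The green check is the one that locks in the bug. The red one is the only check of the two that tells you anything.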

The fix is to write the test before the code exists. With no implementation to copy, Claude has to encode what the behavior should be from the spec you give it. That single change is what separates a test you keep from one you delete in a month.

The setup: a CLAUDE.md block that stops the gaming

Claude Code reads your project CLAUDE.md at the start of every session, so a rule there applies to every run. Add a short testing block. It does one job: it stops Claude from editing assertions when the implementation is wrong.

```markdown
## Testing

- Test command: `pnpm vitest run`
- We practice TDD. Write the failing test first, then the implementation.
- Never modify a test to make it pass. If a test fails, the implementation is wrong.
- A test asserts one behavior. Do not assert on private internals or call order.
```

The third line is the one that earns its place. Without it, Claude will weaken an assertion or delete a case to reach green, because "all tests pass" is the goal you handed it. The line reframes a red test as a signal about the code, not the test.

Write the tests before the code

Tell Claude you are doing TDD before you ask for anything. The phrasing matters more than it looks.

```text
We're doing TDD. Write tests for a parseDuration(input) function that turns
"1h30m" into 5400 seconds. Cover: hours only, minutes only, combined,
empty string, and a malformed input that should throw. Do not write the
implementation yet. The function does not exist.
```

Two parts of that prompt do real work. Naming the cases up front stops Claude from writing three happy-path tests and calling it done. Saying "the function does not exist" stops it from stubbing a fake parseDuration or writing a mock that satisfies the test early, which is the most common way the loop goes wrong.

If you skip the TDD framing, Claude often invents the implementation alongside the test to make the file run. Then you are back to tests that match invented code.
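
To make "naming the cases up front" concrete, here is one way the five cases from the prompt could be written down as data, with a tiny framework-free runner. Treat it as a sketch under assumptions: the inputs "2h", "45m", and "abc", and the choice that an empty string yields 0, are mine. The prompt only fixes that "1h30m" gives 5400 and that malformed input throws.

```typescript
// One row per behavior named in the prompt. "throws" marks inputs that
// must raise an error instead of returning a number.
type Case = { name: string; input: string; expected: number | "throws" };

const cases: Case[] = [
  { name: "parses hours only", input: "2h", expected: 7200 },
  { name: "parses minutes only", input: "45m", expected: 2700 },
  { name: "parses hours and minutes combined", input: "1h30m", expected: 5400 },
  // Assumption: the prompt does not say what an empty string yields.
  { name: "returns 0 for an empty string", input: "", expected: 0 },
  { name: "throws on malformed input", input: "abc", expected: "throws" },
];

// Runs every case against a candidate and returns the names that fail.
function failingCases(candidate: (input: string) => number): string[] {
  return cases
    .filter((c) => {
      try {
        const actual = candidate(c.input);
        // Returning at all is a failure when the case expects a throw.
        return c.expected === "throws" || actual !== c.expected;
      } catch {
        // Throwing is a failure unless the case expects it.
        return c.expected !== "throws";
      }
    })
    .map((c) => c.name);
}

// A candidate that always returns 0 fails every row except the empty string.
console.log(failingCases(() => 0).length); // 4
```

In a Vitest file each row would normally become its own test, so a failure names exactly one behavior.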

Confirm the tests fail, then commit them

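Run the test command yourself, or have Claude run it. Every test should fail, and it should fail because parseDuration does not exist yet, not because of a typo in the test. The exact message depends on how the test file references the function: a bare call reports it as not defined, while an import from a missing module usually fails at module resolution instead.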

```bash
pnpm vitest run
# FAIL  parseDuration › parses hours only
# ReferenceError: parseDuration is not defined
```

A failure for the right reason is the checkpoint. Commit the failing tests now, before any implementation exists.

```bash
git add . && git commit -m "test: parseDuration cases (red)"
```

This commit is your clean revert. If the implementation pass drifts and starts rewriting tests, you reset to this commit and keep the assertions you trust. The red commit is cheap insurance, and it is the step people skip first.

The prompt that implements without touching the tests

Now point Claude at the implementation. The prompt has to fence off the tests explicitly, or the model treats them as fair game.

```text
Implement parseDuration so all tests pass. Do not modify the tests.
Keep going until `pnpm vitest run` is fully green. If a test looks wrong,
stop and tell me. Do not change the assertion.
```

"Do not modify the tests" and "keep going until green" carry the weight. The first removes the easy exit of weakening an assertion. The second tells Claude to iterate against the failures instead of stopping at the first partial pass. The "stop and tell me" clause gives it an honest off-ramp when a test really is wrong, so it surfaces the conflict instead of quietly editing around it.
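
For orientation, an implementation that turns the five named cases green might look roughly like this. It is a sketch, not what Claude will necessarily produce. It assumes hours must come before minutes and that an empty string returns 0, neither of which the prompt specifies.

```typescript
// Parses strings such as "2h", "45m", or "1h30m" into seconds.
function parseDuration(input: string): number {
  // Assumption: an empty string means zero seconds.
  if (input === "") return 0;

  // Optional hours group followed by an optional minutes group, nothing else.
  const match = /^(?:(\d+)h)?(?:(\d+)m)?$/.exec(input);
  if (!match) {
    throw new Error(`Malformed duration: "${input}"`);
  }

  // A group that did not participate in the match is undefined at runtime.
  const hours = Number(match[1] ?? 0);
  const minutes = Number(match[2] ?? 0);
  return hours * 3600 + minutes * 60;
}

console.log(parseDuration("1h30m")); // 5400
```

Note that the malformed case is handled by throwing, which is what the committed test demands. An implementation that returned NaN or undefined there would leave that test red, and that is the point of having written it first.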

Watch the diff while it works. If Claude touches a test file during the implementation pass, that is the rule failing, and you caught it. Revert that file and re-run the prompt.
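
If you would rather not eyeball the diff, the check is easy to automate. The helper below is a hypothetical sketch: it takes a list of changed paths, such as the output of `git diff --name-only` against the red commit, and returns the ones that look like test files. The naming patterns it matches are an assumption, so adjust them to your project.

```typescript
// Matches foo.test.ts, foo.spec.tsx, foo.test.mjs and similar, plus
// anything inside a __tests__ directory.
const TEST_FILE = /(\.(test|spec)\.[cm]?[jt]sx?$)|((^|\/)__tests__\/)/;

// Returns the changed paths that look like test files.
function touchedTestFiles(changedPaths: string[]): string[] {
  return changedPaths.filter((path) => TEST_FILE.test(path));
}

const changed = ["src/parseDuration.ts", "src/parseDuration.test.ts"];
console.log(touchedTestFiles(changed)); // [ 'src/parseDuration.test.ts' ]
```

An empty result means the implementation pass stayed on its side of the fence.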

Push test-writing to a subagent so your main context stays clean

The failing-test phase fills your session with the spec, the cases, and the red output. By the time Claude writes the implementation, that context is heavy, and a heavy context makes the model likelier to conflate the two phases.

Delegate the test-writing to a subagent. The subagent writes the tests in its own context window and returns only the finished test file. Your main session never sees the back-and-forth, so the implementation pass starts on a clean slate with just the committed red tests as its starting point.

The separation also mirrors the discipline you want. One context decides what correct means. A different one makes it true.
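
As a rough sketch, a test-writing subagent could be defined in a file like the one below. This assumes the project-level subagent format of a markdown file with YAML frontmatter under .claude/agents/. The name test-writer, the tool list, and the wording of the instructions are all illustrative, so check the current Claude Code documentation before relying on any field.

```markdown
---
name: test-writer
description: Writes failing tests from a spec before any implementation exists.
tools: Read, Write, Grep, Glob, Bash
---

You write tests first. Given a spec and a list of cases, create the test file
and nothing else. Do not create, stub, or mock the function under test.
Run `pnpm vitest run`, confirm every test fails because the function does not
exist, and report back only the path of the finished test file.
```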

The three checks that tell a kept test from a thrown-away one

Before you trust a generated test, run it through three questions. Each one takes seconds and saves you a false-green test in CI.

First, does it fail when you break the behavior? Change one line of the implementation on purpose. If every test still passes, the tests assert nothing real. Revert and ask why.
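
The sketch below shows what that first check looks like in miniature. The implementation is hypothetical and is parameterised on seconds per hour only so that the deliberate break is a one-value change. Two checks run against the broken version: one that asserts almost nothing, and one that asserts the value.

```typescript
type Impl = (input: string) => number;

// Hypothetical implementation, built from a seconds-per-hour constant so
// the deliberate break below is a single changed value.
function makeParseDuration(secondsPerHour: number): Impl {
  return (input) => {
    const match = /^(?:(\d+)h)?(?:(\d+)m)?$/.exec(input);
    if (!match) throw new Error(`Malformed duration: "${input}"`);
    return Number(match[1] ?? 0) * secondsPerHour + Number(match[2] ?? 0) * 60;
  };
}

// Weak check: only asserts that a number comes back.
const weakCheck = (fn: Impl): boolean => typeof fn("1h30m") === "number";

// Real check: asserts the value the spec asks for.
const realCheck = (fn: Impl): boolean => fn("1h30m") === 5400;

const original = makeParseDuration(3600);
const broken = makeParseDuration(360); // the deliberate break

console.log({
  weakPassesOnBroken: weakCheck(broken), // true: this check proves nothing
  realPassesOnBroken: realCheck(broken), // false: this check catches the break
});
```

A check that stays green against the broken version is the one to rewrite or delete.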

Second, does it name one behavior? A test called "works correctly" that checks five things will fail for reasons you cannot read from the name. A test called "throws on malformed input" tells you exactly what broke.

Third, does it assert on internals the implementation is free to change? A test that checks parseDuration called a helper named splitUnits breaks the moment you refactor, even when the output is still correct. Assert on the return value, not the path the code took to get there.
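
Here is the third check played out on a refactor. Both versions below are hypothetical sketches: the first goes through a splitUnits helper, the second replaces it with a single regex. A hand-rolled call log stands in for the spy a test framework would give you.

```typescript
// Records helper calls so a check can inspect them.
const calls: string[] = [];

// Helper used only by the first version.
function splitUnits(input: string): Array<[number, string]> {
  calls.push("splitUnits");
  const parts: Array<[number, string]> = [];
  const unitPattern = /(\d+)([hm])/g;
  let m: RegExpExecArray | null;
  while ((m = unitPattern.exec(input)) !== null) {
    parts.push([Number(m[1]), m[2]]);
  }
  return parts;
}

// Version 1: built on the helper.
function parseDurationV1(input: string): number {
  return splitUnits(input).reduce(
    (total, [value, unit]) => total + value * (unit === "h" ? 3600 : 60),
    0,
  );
}

// Version 2: a refactor that no longer needs the helper.
function parseDurationV2(input: string): number {
  const match = /^(?:(\d+)h)?(?:(\d+)m)?$/.exec(input);
  if (!match) throw new Error(`Malformed duration: "${input}"`);
  return Number(match[1] ?? 0) * 3600 + Number(match[2] ?? 0) * 60;
}

// Asserts on the path taken: was the helper called?
function internalsCheck(fn: (input: string) => number): boolean {
  calls.length = 0;
  fn("1h30m");
  return calls.includes("splitUnits");
}

// Asserts on the return value only.
const outputCheck = (fn: (input: string) => number): boolean =>
  fn("1h30m") === 5400;

console.log({
  v1: { output: outputCheck(parseDurationV1), internals: internalsCheck(parseDurationV1) },
  v2: { output: outputCheck(parseDurationV2), internals: internalsCheck(parseDurationV2) },
});
// v1: output true, internals true. v2: output true, internals false.
```

The refactor changed nothing a caller can observe, yet the internals check went red. That is a test failing for the wrong reason.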

A test that clears all three goes in CI and stays there. The rest get deleted while you still remember why.

