Cut your Claude Code cost in half with one routing rule

By the end of this guide you'll have Claude Code running Opus for planning and Sonnet for everything else, which cuts your input and output cost by roughly 40 percent per token on the work that runs in execution mode. The setting is one alias. Claude Code already caches your context, and a cache read costs 10 percent of a fresh read. Routing is the lever you set yourself, and it is the biggest one.

You need two things first: Claude Code installed and billed per token through the Anthropic API, not a flat Max plan, and about 10 minutes to switch the model and read your usage.

Where the money goes

Most of a Claude Code bill is Opus running tasks Sonnet handles fine. Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. Sonnet 4.6 costs $3 per million input tokens and $15 per million output tokens, so the same task on Sonnet costs about 60 percent of the Opus rate.
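To make the gap concrete, here is a minimal sketch of the arithmetic. The Opus prices and the Sonnet input price are the ones quoted above; the Sonnet output price of $15 per million and the token counts are assumptions for illustration, so substitute the figures from your own rate card and usage.

```python
# Prices in USD per million tokens. The Sonnet output price is an assumption.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached cost in USD of one task on the given model."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical task: 200k input tokens and 20k output tokens.
opus = task_cost("opus", 200_000, 20_000)
sonnet = task_cost("sonnet", 200_000, 20_000)
print(f"Opus ${opus:.2f}, Sonnet ${sonnet:.2f}, saving {1 - sonnet / opus:.0%}")
```

The calculation ignores caching on purpose, so it isolates what the model choice alone changes.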

The problem is not caching. Claude Code caches your context automatically, so the file tree, the system prompt, and the conversation so far get cached without you touching a setting. The lever you control is which model burns through those tokens. Model choice moves more money than anything else in the tool.

Set the routing rule with the opusplan alias

The fix is the opusplan alias. It uses Opus in plan mode for architecture and reasoning, then switches to Sonnet in execution mode for code generation. You plan with the smart model and execute with the cheap one.

Set it inside a session with the /model command.

claude
> /model opusplan

To start a single session on it, pass the --model flag when you launch. To make it the default for a project, set the model in your config so every session starts there.

claude --model opusplan

This works because planning is where reasoning pays off and execution is where token volume piles up. Architecture decisions are a few hundred tokens of careful thought. Code generation is thousands of tokens of output, and that is the work you want on Sonnet at $15 per million output instead of Opus at $25.
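A rough sketch of that split, comparing an all-Opus session with an opusplan-style session. The phase token counts are hypothetical and the Sonnet output price is an assumption; the point is only that the execution phase dominates the total.

```python
PRICES = {  # USD per million tokens
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},  # output price is an assumption
}

# Hypothetical session: (phase, input tokens, output tokens).
PHASES = [("plan", 50_000, 2_000), ("execute", 400_000, 60_000)]


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD for the given token counts."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def session_cost(plan_model: str, execute_model: str) -> float:
    """Sum the cost of every phase, using one model per phase."""
    models = {"plan": plan_model, "execute": execute_model}
    return sum(cost(models[phase], i, o) for phase, i, o in PHASES)


all_opus = session_cost("opus", "opus")
opusplan = session_cost("opus", "sonnet")
print(f"all Opus ${all_opus:.2f}, opusplan ${opusplan:.2f}")
```

With these made-up numbers the planning phase is under a tenth of the all-Opus total, which is why keeping Opus there costs little.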

When to drop to Haiku instead

Some work does not need Sonnet either. Test generation, code review, and renaming passes are routine, and Haiku handles them for less. Haiku input is $1 per million against Opus at $5, so routing routine tasks to Haiku while reserving Opus for architecture cuts the input cost on that work by 80 percent.

Use Haiku for the mechanical jobs. Reach for Sonnet when the task involves real code logic, and keep Opus for the moment you are designing a system or debugging something tangled. The split is by difficulty, not by habit.
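One way to write the policy down is as a small lookup table. This is a sketch, not a Claude Code feature: the task categories are hypothetical labels, and it assumes the haiku, sonnet, and opus aliases are available in your version.

```python
# Hypothetical task categories mapped to model aliases, split by difficulty.
ROUTES = {
    "test_generation": "haiku",
    "code_review": "haiku",
    "rename": "haiku",
    "implementation": "sonnet",
    "architecture": "opus",
    "tangled_debugging": "opus",
}


def pick_model(task_kind: str) -> str:
    """Return the alias for a task kind, falling back to Sonnet when unknown."""
    return ROUTES.get(task_kind, "sonnet")


print(pick_model("rename"))        # haiku
print(pick_model("architecture"))  # opus
print(pick_model("something_else"))  # sonnet
```

Defaulting unknown work to Sonnet keeps the fallback in the middle of the price range instead of at the top.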

Check that prompt caching is already working

Caching is on by default, but it is worth confirming it lands. A cache read costs 0.1x the base input price, a 90 percent discount on every cached token. The difference is large enough to see in the bill.

Run the /cost command inside a session to see your usage breakdown.

> /cost

You want cache reads to be a high share of your input tokens. A long session that keeps re-reading the same files should show mostly cache reads, not fresh input. Since a cache read costs a tenth of a fresh read, a session that runs mostly on cache reads costs a fraction of the same session paying full input each turn. That gap is the whole point of confirming it.
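To turn the usage numbers into that share, a minimal sketch follows. The token counts are hypothetical and typed in by hand; it does not parse any real /cost output, and it ignores cache-write cost.

```python
def cache_read_share(fresh_input: int, cache_read: int) -> float:
    """Fraction of input tokens that were served from the cache."""
    total = fresh_input + cache_read
    return cache_read / total if total else 0.0


def effective_input_multiplier(
    fresh_input: int, cache_read: int, read_multiplier: float = 0.1
) -> float:
    """Average input price per token, as a multiple of the base input price."""
    total = fresh_input + cache_read
    if total == 0:
        return 0.0
    return (fresh_input + cache_read * read_multiplier) / total


# Hypothetical session: 100k fresh input tokens, 900k cache-read tokens.
share = cache_read_share(100_000, 900_000)
multiplier = effective_input_multiplier(100_000, 900_000)
print(f"cache-read share {share:.0%}, paying {multiplier:.2f}x base input on average")
```

At a 90 percent cache-read share, the average input token in this example costs about a fifth of the base price.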

The cache TTL trap that inflates the write cost

Caching has a write cost too, and the tier matters. A 5-minute cache write costs 1.25x base input. A 1-hour write costs 2x. The 5-minute tier pays off after a single read, so for most sessions it is the cheaper choice.
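The break-even claim can be checked with a few lines of arithmetic. This sketch measures cost in multiples of the base input price for one fixed prefix, using the 1.25x, 2x, and 0.1x multipliers quoted above; the function names are illustrative.

```python
def cached_cost(reads: int, write_multiplier: float, read_multiplier: float = 0.1) -> float:
    """One cache write plus `reads` cache reads, in multiples of base input price."""
    return write_multiplier + reads * read_multiplier


def uncached_cost(reads: int) -> float:
    """The same prefix sent fresh once, then `reads` more times at full price."""
    return 1.0 + reads


def break_even_reads(write_multiplier: float) -> int:
    """Smallest number of reads at which caching is strictly cheaper."""
    reads = 0
    while cached_cost(reads, write_multiplier) >= uncached_cost(reads):
        reads += 1
    return reads


print(break_even_reads(1.25))  # 5-minute tier
print(break_even_reads(2.0))   # 1-hour tier
```

Under this model the 5-minute tier is ahead after one read and the 1-hour tier after two.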

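Around early March 2026 the default cache TTL regressed from 1 hour to 5 minutes, which raised cache-creation cost 20 to 32 percent for some users. With the shorter TTL, any pause longer than five minutes lets the cache expire, and the next turn pays the write price again. Confirm the current behavior before you assume your TTL.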
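A simplified model shows how a shorter TTL can cost more even though each write is cheaper. It assumes a fixed-size prefix, that every hit inside the TTL keeps the entry alive, and that any longer gap forces a full rewrite; the gap list is hypothetical, and the result is not meant to reproduce the reported 20 to 32 percent.

```python
def session_cost(
    gaps_min: list[float],
    ttl_min: float,
    write_multiplier: float,
    read_multiplier: float = 0.1,
) -> float:
    """Prefix cost over a session, in multiples of base input price."""
    total = write_multiplier  # the first turn always writes the cache
    for gap in gaps_min:
        # Within the TTL the turn is a cheap read; past it, a full rewrite.
        total += read_multiplier if gap <= ttl_min else write_multiplier
    return total


# Hypothetical minutes of idle time between consecutive turns.
GAPS = [2, 3, 12, 1, 8, 4]

five_min = session_cost(GAPS, ttl_min=5, write_multiplier=1.25)
one_hour = session_cost(GAPS, ttl_min=60, write_multiplier=2.0)
print(f"5-minute TTL: {five_min:.2f}x, 1-hour TTL: {one_hour:.2f}x")
```

In this example two pauses over five minutes are enough to make the 5-minute tier the more expensive one.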

What this adds up to per month

Since 23 April 2026 the default model for Anthropic API and Enterprise pay-as-you-go users is Opus 4.7, so a climbing bill often traces to Opus running work Sonnet would handle. Check your model first. If you have been running Opus for everything, the routing switch alone moves the execution work to a model that costs about 40 percent less per token.

Put the routing rule where you will not forget it. A line in your CLAUDE.md documenting the model policy keeps the choice visible to anyone on the repo. Set the alias, run /cost after a working session, and read the cache-read share. Those two checks, the active model and the cache-read share, tell you whether the lever is pulled.

FAQ

What is the opusplan alias in Claude Code?

opusplan runs Opus in plan mode for architecture and reasoning, then switches to Sonnet in execution mode for code generation. You set it with /model opusplan or the --model flag.

How much does prompt caching save in Claude Code?

A cache read costs 0.1x the base input price, a 90 percent discount on cached tokens. Claude Code caches your context automatically, so the saving applies without any setup.

Why did my Claude Code bill spike?

Usually because Opus is running work Sonnet would handle. Since 23 April 2026 the default model for API and Enterprise pay-as-you-go users is Opus 4.7, so check your model first and route execution to Sonnet.
