Cut your Claude Code cost in half with one routing rule
Most of a Claude Code bill is Opus doing work Sonnet handles fine. Here is the routing rule that fixes it, plus how to confirm caching works.

By the end of this you'll have Claude Code running Opus for planning and Sonnet for everything else, which cuts your input and output cost by roughly 40 percent per token on the work that runs in execution mode. The setting is one alias. Claude Code already caches your context, so a cache read costs 10 percent of a fresh read. Routing is the lever you set yourself, and it is the biggest one.
You need two things first: Claude Code installed and billed per token through the Anthropic API rather than a flat Max plan, and about 10 minutes to switch the model and read your usage.
Where the money goes
Most of a Claude Code bill is Opus running tasks Sonnet handles fine. Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. Sonnet 4.6 input is $3 per million, 40 percent below Opus, and its output is discounted by a similar margin, so the same task on Sonnet costs roughly 60 percent of what it costs on Opus.
The problem is not caching. Claude Code caches your context automatically, so the file tree, the system prompt, and the conversation so far get cached without you touching a setting. The lever you control is which model burns through those tokens. Model choice moves more money than anything else in the tool.
Set the routing rule with the opusplan alias
The fix is the opusplan alias. It uses Opus in plan mode for architecture and reasoning, then switches to Sonnet in execution mode for code generation. You plan with the smart model and execute with the cheap one.
Set it inside a session with the /model command.
claude
> /model opusplan
To make it the default for a project, pass the --model flag when you launch, or set the model in your config so every session starts there.
claude --model opusplan
The reason this works is that planning is where reasoning pays off and execution is where token volume piles up. Architecture decisions are a few hundred tokens of careful thought. Code generation is thousands of tokens of output, and that is the work you want on Sonnet at $3 per million input.
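To put rough numbers on that split, here is a minimal sketch in Python. The Opus prices and the Sonnet input price come from this article. The Sonnet output price is an assumption inferred from the "roughly 40 percent" figure, and the token counts describe a hypothetical session, so swap in the current price list and your own usage. Caching is ignored to keep the arithmetic visible.

```python
# USD per million tokens. Opus prices and Sonnet input are from the article.
# The Sonnet output price is an ASSUMPTION (40 percent below Opus output).
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet": {"input": 3.00, "output": 15.00},  # output price assumed
}


def cost(model, input_tokens, output_tokens):
    """Return the USD cost of one phase on the given model, ignoring caching."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def session_cost(plan, execute, execute_model):
    """plan and execute are (input_tokens, output_tokens); planning always runs on Opus."""
    return cost("opus", *plan) + cost(execute_model, *execute)


# Hypothetical session: a small planning phase and a large execution phase.
plan = (20_000, 500)
execute = (400_000, 60_000)

all_opus = session_cost(plan, execute, "opus")
opusplan = session_cost(plan, execute, "sonnet")

print(f"all Opus: ${all_opus:.2f}")
print(f"opusplan: ${opusplan:.2f}")
print(f"saving:   {100 * (1 - opusplan / all_opus):.0f}%")
```

Because planning is such a small share of the tokens in this made-up session, the total saving lands close to the 40 percent per-token discount. A session that spends more of its tokens in plan mode would save less.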
When to drop to Haiku instead
Some work does not need Sonnet either. Test generation, code review, and renaming passes are routine, and Haiku handles them for less. Haiku input is $1 per million against Opus at $5, so routing routine tasks to Haiku while reserving Opus for architecture cuts the input cost on that work by 80 percent.
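One way to keep that split honest is to write it down as a lookup table. The sketch below is illustrative only: the task categories and the pick_model helper are hypothetical names, not anything Claude Code exposes, and the model strings are assumed to match the short aliases you would pass to /model.

```python
# Hypothetical task categories mapped to model aliases.
# Mechanical work goes to Haiku, real code logic to Sonnet, design and hard debugging to Opus.
ROUTES = {
    "test_generation": "haiku",
    "code_review": "haiku",
    "rename": "haiku",
    "implementation": "sonnet",
    "architecture": "opus",
    "debugging": "opus",
}

# Input prices in USD per million tokens, as quoted in the article.
INPUT_PRICE = {"haiku": 1.00, "sonnet": 3.00, "opus": 5.00}


def pick_model(task_type):
    """Return the model alias for a task type, defaulting to Sonnet for unlisted work."""
    return ROUTES.get(task_type, "sonnet")


def input_saving_vs_opus(model):
    """Fraction of input cost saved by using `model` instead of Opus."""
    return 1 - INPUT_PRICE[model] / INPUT_PRICE["opus"]


for task in ("rename", "implementation", "architecture"):
    model = pick_model(task)
    print(f"{task}: {model} ({input_saving_vs_opus(model):.0%} cheaper input than Opus)")
```

Defaulting unknown work to Sonnet is a design choice for this sketch: it avoids paying Opus rates by accident while keeping anything unclassified off the smallest model.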
Use Haiku for the mechanical jobs. Reach for Sonnet when the task involves real code logic, and keep Opus for the moment you are designing a system or debugging something tangled. The split is by difficulty, not by habit.
Check that prompt caching is already working
Caching is on by default, but it is worth confirming it lands. A cache read costs 0.1x the base input price, a 90 percent discount on every cached token. The difference is large enough to see in the bill.
Run the /cost command inside a session to see your usage breakdown.
> /cost
You want cache reads to be a high share of your input tokens. A long session that keeps re-reading the same files should show mostly cache reads, not fresh input. Since a cache read costs a tenth of a fresh read, a session that runs mostly on cache reads costs a fraction of the same session paying full input each turn. That gap is the whole point of confirming it.
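To see what "a fraction" means for your own session, you can plug the token counts from your usage breakdown into a few lines of Python. This is a sketch under stated assumptions: the function names are made up for illustration, the counts are typed in by hand, and cache-write surcharges are left out so only the read discount shows.

```python
CACHE_READ_MULTIPLIER = 0.1  # a cache read costs 0.1x base input (from the article)


def cache_read_share(cache_read_tokens, fresh_input_tokens):
    """Fraction of input tokens that were served from cache."""
    total = cache_read_tokens + fresh_input_tokens
    return cache_read_tokens / total if total else 0.0


def effective_input_cost(cache_read_tokens, fresh_input_tokens, base_price_per_million):
    """USD input cost with the cache-read discount applied (cache writes ignored)."""
    billed = fresh_input_tokens + CACHE_READ_MULTIPLIER * cache_read_tokens
    return billed * base_price_per_million / 1_000_000


# Hypothetical session on Sonnet at $3 per million input tokens.
reads, fresh = 900_000, 100_000
share = cache_read_share(reads, fresh)
with_cache = effective_input_cost(reads, fresh, 3.00)
without_cache = effective_input_cost(0, reads + fresh, 3.00)

print(f"cache-read share: {share:.0%}")
print(f"input cost with caching:    ${with_cache:.2f}")
print(f"input cost without caching: ${without_cache:.2f}")
```

If the share comes out low on a long session that keeps touching the same files, that is the signal to investigate before blaming the model choice.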
The cache TTL trap that inflates the write cost
Caching has a write cost too, and the tier matters. A 5-minute cache write costs 1.25x base input. A 1-hour write costs 2x. The 5-minute tier pays off after a single read, so for most sessions it is the cheaper choice.
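The break-even claim can be checked with the multipliers quoted above. The sketch below works in units of base input price and assumes the cache never expires between reads, which is the optimistic case. The function names are illustrative.

```python
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.0}  # cache write cost relative to base input
READ_MULTIPLIER = 0.1                        # cache read cost relative to base input


def cached_cost(tier, reads):
    """Relative cost of one cache write plus `reads` cache reads."""
    return WRITE_MULTIPLIER[tier] + READ_MULTIPLIER * reads


def uncached_cost(reads):
    """Relative cost of sending the same prefix fresh on the first turn and on every reuse."""
    return 1.0 + reads


def break_even_reads(tier):
    """Smallest number of reads at which caching is cheaper than not caching."""
    reads = 0
    while cached_cost(tier, reads) >= uncached_cost(reads):
        reads += 1
    return reads


for tier in ("5m", "1h"):
    print(f"{tier} tier pays off after {break_even_reads(tier)} read(s)")
```

Under these assumptions the 5-minute tier is ahead after one read and the 1-hour tier after two. The picture changes once the cache expires mid-session, because every expiry adds another write at the tier's multiplier.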
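Around early March 2026 the default cache TTL regressed from 1 hour to 5 minutes, which raised cache-creation cost 20 to 32 percent for some users. The likely mechanism is that a shorter TTL lapses during any pause longer than 5 minutes, and each lapse forces a fresh cache write, so the cheaper tier can still cost more in a stop-and-start session. Confirm the current behavior before you assume your TTL.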
What this adds up to per month
Since 23 April 2026 the default model for Anthropic API and Enterprise pay-as-you-go users is Opus 4.7, so a climbing bill often traces to Opus running work Sonnet would handle. Check your model first. If you have been running Opus for everything, the routing switch alone moves the execution work to a model that costs about 40 percent less per token.
Put the routing rule where you will not forget it. A line in your CLAUDE.md documenting the model policy keeps the choice visible to anyone on the repo. Set the alias, run /cost after a working session, and read the cache-read share. The active model and the cache-read share tell you whether both levers are pulled.
FAQ
What is the opusplan alias in Claude Code?
opusplan runs Opus in plan mode for architecture and reasoning, then switches to Sonnet in execution mode for code generation. You set it with /model opusplan or the --model flag.
How much does prompt caching save in Claude Code?
A cache read costs 0.1x the base input price, a 90 percent discount on cached tokens. Claude Code caches your context automatically, so the saving applies without any setup.
Why did my Claude Code bill spike?
Usually because Opus is running work Sonnet would handle. Since 23 April 2026 the default model for API and Enterprise pay-as-you-go users is Opus 4.7, so check your model first and route execution to Sonnet.
