The Expensive Middle

There’s a sentence in Anthropic’s Claude Sonnet 5 announcement that a competitor would pay to have written. It’s about their own cheaper model, and it quietly hands away the premium tier’s entire pitch.

Its performance is close to that of Opus 4.8, but at lower prices.
— Anthropic, introducing Claude Sonnet 5

When the vendor tells you the budget option is nearly as good as the premium one, believe them. They priced it, they benchmarked it, and they said it out loud anyway. That single line is the whole story of where Opus 4.8 now sits: not dethroned by a rival, but hollowed out from the inside by the tier below it.

The numbers, briefly

The sticker gap is stark. Opus 4.8 runs $5 per million input tokens and $25 per million output tokens. Sonnet 5 launched at an introductory $2 and $10 (rising to $3 and $15 after August 31). At intro pricing, Sonnet is 2.5x cheaper per token on both input and output.

One honest correction to that headline: Sonnet 5 ships with a newer tokenizer that produces roughly 30% more tokens for the same text. Divide the 2.5x sticker gap by 1.3 and the real per-task saving is closer to 1.9x than 2.5x. Cheaper, just not as cheap as the price card implies.
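The adjustment is plain division, sketched here in Python. The prices and the roughly 30% token inflation are the figures quoted above; the assumptions that the inflation applies equally to input and output tokens, and that both models spend the same effort on the task, are illustrative simplifications rather than measured facts.

```python
# USD per million tokens, as quoted above.
OPUS = {"input": 5.00, "output": 25.00}
SONNET_INTRO = {"input": 2.00, "output": 10.00}
SONNET_STANDARD = {"input": 3.00, "output": 15.00}  # after August 31

# From the text: roughly 30% more tokens for the same text.
# Assumption: the inflation is the same for input and output.
TOKEN_INFLATION = 1.30


def sticker_ratio(premium: dict, budget: dict, kind: str = "input") -> float:
    """Price-card ratio: how many times cheaper the budget model is per token."""
    return premium[kind] / budget[kind]


def effective_ratio(premium: dict, budget: dict, kind: str = "input",
                    inflation: float = TOKEN_INFLATION) -> float:
    """Ratio per unit of text, once the budget model's extra tokens are counted."""
    return sticker_ratio(premium, budget, kind) / inflation


if __name__ == "__main__":
    for label, card in (("intro", SONNET_INTRO), ("standard", SONNET_STANDARD)):
        print(f"{label}: sticker {sticker_ratio(OPUS, card):.2f}x, "
              f"effective {effective_ratio(OPUS, card):.2f}x")
```

Under those assumptions the intro-price gap works out to about 1.92x, and the same division at the post-August 31 prices gives about 1.28x.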

And even that discount is conditional. It holds only while Sonnet runs at low effort. Turn the dial up to where it genuinely rivals Opus and it becomes startlingly token-hungry. One widely circulated cost-per-task benchmark put Sonnet 5 at 1.2x the cost of Opus 4.8 at max effort, twice the cost of GPT-5.5, and anywhere from 5x to 57x the cost of the open-weight models it’s supposed to undercut. The benchmark author’s verdict was blunt: “Sonnet 5 goes straight into the garbage bin.” Cheaper per token, yes. Cheaper per finished task, often not.

On capability, the picture is a near-tie with one exception. These figures come from Anthropic’s own published charts (relayed second-hand, so treat them as directional):

Deep agentic coding: Opus leads. Around 69% vs 63% on SWE-bench Pro, a real six-point gap on the hardest multi-step work.
Tool-use reasoning: a dead heat. Roughly 57.9 vs 57.4 on Humanity’s Last Exam with tools.
Knowledge work: Sonnet actually edges ahead on GDPval.
Terminal and CLI tasks: Sonnet wins outright, around 80%.

So Opus keeps a genuine lead on exactly one thing: the deepest, longest coding chains. Everywhere else, Sonnet is level or ahead, at a per-task cost that isn’t reliably lower.

The squeeze isn’t where you think

There’s a tempting story that Anthropic quietly strangled Opus with rate limits, and that’s what herded everyone onto Sonnet. It’s mostly wrong. Through 2026 the ropes got looser, not tighter: Claude Code’s five-hour limits doubled in early May, a peak-hour throttle was removed, and weekly limits rose 50%, all riding on a fresh SpaceX compute deal. For a single conversation, Opus stayed a comfortable daily driver.

The “plan with Opus, execute with Sonnet” split wasn’t forced by scarcity either. Anthropic shipped Opus plan mode back in August 2025, and the community’s own verdict was that it “automates what everyone was already doing manually.” That verdict landed weeks before the weekly limits had even taken effect. It’s a capability-and-cost call, not rationing: Opus for the reasoning-heavy planning where its edge is real, Sonnet for the fast, cheap execution.

So where does the squeeze people genuinely feel actually come from? Themselves. Nobody runs one conversation anymore. You run an orchestrator that fans out a dozen long-lived subagents across three or four workstreams at once, chewing through context in the background while you get on with something else. One developer reported a single five-hour session where Opus spawned 451 Sonnet subagents and burned through 14 million tokens. The unit of work stopped being a chat turn and became a fleet.

That is what eats the budget: not a dial Anthropic turned, but a working style scaling horizontally faster than any quota can follow. And it is exactly why the price gap between Opus and Sonnet stopped being abstract. When your default move is to spin up fifty agents at once, the question is no longer “which model is smartest,” it’s “which model can I afford fifty of.” Sonnet wins that for the cheap, low-effort grunt work a fleet runs on. Opus gets promoted to the single brain doing the thinking, because running the whole fleet on Opus is financial nonsense.

I’d say it sounds more like you tried to crush an ant with an excavator.
— r/ClaudeAI, on reaching for Opus by default

So Opus doesn’t get throttled. It gets sandwiched: Sonnet undercutting it from below on the sticker price, and its own users’ appetite for fan-out squeezing it into an ever-narrower role from the inside.
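The “which model can I afford fifty of” question is also just arithmetic. The sketch below compares an all-Opus fleet with one Opus planner plus Sonnet workers. The prices and the 30% token inflation are the figures quoted earlier; the per-agent workload (0.2 million input tokens and 0.02 million output tokens) is a made-up illustration, and the model keys are labels for this example, not API identifiers. It also assumes Sonnet runs at low effort, which is the only regime where the article's discount holds.

```python
# USD per million tokens, from the pricing quoted earlier in the article.
PRICES = {
    "opus": {"input": 5.00, "output": 25.00},
    "sonnet_intro": {"input": 2.00, "output": 10.00},
}

# From the text: Sonnet's tokenizer emits roughly 30% more tokens for the same text.
SONNET_TOKEN_INFLATION = 1.30


def agent_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """USD for one agent run. Token counts are in millions, measured in Opus tokens."""
    price = PRICES[model]
    scale = SONNET_TOKEN_INFLATION if model.startswith("sonnet") else 1.0
    return scale * (input_mtok * price["input"] + output_mtok * price["output"])


def fleet_cost(planner: str, worker: str, n_workers: int,
               input_mtok: float, output_mtok: float) -> float:
    """USD for one planner plus n_workers workers, all with the same workload."""
    return (agent_cost(planner, input_mtok, output_mtok)
            + n_workers * agent_cost(worker, input_mtok, output_mtok))


if __name__ == "__main__":
    # Hypothetical workload per agent: 0.2M input tokens, 0.02M output tokens.
    all_opus = fleet_cost("opus", "opus", 50, 0.2, 0.02)
    mixed = fleet_cost("opus", "sonnet_intro", 50, 0.2, 0.02)
    print(f"all Opus: ${all_opus:.2f}   Opus planner + Sonnet fleet: ${mixed:.2f}")
```

The absolute figures mean nothing outside this toy workload. What carries over is the shape: the worker term is multiplied by the fleet size, so the worker model's price dominates the bill long before the planner's does.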

Match the effort to the task

The takeaway isn’t “never touch Sonnet”; it’s to route by task rather than by habit. Sonnet 5 earns its keep at low and medium effort, doing cheap, fast execution across a fleet. What it can’t be is your budget model and your high-effort reasoner at the same time: crank the dial and you pay Opus-grade bills for sub-Opus results. As one developer put it on Hacker News, “Sonnet 5 on high costs more than Opus 4.8 at a lower pass rate.”
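Routing by task can be as small as a lookup. This is a minimal sketch of the rule described above, not anyone's production router: the task categories, the `Route` type, and the model labels are all hypothetical names chosen for illustration, and the labels are not real API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # placeholder label, not a real API model id
    effort: str  # "low", "medium", or "high"


# Hypothetical task categories, grouped the way the article splits the work.
OPUS_TASKS = {"planning", "deep_agentic_coding"}
SONNET_TASKS = {"execution", "terminal", "knowledge_work"}


def route(task_kind: str) -> Route:
    """Pick a model and effort level for a task category."""
    if task_kind in OPUS_TASKS:
        # Reasoning-heavy work, where the article says the Opus edge is real.
        return Route("opus-4.8", "high")
    if task_kind in SONNET_TASKS:
        # Cheap, fast fleet work at low effort.
        return Route("sonnet-5", "low")
    # Unknown work defaults to medium-effort Sonnet. High-effort Sonnet is
    # deliberately never returned, since that is the combination the text
    # warns costs more than Opus for worse results.
    return Route("sonnet-5", "medium")


if __name__ == "__main__":
    for kind in ("planning", "terminal", "something_else"):
        print(kind, "->", route(kind))
```

The one design choice worth copying is the absence of a Sonnet-at-high-effort branch: if a task needs that much reasoning, the rule sends it to Opus instead.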

It cuts both ways

Be fair to Opus, because the case isn’t one-sided. It still owns the hardest deep-agentic coding by a real margin, it holds Anthropic’s own top scores on a handful of specialist benchmarks, and it has a few things Sonnet doesn’t: a fast mode, and the tightest instruction-following at the top of the range. If you’re doing genuinely hard, long-horizon engineering and you have the budget, it’s still the pick.

And nobody’s thrilled with either one. Sonnet 5’s own reception has been lukewarm, more consolation prize than event. Opus, for its part, gets knocked for being chatty: one popular line says it “bills by the paragraph.” The recurring gripe across Hacker News and Reddit is that the whole generation feels incremental. As one reply to a widely read Opus 4.8 review put it, 4.7 and 4.8 are “both 4.6 that have each been RLd on top to vastly diminishing returns.” The big leaps are behind us; the fight has moved from raw capability to cost and routing.

What it means

Opus 4.8 is now the expensive middle, though “expensive” is doing slippery work: on real tasks at high effort, the “cheaper” Sonnet routinely bills more than Opus does. Opus’s problem was never really the price. It’s the position: not the default anyone reaches for, not the smartest on the shelf, its role narrowed to the brain you plan with rather than the fleet you run. A job, not a throne.

And here’s the part worth watching: the ladder just got a rung back. Fable 5, the tier above Opus, returns this week. The US Commerce Department just lifted the export controls that had pulled it offline, and through July 7 it’s bundled into Pro and Max plans for up to half your weekly usage limit, before moving behind separate credits. So a smarter, hungrier model is about to eat into the very quota your subagents were already draining. The middle gets pressed from every side at once: undercut from below on price, outclassed from above on capability, and drained from within by your own fan-out.

The convenient framing is “which model wins.” That was never the right question. The right one is where each tier’s job actually lands once the dust settles, and for the premium tier the answer got a lot smaller than the launch-day benchmarks suggested.

Caveats worth stating plainly

The head-to-head benchmark figures here are Anthropic’s own, relayed second-hand rather than independently reproduced, so read them as directional, not gospel. The cost-per-task comparison reflects a single third-party benchmark’s methodology and effort mix, not a universal result: your own numbers will move with how you actually run each model. Intro pricing on Sonnet 5 expires August 31, which narrows the gap. And the usage-limit numbers are subscription-plan policy that Anthropic has changed repeatedly in 2026, so anything specific may be stale by the time you read it. Verify the limits against your own plan before you architect a workflow around them.
