Here is a number that’s been traveling fast: across 2,444 companies, for every dollar spent on AI tokens, only 18 cents ends up in the actual product. The other 82 cents break down as roughly 44 cents fixing bugs the AI introduced, 27 cents reworking code that missed context, and 11 cents lost to review friction and context switching.

Before anything else: consider the source. The figure comes from Aiswarya Sankar, founder of Entelligence AI, whose product exists to catch what AI generates wrong. That is a direct conflict of interest, and the precise percentages should be read as marketing math, not gospel. But I want to take the shape seriously, because the shape is something a lot of us recognize without being sold anything.

Why the Shape Rings True

The core move of the last two years is that generation got nearly free and cleanup didn’t. Producing a plausible diff costs almost nothing now. Verifying it, fixing what’s subtly wrong, and reconciling it with context the model never had still costs human time, at human rates.

Token spend used to meter only the cheap half: generation. Not anymore. Doing review properly with AI is itself token-hungry. Having a model actually read a diff, reason about the edge cases, and check what the last model got confidently wrong costs more than producing the diff did. So verification now lands on both ledgers, tokens and your engineers’ calendars. That sets up the trap: proper review is expensive, so a cheap-looking token bill is often cheap precisely because nobody is paying for the review that would have caught the 44 cents of bugs. The review tax that metered billing now charges directly shows up here as a deferred bill instead. Skip the expensive step now, pay the bigger one later.

For every dollar of tokens, eighteen cents reaches the product. Forty-four go to fixing what the AI got wrong.

— Aiswarya Sankar's analysis of 2,444 companies

Even if the true split is 30 cents and not 18, the direction is the same: most of what you spend is not creation, it’s correction.
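To make that arithmetic concrete, here is a minimal sketch. The 18/44/27/11 figures are the reported ones; the `rescale_product` helper and its proportional shrinking of the other three buckets are my own assumption, used only to illustrate what a 30-cent scenario might look like.

```python
# Reported split, in cents per dollar of token spend (figures quoted above).
REPORTED_SPLIT = {
    "product": 18,
    "bug_fixes": 44,
    "context_rework": 27,
    "review_friction": 11,
}


def correction_share(split):
    """Fraction of spend that does not end up in the product."""
    total = sum(split.values())
    return (total - split["product"]) / total


def rescale_product(split, product_cents):
    """Hypothetical scenario: set the product share to product_cents and
    shrink the other buckets proportionally (an assumption, not data)."""
    others = {k: v for k, v in split.items() if k != "product"}
    others_total = sum(others.values())
    remaining = 100 - product_cents
    scaled = {k: v * remaining / others_total for k, v in others.items()}
    scaled["product"] = product_cents
    return scaled


reported = correction_share(REPORTED_SPLIT)
optimistic = correction_share(rescale_product(REPORTED_SPLIT, 30))
print(f"reported: {reported:.0%} not product, optimistic: {optimistic:.0%} not product")
```

Either way the non-product share stays well above half, which is the only claim the argument needs.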

The Self-Licking Problem

I’ve been circling this from a few angles. When AI finds the bug, someone still has to fix it. When the easy work is automated, agents generate but don’t refactor, so the structural debt accrues to humans. And counting PRs as the denominator hides that more output is not more value when a chunk of that output is cleanup of earlier output.

The 18-cents framing is the dollar version of all three. It names the loop directly: you pay to generate, then you pay to fix what you generated, and the second payment is bigger than the first. That’s not a tool being useless. It’s a tool whose costs and benefits land on different ledgers, generation on the cheap metered one, correction on the expensive invisible one.

Count the cleanup, not just the tokens

If you only track token spend, you are measuring the 18 cents and ignoring the 82. Instrument the other side: time-to-merge after first AI draft, rework rate on AI-authored changes, defect escape rate, how often a “finished” change comes back. Those numbers tell you whether the tokens are buying product or buying yourself a correction backlog. Spend alone will always look reasonable, because the expensive part isn’t denominated in dollars you can see.
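As a rough sketch of what instrumenting the other side could look like, the snippet below computes those four numbers from a list of change records. The `Change` record and its field names are hypothetical; in practice you would fill them from whatever your version control and incident tooling expose, and your definitions of "reworked" or "escaped" may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Change:
    """Hypothetical record for one change; field names are assumptions."""
    ai_authored: bool
    first_draft_at: datetime
    merged_at: Optional[datetime]  # None if the change never merged
    reworked: bool                 # needed a follow-up fix or rewrite
    escaped_defect: bool           # a defect from it reached production
    reopened: bool                 # a "finished" change came back


def cleanup_metrics(changes):
    """Summarize the cleanup side for merged, AI-authored changes."""
    merged = [c for c in changes if c.ai_authored and c.merged_at is not None]

    def rate(flag):
        # Share of merged AI-authored changes with the given flag set.
        if not merged:
            return 0.0
        return sum(1 for c in merged if getattr(c, flag)) / len(merged)

    hours = [
        (c.merged_at - c.first_draft_at).total_seconds() / 3600
        for c in merged
    ]
    return {
        "median_hours_to_merge": median(hours) if hours else None,
        "rework_rate": rate("reworked"),
        "defect_escape_rate": rate("escaped_defect"),
        "reopen_rate": rate("reopened"),
    }
```

Tracked next to the token bill over the same period, these rates give a first read on whether spend is turning into product or into backlog.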

What This Isn’t

The skeptic’s number can be abused as easily as the booster’s, so:

  • The figures are not independent. A vendor selling the cure measured the disease. Until someone without a product in the fight replicates it, treat 18/44/27/11 as illustrative, not precise.
  • It is not “AI doesn’t work.” Eighteen cents of real product per dollar can still be a good trade if the dollar is small and the alternative is slower. The argument is about where the cost hides, not whether there’s value.
  • Not all 82 cents is waste. Proper review and verification are a necessary cost, and doing them right is genuinely expensive now. The waste is the bugs and the rework, not the reviewing that prevents them. A dollar that’s mostly verification can still be well spent. The failure is paying for correction you could have avoided, not paying for diligence.
  • It overlaps with tokenmaxxing. Part of why these ratios look ugly is that some token spend was never aimed at value in the first place. Bad incentives and genuine cleanup cost are tangled together in the same bill.

The Takeaway

Token spend was always going to look cheap, because it prices the one thing AI made cheap. The bottleneck moved downstream to review, rework, and correction, and that’s where your real budget is going whether or not you’re counting it.

You don’t have to trust the exact 18 cents. You do have to ask the question it forces: of everything your tokens produced last month, how much shipped, and how much was you cleaning up after the thing that produced it? If you don’t know, the meter is measuring the wrong half.