Sundar Pichai didn’t open Google I/O with a frontier model. He opened with a cheap one. Gemini 3.5 Flash shipped first, with Pro pushed to “next month.” The cheap tier beat last cycle’s flagship Pro on coding and agentic benchmarks, ran roughly 4x faster, and is already the default in Search and the Gemini app, in front of hundreds of millions of people.
The instant read was that Google is conceding the frontier: let Anthropic and OpenAI fight over the smartest model, Google takes the cheap middle. That read stops one layer too high. Cheap isn’t a model strategy here. It’s a silicon strategy.
Cheap Is Made of Silicon
You only ship the cheap model before the expensive one if your binding constraint is cost, not capability. Flash punching above its tier isn’t new; making it the headline is. The bottleneck Google is optimizing for is cost-per-deployment at planetary scale, not the top of a leaderboard. And it can win that fight because the advantage isn’t in the model. It’s in the chip.
Google gives Flash away in Search because it isn’t paying Nvidia’s margin to run it. Gemini runs on TPUs Google designed, and in April Google changed what a TPU is: from capacity rented out of its own datacenters to a chip sold for customers to install in theirs. A clean shot at Nvidia, dressed as a product launch. The current generation, Ironwood, is pitched as “the first Google TPU for the age of inference.” Own the inference chip and “cheap” stops being a price you swallow and becomes a structural advantage you possess.
The Arms Dealer Runs an Army
The tell that it’s working is almost too on the nose. Anthropic, the frontier challenger, is paying Google a reported $40 billion for up to a million Ironwood TPUs over five years. Meta signed its own multibillion-dollar chip deal. And Bloomberg reported that inside Google DeepMind, researchers are queuing for the very TPUs Google is selling, in record volume, to Anthropic and Meta. Google’s sharpest competitors rent the compute they need to compete from Google, at a price that funds Google’s next chip, while Google’s own people wait in line behind them.
Google sells the picks to the people mining against it, and mines harder than any of them. The frontier labs rent the compute. Google owns it, and leases the surplus to its rivals at a profit.
Not a company conceding anything. A company that turned the most expensive input in the industry into a revenue line, paid by the competition.
Top to Bottom, One Owner
Step back and the whole column belongs to one company:
- Silicon: the TPU, now a product as well as a rental.
- Models: Gemini, sequenced cheap-first for ubiquity.
- Distribution: Search, Android, and a Gemini app past 900 million monthly users, framed as the action layer across an internet Google already owns.
- The edge: the Coralboard, a 1-TOPS transformer-native board running Gemma locally this summer, pushing the same stack onto cheap hardware in your hand.
Nobody else has all four. OpenAI rents its compute and owns no distribution. Anthropic rents Google’s chips and ships through other people’s clouds. Google is the only player where the same company makes the sand and serves the answer.
I argued yesterday that the future split two ways: the frontier model you rent and meter, versus cheap intelligence that runs everywhere for nearly nothing. Google’s answer is that it was never a choice. Own the stack and you play both. The metering that forces everyone else to charge by the token is downstream of who owns the compute: Google can give Search away because it isn’t paying rent on the thing doing the work.
What This Doesn’t Win
The honest other side, because owning the column isn’t the same as having won:
- A moat until it’s an indictment. “The action layer across the internet Google already controls” is the exact sentence that draws an antitrust complaint. The deeper the integration, the bigger the target.
- TPUs lock you in. The chip is cheaper; the software around it (JAX, XLA) is narrower than CUDA, which Nvidia spent a decade making sticky. Selling chips means selling adoption of your stack, not just your silicon.
- Cheap isn’t best. Leading with Flash cedes the top of the market for the fat middle. The hardest work, frontier agentic coding and the cyber-grade models, still goes to Opus and GPT.
- The edge is still a promise. Coralboard runs a 270M model and demos jellyfish-driven music. Ambient on-device intelligence is a direction, not yet a product.
- Renting to rivals is a hedge, not a checkmate. That $40 billion is also Google admitting it won’t consume its own capacity, and handing its fiercest competitor a guaranteed fuel supply.
If you’re building on models, the company that owns the silicon sets the floor, and the floor keeps dropping. Flash-class pricing isn’t a promotion, it’s a preview. Keep your prompts and evals provider-portable so you can chase the floor down as the hardware war pushes it there.
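One way to read "provider-portable" in practice: keep the eval cases and their pass/fail checks free of any vendor SDK, put each provider behind a thin adapter, and let price decide among the providers that still pass. The sketch below is a minimal illustration under loud assumptions. The `Provider` interface, the `StubProvider` stand-ins, the provider names, and the prices are all hypothetical; a real adapter would wrap whichever SDK you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Provider(Protocol):
    """Hypothetical adapter interface: anything that turns a prompt into text."""
    name: str
    usd_per_million_tokens: float

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # pass/fail check that knows nothing about the vendor


@dataclass
class StubProvider:
    """Stand-in for a real SDK adapter; returns canned answers (illustration only)."""
    name: str
    usd_per_million_tokens: float  # made-up price, not a real quote
    answers: dict

    def complete(self, prompt: str) -> str:
        return self.answers.get(prompt, "")


def pass_rate(provider: Provider, cases: list) -> float:
    """Fraction of eval cases whose check accepts the provider's output."""
    passed = sum(1 for case in cases if case.check(provider.complete(case.prompt)))
    return passed / len(cases)


def cheapest_passing(providers: list, cases: list, threshold: float) -> Optional[Provider]:
    """Cheapest provider whose pass rate meets the threshold, or None if none does."""
    ok = [p for p in providers if pass_rate(p, cases) >= threshold]
    return min(ok, key=lambda p: p.usd_per_million_tokens, default=None)


# The same cases run unchanged against every provider.
cases = [
    EvalCase("2+2?", lambda out: "4" in out),
    EvalCase("Capital of France?", lambda out: "Paris" in out),
]

providers = [
    StubProvider("pricey", 10.0, {"2+2?": "4", "Capital of France?": "Paris"}),
    StubProvider("cheap", 1.0, {"2+2?": "4", "Capital of France?": "Paris"}),
    StubProvider("cheapest", 0.1, {"2+2?": "5", "Capital of France?": "Paris"}),
]

choice = cheapest_passing(providers, cases, threshold=1.0)
print(choice.name if choice else "no provider passes")
```

The point of the design is that nothing in `cases` has to change when a new, cheaper provider appears: you write one more adapter, rerun the same evals, and switch only if the pass rate holds.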
Closing
The headline was “Google goes cheap.” The actual move is that Google made cheap structurally impossible for anyone who has to rent the compute to do it. Flash is the visible piece; the TPU is the moat underneath it. The strategy was never the smartest model. It’s owning every layer between the sand and the search box, so intelligence can be the loss leader and the chips can be the business.
Everyone else is renting a floor in Google’s building. A few of them just signed forty-billion-dollar leases.


