On February 23, Anthropic published a report accusing three Chinese AI labs - DeepSeek, Moonshot AI, and MiniMax - of running industrial-scale distillation campaigns against Claude. According to the report, the three collectively created over 24,000 fake accounts and generated more than 16 million exchanges, systematically extracting Claude’s reasoning capabilities to train their own models.

The internet’s response was almost unanimous: hypocrisy.

The Hypocrisy Argument

Anthropic scraped the open internet to train Claude. They’re facing lawsuits from Reddit, music publishers, and authors. OpenAI did the same. Google did the same. Now they’re upset that someone scraped them?

Anthropic “distilled” Reddit posts en masse too, even after Reddit changed its ToS to prevent this.
— r/ClaudeAI top comment

It’s a satisfying narrative. Thieves complaining about thievery. The whole thread reads like a victory lap for anyone who’s argued that knowledge wants to be free. And honestly? The critics landed some real punches:

  • Anthropic bans users for VPN usage while 24,000 fake accounts went undetected for months. One user: “Claude randomly canceled my account because I was using a VPN yet somehow let 24k fake accounts rob it blind.”
  • If they paid for API access, it’s a ToS violation, not theft. Calling it an “attack” is generous framing for what amounts to a contractual breach.
  • The timing is nakedly political. This report dropped as Washington debates chip export controls. Every paragraph reinforces Anthropic’s policy position. This month alone, Anthropic, OpenAI, and Google all published similar accusations in what looks like a coordinated lobbying effort.
  • The “safety” framing is convenient. Anthropic’s business model depends on being the premium reasoning provider. “Distillation threatens national security” just happens to also mean “distillation threatens our revenue.”

These aren’t fringe takes. They’re the dominant sentiment across Reddit and X. And they’re not wrong.

Where the Take Falls Apart

But they’re not complete either. The crowd is conflating two genuinely different things.

Web scraping for pretraining collects raw human knowledge. Books, articles, code, forum posts. The model learns from the sum of what humanity has written. This is ethically fraught - real people created that content and weren’t compensated - but it’s reading the library.

Distillation extracts the refined capabilities of another model. Not raw knowledge, but the product of research, RLHF, safety training, and architectural decisions. DeepSeek wasn’t reading the library. They were copying the librarian: the reasoning patterns, the judgment, the chain-of-thought that took years to develop.

What DeepSeek actually did

DeepSeek specifically prompted Claude to reconstruct its own reasoning step by step, generating chain-of-thought training data at scale. They also had Claude produce censorship-safe rewrites of politically sensitive queries - building a dataset to circumvent their own government’s content restrictions using another company’s safety work.
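Mechanically, this kind of chain-of-thought distillation is trivial: wrap each prompt in a “show your reasoning” instruction, call the teacher model, and save the pairs as fine-tuning data. A minimal sketch - with a stubbed-out `teacher` function standing in for any real API, and a record format that is an illustrative assumption, not anything from Anthropic’s report:

```python
import json

def teacher(prompt: str) -> str:
    # Stub standing in for a frontier-model API call; a real campaign
    # would hit the provider's endpoint here, at scale, from many accounts.
    return f"Step 1: restate the problem. Step 2: reason it through. Prompt was: {prompt}"

def build_distillation_set(prompts, path):
    """Collect (prompt, chain-of-thought) pairs as JSONL, the shape
    most fine-tuning pipelines accept."""
    with open(path, "w") as f:
        for p in prompts:
            wrapped = f"{p}\n\nExplain your reasoning step by step."
            record = {"prompt": wrapped, "completion": teacher(wrapped)}
            f.write(json.dumps(record) + "\n")

build_distillation_set(["What is 17 * 24?"], "distill.jsonl")
```

The point of the sketch is how little machinery is involved: the expensive part is entirely on the teacher’s side of the API.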

The economic asymmetry is stark: billions in research, distilled for maybe $5M in API fees. You can argue whether that’s theft or clever arbitrage, but it’s not the same as crawling the web. Scraping scales with the cost of bandwidth; distillation captures the value of someone else’s R&D at API prices.

The Part Nobody Wants to Engage With

Here’s what the “turnabout is fair play” crowd skips: distillation copies capabilities but not behaviour.

Look at what DeepSeek actually extracted. They had Claude produce censorship-safe rewrites of politically sensitive queries about Chinese dissidents. That’s not abstract “safety” - that’s a specific capability being repurposed. Claude was trained to handle sensitive political topics carefully. DeepSeek extracted that capability and redirected it: instead of “discuss this thoughtfully,” it becomes “help us build training data to suppress this topic entirely.”

That’s the pattern. Distillation transfers what a model can do but not why it does or doesn’t do it. The refusal behaviours, the careful calibration, the “I can help you understand chemistry but I won’t help you synthesize nerve agents” distinction - that’s encoded in layers of RLHF and red-teaming that don’t survive extraction. You get the knowledge of a chemistry PhD minus the professional ethics.

One Reddit commenter tried to raise this and got buried under 200 downvotes:

How don’t you guys see that the problem is that it’s not keeping the safety training after distillation.
— r/ClaudeAI, buried in the thread

They got buried because the thread was too busy dunking on Anthropic to engage. Fair enough. But we already know what happens when capable models ship without safety work. People have been jailbreaking open-weight models for years. The difference is those models were trained from scratch with known limitations. A distilled model has frontier-level capabilities in a package nobody audited.

I wrote two weeks ago about safety teams leaving from the inside - researchers quitting because they couldn’t do the work that mattered. Distillation is the same erosion from the outside: even when safety work does get done, it doesn’t survive the copy.

One of the sharper Reddit takes pointed out the real business threat: distillation proves you don’t need a massive cloud model to get results. That’s true, and it’s why Anthropic is scared. But “Anthropic is scared for commercial reasons” and “distilled models without safety work are a problem” can both be true simultaneously.

What This Means for Developers

If you’re building on top of AI models, there’s a practical takeaway buried in the drama.

  • Model agreement means less now. If Chinese models were distilled from Claude and GPT-4, getting the “same answer” from multiple models doesn’t mean independent verification. It might mean they share the same teacher.
  • Open-weight doesn’t mean independently developed. A model distilled from Claude without safety training is a different risk profile than one trained from scratch. Provenance matters.
  • The “DeepSeek trained for $5M” narrative just collapsed. If significant capabilities came from distilling US models, the headline cost was always misleading. The cheap training story that tanked the stock market was, at least partially, built on someone else’s R&D bill.
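The first point can be made quantitative: if two models answer independently, agreement is strong evidence; if one effectively echoes the other’s teacher, much less so. A toy Bayesian sketch - every probability here is an illustrative assumption, not a measured value:

```python
def posterior_given_agreement(p, q, rho):
    """P(answer correct | two models give the same answer).

    p:   each model's standalone accuracy
    q:   chance two *independent* wrong answers happen to coincide
    rho: probability model B effectively echoes model A
         (0 = fully independent, 1 = pure copy)
    """
    agree_correct = rho * p + (1 - rho) * p * p
    agree_wrong = rho * (1 - p) + (1 - rho) * (1 - p) ** 2 * q
    return agree_correct / (agree_correct + agree_wrong)

independent = posterior_given_agreement(p=0.9, q=0.2, rho=0.0)     # ~0.998
shared_teacher = posterior_given_agreement(p=0.9, q=0.2, rho=0.7)  # ~0.93
```

With a shared teacher, agreement lifts your confidence only marginally above each model’s standalone accuracy - cross-checking buys far less than it appears to.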

The geopolitical angle

A senior US government official alleges DeepSeek also illegally obtained banned Blackwell chips. Combined with industrial distillation and state backing, this isn’t a scrappy open-source project democratising AI. It’s state-level capability acquisition wearing an open-source hat.

No Clean Hands

The AI industry is built on taking other people’s work. Authors, artists, Reddit users, Stack Overflow contributors - they all got extracted from without compensation. Anthropic is a hypocrite. So is OpenAI. So is Google. The entire foundation of this industry is ethically compromised.

But “everyone’s a hypocrite” isn’t analysis. It’s a thought-terminating cliché that lets you stop thinking about a problem that’s actually getting worse.

The distinction between scraping and distillation matters - not because Anthropic deserves sympathy, but because stripped-down models with frontier capabilities and no safety training are a different category of problem than web scraping ever was. That problem exists regardless of who’s complaining about it.

Dismissing it as “turnabout is fair play” might feel satisfying. It’s the kind of take that ages poorly.