Every AI coding tool vendor has a productivity study. GitHub claims 55% faster task completion. Google reports 10% velocity gains. The numbers always go up.
But when independent researchers measure the same thing, a different picture emerges. One where AI makes experienced developers slower. Where code quality is declining by every metric. And where the real productivity drain isn’t the tools: it’s what the tools are doing to how we think.
The Studies Vendors Don’t Quote
METR (July 2025) ran the gold standard: a randomized controlled trial. 16 experienced open-source developers, 246 real tasks on their own repos (averaging 22k+ GitHub stars, 1M+ lines of code). Tasks randomly assigned to allow or disallow AI. The tools were best-in-class: Cursor Pro with Claude 3.5/3.7 Sonnet.
Result: AI made them 19% slower.
Before the study, developers predicted a 24% speedup. After the study, having seen their own timing data, they still believed AI had made them 20% faster. That's a 39-percentage-point perception gap between what they felt (+20%) and what actually happened (-19%).
"69% of developers continued using AI after the study, despite being measurably slower. AI appears to make work feel easier even when it takes longer." — METR study observation
Uplevel studied ~800 developers using actual engineering telemetry. No improvement in PR cycle time or throughput. But a 41% increase in bugs within pull requests for Copilot users.
Faros AI tracked 10,000+ developers across 1,255 teams. Individual output went up: 21% more tasks, 98% more PRs. But PR review times ballooned 91%. PR sizes inflated 154%. At the company level, any correlation between AI adoption and performance metrics evaporated.
GitClear analyzed 211 million changed lines across private and public repos from 2020-2024. AI-generated code had 41% higher churn (revised within two weeks). An eightfold increase in duplicated code blocks. 2024 was the first year copy-pasted lines exceeded moved lines: a historic shift in how code gets written.
The Docker blog made a meta-observation worth noting: the studies reporting the largest AI productivity gains come from companies that produce AI developer tools. Microsoft (OpenAI investor), GitHub (Copilot maker), Google (Gemini). Independent studies from METR, Uplevel, and GitClear consistently paint a more negative picture.
The Math Doesn’t Work
There’s a fundamental arithmetic problem with the 10x claim.
AWS data shows engineers write code roughly one hour per day: about 12.5% of their workday. The rest is design, review, meetings, debugging, documentation, context-switching. Multiple industry analyses converge on coding being 20-30% of total development work.
Even if AI made the coding portion 10x faster, Amdahl's Law caps the overall improvement at roughly 1.22x (taking coding as 20% of the job). You can't 10x the whole job by speeding up a fraction of it.
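The arithmetic is worth making explicit. Amdahl's Law says that if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p/s). A minimal sketch, plugging in the coding-fraction estimates from the paragraph above:

```python
def amdahl_speedup(fraction_accelerated: float, speedup: float) -> float:
    """Overall speedup when only a fraction of total work is accelerated.

    Amdahl's Law: 1 / ((1 - p) + p / s), where p is the fraction of work
    that benefits and s is the speedup factor applied to that fraction.
    """
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / speedup)

# Coding as 20% of development work, made 10x faster:
print(round(amdahl_speedup(0.20, 10), 2))   # 1.22

# Even at the high end of the 20-30% range:
print(round(amdahl_speedup(0.30, 10), 2))   # 1.37

# At AWS's one-hour-a-day figure (12.5% of the workday):
print(round(amdahl_speedup(0.125, 10), 2))  # 1.13
```

However generous the assumptions, a 10x speedup on the coding slice never pushes the whole job past about 1.4x.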
Worse, the Faros AI data shows the bottleneck just moves. More code gets written, so more code needs reviewing. PR review times nearly doubled. Context switching increased. Developers touched 9% more tasks and 47% more PRs per day. Speed in one phase created drag in every other phase.
"I don't think I have ever seen so much technical debt being created in such a short period of time during my 35-year career in technology." — Kin Lane, API evangelist
More Tasks, Less Focus
The Faros AI numbers hint at something the productivity framing misses entirely. Developers on AI-heavy teams touched 9% more tasks and 47% more PRs per day. That’s not focus. That’s fragmentation.
In February 2026, Harvard Business Review published what might be the most important study in this space. UC Berkeley researchers embedded with 40 workers at a tech company for eight months. Nobody was told to do more. No targets changed. But AI made more work feel doable, so people voluntarily took on more. They managed several active threads at once: writing code while AI generated alternatives, running multiple agents in parallel, reviving long-deferred tasks.
"Employees worked at a faster pace, took on a broader scope of tasks, and extended work into more hours of the day, often without being asked to do so." — Ranganathan & Ye, Harvard Business Review
The researchers describe a self-reinforcing cycle. AI accelerates a task. That raises the bar for what “good” looks like. More work gets created to meet the new bar. Which increases AI reliance. The result: “unsustainable intensity disguised as productivity.” Workers became quality-control inspectors for a prolific but unreliable junior colleague.
JetBrains’ telemetry from 151 million IDE window activations across 800 developers confirmed a related pattern: AI-assisted devs switch in and out of IDEs more frequently. 74% didn’t notice the increase. When context switching doesn’t feel like context switching, you can’t course-correct.
Bain & Company summed it up bluntly: “Software coding was one of the first areas to deploy generative AI, but the savings have been unremarkable.” Teams see 10-15% boosts. The time saved isn’t redirected toward higher-value work. It’s absorbed by more tasks.
Google’s 2025 DORA Report (~5,000 respondents) added the system-level cost: a 25% increase in AI usage quickened code reviews but resulted in a 7.2% decrease in delivery stability. More velocity. Less reliability. The dashboard looks better while the product gets worse.
The same UC Berkeley researchers found that work bled into lunch breaks and late evenings. To-do lists expanded to fill every hour AI freed up. TechCrunch reported the first signs of burnout are coming from the people who embrace AI the most. Not the resisters. The enthusiasts.
The Effort Problem
The fragmentation story is about doing more. This one is about thinking less.
Developers are offloading cognition. A Microsoft Research/Carnegie Mellon study (CHI 2025, 319 knowledge workers) found higher confidence in AI correlated directly with less critical thinking. Workers shifted from problem-solving to “AI response integration”: reviewing and approving rather than reasoning. The researchers coined “mechanized convergence”: similar prompts producing similar solutions, reducing creative diversity across teams.
Skills are measurably atrophying. Anthropic’s own research (January 2026) tested 52 junior engineers learning a new library. Those using AI assistance scored 17% lower on comprehension tests. Debugging ability was hit hardest: the exact skill you need to oversee AI-generated output. Developers who asked AI conceptual questions scored 65%+. Those who delegated code generation wholesale scored below 40%.
Brains are literally disengaging. MIT researchers using EEG found LLM users showed measurable cognitive under-engagement compared to search engine users and no-tool users. Reduced neural connectivity in networks associated with memory and creativity. Participants couldn’t recall what they’d written moments earlier.
Automation complacency is real. The aviation industry has decades of research on this: when automation “just works,” human vigilance drops. The New Stack drew the parallel directly to code review. Counterintuitively, experienced developers are more susceptible. Once early AI suggestions appear correct, they become significantly more likely to accept subsequent ones without scrutiny.
In METR’s 2026 follow-up, 30-50% of developers refused to submit tasks they didn’t want to do without AI, even at $50/hour. They weren’t saying the tasks were impossible. They were saying they didn’t want to do them manually anymore. That’s not a productivity tool. That’s a dependency.
The Junior-Senior Split
The one area where the data is more encouraging: junior developers. Multiple studies show juniors seeing 27-39% speed gains. AI acts as an always-available mentor. Seniors see only 8-16% gains and spend significantly more time reviewing AI suggestions (4.3 minutes per suggestion vs 1.2 minutes for human code).
But the Anthropic skill-formation study raises a hard question. If juniors using AI score 17% lower on comprehension tests, are those speed gains coming at the cost of learning? Speed now, skill gaps later.
Stanford’s employment data adds a darker dimension. Software developer employment for ages 22-25 declined roughly 20% from its late-2022 peak. Entry-level hiring fell 25% YoY in 2024. Meanwhile, employment for developers aged 35-49 grew 9%. The people best positioned to oversee AI are in demand. The people who were supposed to learn by doing the work AI now handles are finding fewer opportunities to learn.
Google’s internal study found a +10% engineering velocity gain from AI IDE features. Their own tools, fine-tuned on their own codebase, measured on their own developers. Not 10x. Ten percent. When the company that built Gemini measures single-digit percentage gains internally, the 10x narrative deserves serious scrutiny.
What Are We Losing?
The Stack Overflow 2025 survey (49,000+ respondents) captured the state of things: 84% of developers use AI tools. Trust in AI accuracy at an all-time low of 29%. 66% spend more time fixing “almost-right” code than they save. Usage up. Trust down. That’s habituation, not productivity.
The 10x narrative isn’t just inaccurate. It’s actively harmful. It sets expectations that lead to headcount reductions based on projected gains that don’t materialize. It normalizes declining code quality as a trade-off for speed. And it obscures the real question: not “how much faster are we?” but what we’re trading away in focus, skill, and quality to feel like we’re keeping up.
Further Reading
- METR: AI-Experienced Open-Source Developer Study (July 2025)
- METR: Uplift Study Update (February 2026)
- Faros AI: The AI Productivity Paradox (June 2025)
- GitClear: AI Copilot Code Quality 2025
- HBR: AI Doesn’t Reduce Work, It Intensifies It (February 2026)
- Anthropic: AI Assistance and Coding Skills (January 2026)
- Microsoft/CMU: AI and Critical Thinking (CHI 2025)
- Google: 2025 DORA Report (September 2025)
- Stack Overflow 2025 Developer Survey: AI Section
- Vibe Coding Kills Open Source (January 2026)