What Anthropic Actually Said About AI Building Itself

In June 2026, Anthropic released a report called "When AI builds itself." The headlines made it sound...

In June 2026, Anthropic released a report called "When AI builds itself." The headlines made it sound like AI was on the verge of superintelligence in which machines were building better versions of themselves in a feedback loop.

The actual report asks something more specific. Can AI agents pick their own research problems, or just execute the ones humans hand them? And if agents do start picking their own directions, what happens next?

Anthropic is being honest about what they've measured: agents are getting much better at executing research. They're uncertain whether these agents will ever figure out problem selection. And they're saying that if the current trend holds, institutions aren't prepared for what comes after.

This isn't hype. This is a company admitting it built something powerful and isn't sure what to do about it.

What They Actually Tested

Anthropic posed a straightforward question to Claude agents: can a weaker model reliably supervise a stronger one?

The logic is counterintuitive but sensible. A peer reviewer doesn't need to be smarter than the author overall. They just need to catch logical gaps and unsupported claims.

Here's how they measured it:

Baseline: weak supervision catches some errors but not many.

Perfect supervision: the strong model gets flawless feedback.

The test: can agents use weak supervision to close that gap?

Two human researchers spent about a week on this. They closed 23 percent of the gap.

Claude agents ran for 800 cumulative hours (roughly 33 days of nonstop compute). They spent $18,000. They closed 97 percent of the gap.

One of the researchers who ran the experiment:

"Claude did all of this with minimal help from me over 1 to 2 days. If a junior colleague came back with results like this in that timeframe, I'd be mildly impressed. The future is now."

But there's a catch. The approach didn't transfer to production-scale models. It worked in a controlled lab setting. When they tried the same technique on real Claude training, the results fell apart.

And crucially: humans still picked the problem. Humans defined what good supervision means. The agents just optimized for that metric. They didn't autonomously decide the research was worth pursuing.

This is the gap between execution and direction-setting.

What Agents Are Actually Good At

Execution

Over 80 percent of the code merged into Anthropic's production codebase was written by Claude as of May 2026. Before February 2025, it was in the low single digits.

Engineers are writing eight times more code per day than they were two years ago. They're not writing the code. They're directing Claude to write it, then reviewing.

One engineer: "I stopped writing code myself about 5 months ago. Everything goes through Claude now."

Claude Code success on open-ended problems rose from 26 percent six months ago to 76 percent in May 2026. Open-ended means the engineer doesn't know what the answer should look like.

Optimization

Every time Anthropic r

Leer artículo completo en dev.to

// artículos relacionados

Cognizant Anthropic

Anthropic28 jul

Import AI 466: The bitter lesson for robotics, AIs complete week-long programming tasks; and OpenAI's accidental AI hacker

Import AI27 jul

Karparthy removed Anthropic from his bio

Reddit26 jul