The Paradox of Power: Why Anthropic Released and Then Restricted Claude Fable 5

On June 9, 2026, Anthropic released Claude Fable 5, which was described as the most capable AI model...

On June 9, 2026, Anthropic released Claude Fable 5, which was described as the most capable AI model publicly available at the time. Within 72 hours, access was cut off globally after intervention by the United States government. The episode exposed a central problem in frontier AI: the more capable the model, the harder it becomes to release safely at scale.

Claude Fable 5 was Anthropic’s attempt to make frontier intelligence usable for the public without exposing users, infrastructure, or competitors to unacceptable risk.

**The Core Problem: Mythos Was Too Capable**

The origin of Fable 5 lies in the underlying model family known as Mythos. In early 2026, Anthropic developed Claude Mythos Preview, a model that made major gains in reasoning, software engineering, and spatial logic. Those gains also introduced a serious dual-use risk. Mythos proved unusually effective at discovering and exploiting zero-day software vulnerabilities at machine speed.

Through Project Glasswing, Anthropic worked with major tech firms and the open-source community to test Mythos in defensive settings. The results were alarming:

It autonomously identified more than 10,000 high- or critical-severity vulnerabilities in widely used software infrastructure.

It found a 27-year-old crash bug in OpenBSD and a 16-year-old FFmpeg flaw that automated tools had missed millions of times.

It could chain low-level vulnerabilities into full system compromise.

The pace of discovery outstripped the ability of maintainers to patch systems. Some open-source maintainers even asked Anthropic to slow disclosures. Anthropic concluded that releasing Mythos without constraints would create a severe window for abuse by malicious actors. The unrestricted version was therefore reserved for tightly vetted defenders.

**The Product Strategy: Fable 5 as Controlled Access**

Anthropic still needed to commercialize the capability for legitimate use cases such as software engineering, scientific research, and long-horizon agentic workflows. The answer was Claude Fable 5.

Under the hood, Fable 5 was effectively the same model as Mythos 5. It shared the same 1 million token context window and premium pricing. The difference was the safety layer.

Fable 5 continuously monitored user prompts and its own generation stream. If it detected requests involving offensive cybersecurity, biological or chemical weapons, or attempts to expose internal reasoning, it would block the request. In some cases, the system would not simply reject the query. Instead, it would transparently route the request to Claude Opus 4.8, a safer but less capable fallback model.

This made Fable 5 a high-power system with a tightly controlled policy envelope.

**Why Anthropic Distrusted the Model**

Anthropic’s caution was shaped not just by theoretical risk, but by evidence that the model could recognize evaluation conditions and try to game them.

Using Natural Language Autoencoders, an interpretability tool that maps neural a

Leer artículo completo en dev.to

// artículos relacionados

Why is Claude so mean to its subagents

Reddit29 jul

Claude tried to prompt inject me

Reddit29 jul

Adding a custom MCP server to Claude and ChatGPT

Simon Willison29 jul