Twitter/X

On AIME 2025 math reasoning: 97%. On SWE-Bench Pro: matches Claude Opus 4.6. In blind evals: preferred over Claude Sonnet 4.6. If you're building on Claude or GPT, you now have a serious third option on the cloud you probably already use.
