On AIME 2025 math reasoning: 97%. On SWE-Bench Pro: matches Claude Opus 4.6. In blind evals: preferred over Claude Sonnet 4.6. If you're building on Claude or GPT, you now have a serious third option on the cloud you probably already use.
The full content is available at the original source.
x.com