Twitter/X: On AIME 2025 math reasoning: 97%. On SWE-Bench Pro: matches Claude Opus 4.6. In blind evals: pref…

On AIME 2025 math reasoning: 97%. On SWE-Bench Pro: matches Claude Opus 4.6. In blind evals: preferred over Claude Sonnet 4.6. If you're building on Claude or GPT, you now have a serious third option on the cloud you probably already use.

