Performance Improvements
Claude Opus 4.8, the latest iteration of Anthropic’s flagship model, delivers notable gains across coding, reasoning, and agentic tasks. On the Super-Agent benchmark, it is the only model to complete every case end-to-end, outperforming prior Opus versions and GPT-5.5 at similar cost. It also achieves top scores on CursorBench across all effort levels and sets a new record on the Legal Agent Benchmark, becoming the first model to surpass 10% on the all-pass standard.
The model excels in computer-use and browser-agent scenarios, scoring 84% on Online-Mind2Web—a significant jump over both Opus 4.7 and GPT-5.5. Early testers report improved reliability and sharper judgment during agentic tasks.
New Features and Honesty Upgrades
A key improvement in Opus 4.8 is its honesty. The model is roughly four times less likely than its predecessor to overlook flaws in code. Anthropic’s alignment assessment shows it reaching new highs in prosocial traits, such as supporting user autonomy and acting in users’ best interests, with substantially lower rates of misaligned behavior.
Alongside the model, Anthropic introduces dynamic workflows in Claude Code, allowing the model to plan and execute hundreds of parallel subagents in a single session for large-scale code migrations. Users on claude.ai now have effort control, enabling them to adjust how much processing Claude applies to a task. Fast mode for Opus 4.8 operates at 2.5 times the speed and is now three times cheaper than for previous models. The Messages API also now accepts system entries inside the messages array.
Source: Anthropic