Here’s the latest on Claude Opus 4.8 from reputable sources.

Summary: Claude Opus 4.8 has just been released, with reported improvements in honesty, reliability, and agentic capabilities (coding, reasoning, browser tasks, and long-form analysis). Anthropic markets it as a stronger collaborator, with a new effort-control system and dynamic workflows, plus improvements in code quality and guidance mechanisms. According to the sources, open testing reports show fewer confidently incorrect answers and more explicit uncertainty when appropriate.[1][2][5]

Key claimed improvements:
- Honesty and uncertainty handling: Opus 4.8 is reportedly better at admitting uncertainty and avoiding ungrounded claims, and is generally more cautious about the code it generates.[2][1]
- Agentic and coding tasks: Reported improvements in agentic coding, multi-step workflows, and dynamic subagent orchestration; Claude Code in particular is highlighted for large-scale code migrations and parallel tasking.[1][2]
- Planning and effort control: New user-controllable effort settings let users trade off speed against depth, with the default biased toward higher effort for quality and user experience.[2]
- Benchmarks and comparisons: Reported results on SWE-bench Pro and other benchmarks show strong performance relative to some frontier models; exact standings vary by task (e.g., coding benchmarks vs. terminal coding).[1][2]

Availability and pricing: Opus 4.8 is reportedly broadly available, with no price increase over Opus 4.7 in the initial rollout. Anthropic is also previewing Mythos-class models, which may bring broader safety and capability improvements in the near term.[1]
Related context: A review video and articles published around the release discuss first-look impressions, including specifics on reliability improvements, dynamic workflows, and mid-task instruction injection that updates system behavior while preserving prompt caches.[4][2]

Would you like a concise side-by-side comparison table of Opus 4.7 vs Opus 4.8 across dimensions (honesty, coding, reasoning, effort control, dynamic workflows, pricing), or a link-by-link summary of each source with quotes? I can also pull specific benchmarks or use-case scenarios (coding, legal reasoning, web browsing) if you’re evaluating which version to adopt. Citations available on request.