FILE PHOTO: Anthropic logo is seen in this illustration taken May 20, 2024. REUTERS/Dado Ruvic/Illustration/File Photo

Anthropic launches Claude Opus 4.5 to take on ChatGPT and Gemini: here’s what you should know



Shortly after the launches of GPT-5.1 and Gemini 3, Anthropic has released its Claude Opus 4.5 model. The AI startup claims that the new model is the best in the world for coding, agents, and computer use.

Claude Opus 4.5 scores 80.9% on SWE-bench Verified, a benchmark of real-world software engineering tasks. Notably, Opus 4.5 is the first model to cross the 80% mark on SWE-bench Verified. In comparison, Google’s newly released Gemini 3 Pro scored 76.2%, while OpenAI’s GPT-5.1 Codex Max scored 77.9%.

The new model also scored higher than any human candidate on the take-home test that Anthropic gives to prospective performance engineering candidates, within the test’s 2-hour time limit.

“The take-home test is designed to assess technical ability and judgment under time pressure. It doesn’t test for other crucial skills candidates may possess, like collaboration, communication, or the instincts that develop over years. But this result, where an AI model outperforms strong candidates on important technical skills, raises questions about how AI will change engineering as a profession,” the company says.

Anthropic also claims that the new model outpaces rivals on τ2-bench, a benchmark that measures how agents perform on real-world, multi-turn tasks. In one scenario, the model acts as an airline service agent helping a distressed customer. The benchmark expects models to refuse a modification to a basic economy booking, because the airline doesn’t allow changes to that class of booking.

The company says Opus 4.5 ‘found an insightful (and legitimate) way to solve the problem: upgrade the cabin first, then modify the flights’.

Safer than previous models:

Anthropic also claims that Claude Opus 4.5 is the ‘most robustly aligned model’ it has released so far.

“With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. Opus 4.5 is harder to trick with prompt injection than any other frontier model in the industry,” the company said in its blog post.

How can you use Claude Opus 4.5?

The new AI model is available in the Claude app on Android and iOS, as well as on the Claude website. The company is also making the model available to developers at the same time.


