DeepSeek recently unveiled DeepSeek-Math-V2, an open-source MoE model that achieves gold-medal performance at IMO 2025, opening up "research-level" mathematical reasoning that was previously the preserve of closed frontier labs.

The specifics:

The model reached gold-medal level by solving five of the six IMO 2025 problems, and scored 118/120 on the 2024 Putnam competition, surpassing the highest human score.

It earned 61.9% on IMO ProofBench, far ahead of GPT-5, which scored only 20%, and nearly matched Google's customized Gemini Deep Think, the model that earned IMO gold.

Rather than rewarding final answers alone, Math-V2 employs a generator-verifier system: one model proposes a proof and a second model evaluates it.

By assigning a confidence score to each step, the verifier forces the generator to repair weak logic, so the reasoning is self-debugged step by step (see the sketch below).
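To make that loop concrete, here is a minimal Python sketch of how a generator-verifier refinement cycle of this kind might work. All names (generate_proof, verify, refine), the 0-to-1 score scale, and the confidence threshold are illustrative assumptions, not DeepSeek's actual interfaces; the model calls are stubbed out with placeholders.

```python
# Minimal sketch of a generator-verifier refinement loop, loosely modeled on
# the setup described above. generate_proof and verify are hypothetical
# stand-ins for calls to the generator and verifier models.

from dataclasses import dataclass

@dataclass
class StepScore:
    step: str          # one reasoning step from the candidate proof
    confidence: float  # verifier's confidence that the step is sound (0..1)

def generate_proof(problem: str, feedback: list[str]) -> list[str]:
    # Placeholder: in practice, prompt the generator model with the problem
    # plus the verifier's feedback on the previous attempt.
    return [f"revised step addressing: {f}" for f in feedback] or ["initial attempt"]

def verify(steps: list[str]) -> list[StepScore]:
    # Placeholder: in practice, the verifier model scores each step.
    return [StepScore(s, 0.5 if "initial" in s else 0.95) for s in steps]

def refine(problem: str, threshold: float = 0.9, max_rounds: int = 4) -> list[str]:
    feedback: list[str] = []
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        scores = verify(proof)
        weak = [s for s in scores if s.confidence < threshold]
        if not weak:  # every step cleared the confidence bar
            return proof
        # Feed the low-confidence steps back so the generator rewrites them.
        feedback = [s.step for s in weak]
    return proof  # best effort after max_rounds

if __name__ == "__main__":
    print(refine("Prove that sqrt(2) is irrational."))
```

The key design point the sketch illustrates is that the reward signal is step-level rather than answer-level: a proof only passes once every individual step clears the verifier's confidence bar, which is what pushes the generator to fix weak links instead of gaming the final answer.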

By open-sourcing a model that competes with Google's internal heavyweight, DeepSeek has cracked the monopoly on frontier mathematical reasoning and given the community a template for building agents that can debug their own reasoning. In fields like engineering, where errors are expensive, this could be revolutionary.
