IMO 2025 is crushed by DeepSeek's new reasoner

By EngineAI Team | Published on December 5, 2025
DeepSeek recently unveiled DeepSeek-Math-V2, an open-source MoE model that democratizes previously exclusive "research-level" mathematical reasoning by achieving gold-medal performance at IMO 2025.

The specifics:

The model reached gold-medal level by solving five of the six IMO 2025 problems, and scored 118/120 on the 2024 Putnam competition, surpassing the highest human score.

It earned 61.9% on IMO ProofBench, far ahead of GPT-5's roughly 20%, and nearly matched Google's customized Gemini Deep Think, which also earned IMO gold.

Rather than rewarding final answers alone, Math-V2 employs a generator-verifier system in which one model proposes a proof and another evaluates it.

By assigning a confidence score to each step, the verifier lets the reasoning be debugged step-by-step and forces the generator to shore up weak logic.
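The loop described above can be sketched in a few lines. This is a minimal illustrative simulation, not DeepSeek's actual pipeline: `generate_proof`, `score_steps`, and `THRESHOLD` are hypothetical stand-ins for what would be trained generator and verifier models.

```python
# Sketch of a generator-verifier refinement loop (illustrative only;
# the real system uses trained models, not these stub functions).

THRESHOLD = 0.8  # hypothetical minimum per-step confidence to accept

def generate_proof(problem: str, weak_steps: list[int]) -> list[str]:
    """Stub generator: proposes proof steps, revising any flagged ones."""
    steps = [f"step {i} of proof for {problem!r}" for i in range(3)]
    for i in weak_steps:
        steps[i] += " (refined)"
    return steps

def score_steps(steps: list[str]) -> list[float]:
    """Stub verifier: assigns a confidence score to each step.
    Refined steps score higher, simulating improvement on feedback."""
    return [0.95 if "(refined)" in s else 0.5 for s in steps]

def prove(problem: str, max_rounds: int = 5) -> tuple[list[str], list[float]]:
    """Generator proposes; verifier scores every step; low-confidence
    steps are fed back for refinement until all steps pass."""
    weak: list[int] = []
    steps, scores = [], []
    for _ in range(max_rounds):
        steps = generate_proof(problem, weak)
        scores = score_steps(steps)
        weak = [i for i, c in enumerate(scores) if c < THRESHOLD]
        if not weak:  # verifier accepts every step
            break
    return steps, scores

steps, scores = prove("sample problem")
print(all(c >= THRESHOLD for c in scores))  # → True
```

The key design point is that the reward signal is per-step rather than per-answer: the generator is never told only "wrong", it is told *which* step is weak.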

DeepSeek has cracked the monopoly on frontier mathematical reasoning by open-sourcing a model that competes with Google's internal heavyweight. This has given the community a template for creating agents that can debug their own mental processes. In fields like engineering, where errors are expensive, this may be revolutionary.
