The Proof Machines: How Google DeepMind Quietly Solved Nine Erdős Problems While OpenAI Was Still Basking in the Glory
There’s an old joke in mathematics departments: an Erdős problem walks into a bar, and fifty years later, someone finally figures out the tab.
For decades, the unsolved questions posed by the legendary, nomadic Hungarian mathematician Paul Erdős have served as a kind of intellectual Everest for researchers. You don’t solve an Erdős problem to get famous. You solve it to prove you belong at the table. Some of them have sat there since the 1960s, gathering dust, occasionally teasing a postdoc who thinks they’ve found a clever shortcut, only to realize they’ve stumbled into a dead end.
Last week, two different AI systems—built by two bitter rivals—claimed they’d cracked open pieces of that legacy within 24 hours of each other. And the way they did it tells you more about where artificial intelligence is actually heading than any GPT-5 demo ever could.
Let’s start with the quieter, weirder, and arguably more significant event.
The Google DeepMind Double Punch
Late Tuesday night (or early Wednesday, depending on your time zone and your tolerance for arXiv notifications), a team at Google DeepMind quietly published results from a system called AlphaProof Nexus. The name is a mouthful, but the achievement is simple to state: the AI solved nine open Erdős problems, two of which had gone unsolved for 56 years.
Not “found evidence for.” Not “suggested a plausible approach.” Solved. With machine-verified proofs.
The problems spanned combinatorics and graph theory—fields Erdős essentially helped invent. They weren’t trivial puzzles. One of the 56-year holdouts involved a specific bound on Ramsey numbers that had frustrated three generations of combinatorialists. Another dealt with a conjecture on intersecting set families that Erdős himself once called “annoyingly slippery.”
AlphaProof Nexus chewed through them in a matter of hours. Total computing cost per problem? A few hundred dollars in cloud compute.
To put that in perspective: a human mathematician might spend five years on one of these, often with no guarantee of success. A well-funded research group might burn through a million dollars in salary and conference travel just to publish a partial result. At a few hundred dollars apiece, DeepMind’s nine complete, verified solutions cost a few thousand dollars in total, roughly the price of a few round-trip plane tickets to Tokyo.
How It Actually Works (No Hype, Just Lean)
Here’s where it gets interesting, and where the usual “AI will replace mathematicians” panic starts to look a little more nuanced.
AlphaProof Nexus is not a single model. It’s a two-part system. The first part is a large language model—not wildly different from the kind that writes your emails or summarizes documents. But instead of generating marketing copy or bad poetry, this LLM is fine-tuned to produce mathematical statements in a language called Lean.
Lean is what’s known as a proof assistant. Think of it as a compiler for math. If you write a proof in Lean, the software checks every single logical step, down to the axioms. No hand-waving. No “clearly, it follows that.” No “the details are left as an exercise for the reader.” Lean demands rigor or it rejects your proof outright.
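To make the “compiler for math” idea concrete, here is a minimal Lean 4 sketch. It is an illustration only and has nothing to do with the Erdős proofs themselves; the theorem name add_comm_example is made up for this example, while Nat.add_comm is a lemma from Lean’s core library.

```lean
-- Lean 4, core library only (no Mathlib needed).

-- A concrete computation: the checker evaluates both sides and compares them.
example : 2 + 2 = 4 := rfl

-- A general statement, justified by citing an existing library lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A false claim does not get through. Uncommenting the next line
-- produces an error instead of an accepted proof:
-- example : 2 + 2 = 5 := rfl
```

The point is the last line: there is no way to argue with the checker. Either the term you supply has the type of the statement you claimed, or the file does not compile.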
The second part of AlphaProof Nexus is the verifier. And this is the secret sauce: the system doesn’t just generate a proof and submit it for review. It generates a candidate, feeds it to Lean, gets a pass/fail signal, and then loops back. Generate, verify, fail, adjust, generate again. It’s a closed loop of machine-checked reasoning.
In practice, this means AlphaProof Nexus can try thousands of proof strategies in the time it takes a human to make coffee. Most of them fail—spectacularly, nonsensically. But when one finally passes the Lean verifier, you have something unprecedented: a proof that is both novel and mathematically bulletproof.
No human referee. No peer review theater. Just a machine saying, “I’ve checked every inference, and it holds.”
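The generate, verify, adjust loop described above can be sketched in a few lines of Python. This is a toy under loud assumptions, not DeepMind’s code: the “proof” is just a pair of integers, verify stands in for Lean, propose stands in for the language model, and every name here is hypothetical.

```python
import random
from typing import Optional

Candidate = tuple[int, int]

def verify(candidate: Candidate) -> bool:
    """Stand-in for the Lean checker: a cheap, deterministic pass/fail test.
    Here a 'proof' is a witness (x, y) with x^2 - 2*y^2 == 1 and y > 0."""
    x, y = candidate
    return y > 0 and x * x - 2 * y * y == 1

def propose(rng: random.Random, rejected: set[Candidate], bound: int = 20) -> Candidate:
    """Stand-in for the language model: an unreliable guesser.
    The only feedback it uses is the set of candidates already rejected."""
    while True:
        candidate = (rng.randint(1, bound), rng.randint(1, bound))
        if candidate not in rejected:
            return candidate

def generate_and_verify(max_attempts: int = 400, seed: int = 0) -> Optional[tuple[Candidate, int]]:
    """Generate, verify, record the failure, and try again until something passes."""
    rng = random.Random(seed)
    rejected: set[Candidate] = set()
    for attempt in range(1, max_attempts + 1):
        candidate = propose(rng, rejected)
        if verify(candidate):
            return candidate, attempt      # accepted by the checker
        rejected.add(candidate)            # the "adjust" step: never retry a failure
    return None

result = generate_and_verify()
```

Notice that nothing in the loop trusts the generator. Almost every guess is wrong, and that is fine, because the only thing that ever leaves the loop is a candidate the checker has accepted.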
The Price Tag Problem
DeepMind also released a fascinating footnote. They built a simpler version of the same agent—call it AlphaProof Lite—that used fewer computational tricks and a dumber search strategy. That system matched the main Nexus results on all nine Erdős problems. But it cost significantly more to run per problem. Not thousands more, but enough that the team noted the difference explicitly in their paper.
Why does that matter? Because it tells you something about the nature of these proof-finding problems. The harder mathematical problems don’t just require more compute. They require smarter search. The Nexus system’s advantage wasn’t raw power; it was a more efficient way to navigate the infinite tree of possible proof steps. That’s genuinely new. That’s not just scaling up a language model. That’s structural innovation.
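A toy comparison shows why search strategy, not raw compute, can dominate the bill. The sketch below is an assumption-heavy stand-in, not the method from the paper: states are integers, the two “inference rules” are add one and double, and the goal of 97 and the distance heuristic are arbitrary choices made for illustration.

```python
import heapq
from typing import Callable

def search(start: int, goal: int, priority: Callable[[int, int], int]) -> int:
    """Return how many states were expanded before `goal` was reached.
    `priority(state, depth)` decides which frontier state is tried next."""
    frontier = [(priority(start, 0), start, 0)]   # (priority, state, depth)
    seen = {start}
    expanded = 0
    while frontier:
        _, state, depth = heapq.heappop(frontier)
        if state == goal:
            return expanded
        expanded += 1
        for nxt in (state + 1, state * 2):        # the two allowed steps
            if nxt <= 2 * goal and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (priority(nxt, depth + 1), nxt, depth + 1))
    raise ValueError("goal not reachable")

GOAL = 97

# "Dumber" search: try everything in order of depth (breadth-first).
blind = search(1, GOAL, lambda state, depth: depth)

# "Smarter" search: always try the state that looks closest to the goal.
guided = search(1, GOAL, lambda state, depth: abs(GOAL - state))

print(f"blind search expanded {blind} states, guided search expanded {guided}")
```

Both searches reach the same goal, and both would pass the same verifier. The guided one simply wastes fewer steps getting there, which is the kind of difference that shows up as cost per problem.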
And here’s the kicker: the same system also proved 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS). If you’re not a mathematician, OEIS is this beautiful, obsessive crowdsourced database of number sequences, everything from the Fibonacci numbers to that one weird sequence that appears in quantum physics models and nowhere else. Open conjectures in OEIS are usually small, isolated claims: “Every term in sequence A12345 is divisible by 3 after the 12th term” kind of stuff. But forty-four of them? In one run? That’s not a breakthrough. That’s a broom sweeping through the attic.
Where It Still Falls Apart
Now, before anyone declares the field of mathematics obsolete, let’s talk about what AlphaProof Nexus cannot do.
The paper explicitly notes that problems requiring new mathematical constructions remained out of reach. That’s not a small caveat. That’s the heart of creative mathematics.
Here’s the difference. An Erdős problem about bounding a Ramsey number? That’s a proof within an existing framework. The concepts already exist. The language is already written. The AI is finding a path through a known landscape. But a problem that demands a brand-new definition—a new kind of object, a new invariant, a new way of measuring something—that still stumps these systems.
Example: when Andrew Wiles proved Fermat’s Last Theorem in the 1990s, he didn’t just find a clever derivation. He had to prove a large part of what is now called the modularity theorem along the way, and he built new mathematics to do it. Current AI, including AlphaProof Nexus, can’t do that. It can traverse the map. It cannot draw a new map.
The DeepMind team is honest about this. In their conclusion, they write: “Problems requiring the invention of novel definitions or the extension of existing theories remain beyond the reach of current proof-generation methods.” That’s academic code for: we solved nine cool puzzles, but we didn’t create a new field.
Meanwhile, Across Town: OpenAI’s Rollercoaster Week
You can’t understand the DeepMind news without understanding what happened at OpenAI just one day earlier.
Last Monday, OpenAI announced what looked like a stunning victory of its own. Their system—details were sparse, but it involved a different reasoning architecture—had disproven an 80-year-old Erdős conjecture. Not solved. Disproven. That’s even rarer. Finding a counterexample to a conjecture that Erdős himself believed true is like finding a crack in an old cathedral’s foundation. Nobody expects it.
The mathematical community, already jittery from a year of AI hype, started paying attention. An 80-year-old conjecture? That’s not a homework problem. That’s a legacy.
Except.
This is the same OpenAI, you may recall, that a few months ago walked back a much bolder claim. Back in the spring, they said their system had solved ten novel mathematical problems (ten!), only to quietly revise the statement after other researchers pointed out that several of the “solutions” were either incomplete or had been previously published. The retraction was polite, understated, and absolutely brutal if you read between the lines. “We have updated our claims to reflect the current understanding of the results” is PR-speak for “we got out over our skis.”
So when the Erdős disproof announcement dropped, the reaction was split. Half the internet cheered. The other half said, “Let’s wait a month and see if it holds up.”
At the time of writing, no independent verification has been published. That’s normal; math moves slowly. But it’s also a reminder that OpenAI and DeepMind are playing two different games. OpenAI is chasing headlines. DeepMind, at least in this case, released a system whose proofs are verifiable in Lean right now, today, by anyone with a laptop and a copy of the software.
Why the Lean Framework Changes Everything
The real story here isn’t who solved more problems. It’s that formal verification has grown up.
For years, proof assistants like Lean, Coq, and Isabelle were niche tools used by a small cult of computer scientists and logicians. You had to be a little bit masochistic to use them. Writing a proof in Lean felt like programming in assembly language while also doing calculus in your head.
But the DeepMind team showed that an LLM—trained on a mix of natural language math papers and Lean code—can bridge that gap. The model doesn’t have to be perfect. It just has to be good enough to generate candidates that pass the verifier. And once you have that loop, you’re no longer praying that the AI is correct. You’re checking its work, mechanically, line by line.
This is what people mean when they say “formal verification changes the game.” In traditional mathematics, a proof is accepted if enough experts agree it’s correct. That’s a social process. It works remarkably well, but it has failed, famously and embarrassingly, more than a few times. Alfred Kempe’s 1879 proof of the four-color theorem stood for eleven years before Percy Heawood found the flaw in it. More recently, a claimed proof of the abc conjecture has been in limbo for years because nobody can agree if it’s coherent.
Machine-verified proofs don’t have that problem. They’re either correct or they’re not. And the Lean kernel is small enough—a few thousand lines of code—that it can be manually audited. Trust the kernel, trust the proof.
The Quiet Revolution: Speed of Discovery
Here’s the part that keeps me up at night (in a good way).
AlphaProof Nexus solved two 56-year-old problems in a few hours. Forty-four open OEIS conjectures in what, an afternoon? The cost per problem: a few hundred dollars.
Now imagine scaling this. Not ten times larger. A thousand times. A million.
We are about to enter an era where routine mathematical problems—the kind that fill PhD theses and junior faculty research agendas—can be solved by machines at near-zero marginal cost. That doesn’t eliminate mathematicians. It eliminates the drudgery that mathematicians have to do before they get to the interesting part.
The real breakthroughs will still come from humans (for now). But humans will use tools like AlphaProof Nexus the way we use calculators or search engines: as cognitive extensions. You want to know if a lemma holds? You don’t spend three weeks trying to prove it. You feed it to the machine, wait ten minutes, and get a yes/no with a verified proof attached.
That changes the pace of research. Dramatically.
What DeepMind Isn’t Saying
Let me put on my skeptical hat for a moment.
DeepMind’s paper is impressive, but it’s also a marketing document. The choice to release it the day after OpenAI’s announcement was not a coincidence. The name “AlphaProof Nexus” is pure corporate branding. The emphasis on “nine Erdős problems” is designed to echo through university PR departments.
And there’s a deeper question the paper doesn’t address: how many of these solved problems are actually interesting? Erdős offered cash prizes for his problems—typically between $25 and $1000, famously. But not all Erdős problems are created equal. Some are deep structural questions. Others are clever but isolated. The paper lists all nine solved problems, but it doesn’t rank them by significance. That’s because significance is a human judgment, not a machine output.
A cynical reading: DeepMind picked low-hanging Erdős fruit. Problems that were unsolved but not deeply unsolvable. The kind of problems a very determined postdoc with a good idea could crack in a year. The AI just did it faster.
A less cynical reading: that’s exactly the point. If AI can handle the “year of work” problems, that frees humans for the “decade of work” problems. That’s not a flaw. That’s the whole value proposition.
The One-Line Summary That Actually Matters
Here’s what you should take away from all of this.
Google DeepMind built a system that generates machine-verified proofs, loops through a verifier until one passes, and used it to solve nine open Erdős problems at a few hundred dollars each. OpenAI made a splashy claim about disproving an 80-year-old conjecture, but their track record on walking back bold statements means the math community is waiting for receipts.
The real breakthrough isn’t the number of problems solved. It’s that formal verification has become practical enough that an LLM—flawed, hallucinating, glorified autocomplete that it is—can be turned into a reliable proof-finding machine by wrapping it in a verifier.
That pattern—generator plus verifier—is going to spread far beyond mathematics. Code generation? Same thing. Scientific hypothesis testing? Same thing. Medical diagnosis? You see where this is going.
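For code generation, the same pattern looks roughly like this. It is a hypothetical sketch: the three candidates are written by hand to stand in for model output, and the test suite plays the verifier. One caveat worth stating loudly is that a handful of test cases is a far weaker check than a Lean proof, so “passes” here means “not yet caught,” not “correct.”

```python
from typing import Callable, Optional, Sequence

Candidate = Callable[[int], int]

def passes_tests(fn: Candidate) -> bool:
    """The verifier: a small test suite for an absolute-value function."""
    cases = [(3, 3), (-3, 3), (0, 0), (-7, 7)]
    return all(fn(x) == expected for x, expected in cases)

def first_verified(candidates: Sequence[Candidate]) -> Optional[int]:
    """Return the index of the first candidate the verifier accepts, if any."""
    for index, fn in enumerate(candidates):
        if passes_tests(fn):
            return index
    return None

# Stand-ins for generated code: two plausible-looking wrong answers, one right.
candidates = [
    lambda x: x,                    # forgets negative inputs
    lambda x: -x,                   # flips everything
    lambda x: x if x > 0 else -x,   # handles both signs
]

chosen = first_verified(candidates)
```

The generator is allowed to be wrong two times out of three. What matters is that the wrong answers never make it past the check.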
AlphaProof Nexus isn’t a mathematician. It doesn’t understand what it proved. But it doesn’t need to. It just needs to generate candidates that Lean approves. And that, quietly, is the scariest and most hopeful thing about the whole story.
For the first time in history, we have machines that can discover new truths—narrow, brittle, definitional truths—without understanding a single thing about them. Whether that’s a revolution or just a very fast pencil depends on what we do with the time it buys us.
Paul Erdős, who lived on amphetamines and coffee and slept four hours a night, loved to repeat the line (usually credited to his colleague Alfréd Rényi) that a mathematician is a machine for turning coffee into theorems. He would have appreciated the irony: now we’ve built a machine that turns electricity into proofs. And it doesn’t even need the coffee.
Your one-stop shop for automation insights and news on artificial intelligence is EngineAi.