Mark and Priscilla’s Biohub Just Dropped a Protein Engine. Cancer and Immune Disease Don’t Stand a Chance.
Here’s a sentence I did not expect to write this year: we now have an open-source engine that maps, predicts, and designs proteins, and it’s already working on cancer targets in the lab.
Not “it might work someday.” Not “the paper shows promise in silico.” It’s already designing binders. In the lab. Against five different cancer and immune targets. With hit rates—the percentage of designed molecules that actually work—between 36% and 88%.
Those numbers are not normal. In traditional drug discovery, hit rates in the low single digits are considered acceptable. A 10% hit rate is a triumph. Biohub is claiming 36% on the low end, 88% on the high end. That’s not incremental improvement. That’s a different game entirely.
The work comes from the Chan Zuckerberg Biohub, the research organization funded by Mark Zuckerberg and Priscilla Chan. And before you roll your eyes at another billionaire science project, read the details. This is not vanity philanthropy. This is infrastructure. And they just open-sourced it.
The Three-Piece Suite: ESMC, ESMFold2, and ESM Atlas
The Biohub released not one thing but three interconnected tools. Together, they form what they’re calling Evolutionary Scale Models—a reference to the idea that evolution has already done billions of experiments on proteins, and we just need to learn how to read the results.
ESMC is the foundation. It’s a protein language model trained on 2.8 billion protein sequences—the largest such dataset ever assembled. Like a large language model learns grammar and meaning from text, ESMC learns the “grammar” of proteins: which amino acids go together, which structures are stable, which mutations break things and which ones don’t.
ESMFold2 is the structure predictor. You give it a protein sequence, it predicts the three-dimensional shape. This is the same problem AlphaFold famously solved, but ESMFold2 claims to go further. It predicts not just single protein structures but protein-protein interactions and—critically for drug development—antibody-antigen binding. That’s the problem of designing a drug that sticks to a disease target. And ESMFold2, according to Biohub, outperforms AlphaFold on exactly that task.
ESM Atlas is the map. It contains 6.8 billion protein sequences and 1.1 billion predicted structures, organized to surface evolutionary relationships that were previously invisible. Think Google Maps for the protein universe. You can zoom in on a single protein, zoom out to see its relatives across species, or search for entirely new proteins that might do something useful.
Three tools. One open-source release. No paywall. No licensing fees. No “contact us for enterprise access.” Just the code, the weights, and a paper.
How It Works (In Plain English)
Proteins are chains of amino acids that fold into specific shapes. That shape determines what the protein does. Change the shape, change the function. Misfold a protein, and you get disease.
For decades, the central problem was structure prediction: given a sequence, can you predict the shape? AlphaFold solved that brilliantly. But the harder problem, and the one that actually matters for drug discovery, is the inverse one, the design problem: given a target shape (say, a cancer protein you want to disable), can you design a new protein that sticks to it?
That’s what ESMFold2 is optimized for. It doesn’t just predict structures. It predicts interactions. And because it’s built on ESMC’s deep understanding of protein “grammar,” it can generate candidate binder sequences that evolution never got around to testing.
The Biohub team then takes those candidate sequences and synthesizes them in the lab. They test whether the designed proteins actually bind to the target. And in their initial validation against five cancer and immune targets, between 36% and 88% of the designed binders worked on the first try.
Let me repeat that. Thirty-six to eighty-eight percent. On the first try.
Traditional protein design—using physics-based simulations and brute-force screening—has success rates in the low single digits. Machine learning approaches have been creeping up into the 5–10% range. Biohub is claiming an order of magnitude improvement. If that holds up in independent replication, it’s a revolution.
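To put rough numbers on that “order of magnitude,” here is a quick back-of-the-envelope sketch in Python. The 36–88% and 5–10% ranges come from the figures above. The 1–3% range for traditional methods is my own stand-in for “low single digits,” and the function name is purely illustrative.

```python
def fold_range(new, old):
    """Return the (smallest, largest) fold improvement between two
    (low, high) hit-rate ranges, each given as fractions between 0 and 1."""
    new_lo, new_hi = new
    old_lo, old_hi = old
    # Worst case: lowest new rate vs. highest old rate.
    # Best case: highest new rate vs. lowest old rate.
    return new_lo / old_hi, new_hi / old_lo


reported = (0.36, 0.88)      # Biohub's reported binding hit rates
ml_baseline = (0.05, 0.10)   # "5-10% range" for earlier ML approaches
traditional = (0.01, 0.03)   # ASSUMPTION: stand-in for "low single digits"

ml_fold = fold_range(reported, ml_baseline)
trad_fold = fold_range(reported, traditional)

print(f"vs. ML baseline:  {ml_fold[0]:.1f}x to {ml_fold[1]:.1f}x")
print(f"vs. traditional:  {trad_fold[0]:.1f}x to {trad_fold[1]:.1f}x")
```

Under those assumptions, the reported range works out to roughly 3.6x to 17.6x the machine learning baseline, and 12x to 88x the assumed traditional one.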
The Lab Results: Cancer and Immune Disease First
The Biohub didn’t just release models and say “trust us.” They did the wet-lab validation and published the results.
The five targets were carefully chosen. Some are cancer-associated proteins involved in cell growth and division. Others are immune system targets relevant to autoimmune disease and immunotherapy resistance. They’re not easy targets. If they were easy, someone would have drugged them already.
For each target, ESMFold2 generated thousands of candidate binder designs. The Biohub team synthesized a subset and tested them experimentally. The lowest hit rate across the five targets was 36%. The highest was 88%.
Those numbers mean that for some targets, nearly nine out of ten designed proteins worked as intended on the first attempt. That’s not design. That’s engineering. That’s the difference between throwing darts in the dark and having a laser sight.
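One way to see what that means at the bench: how many designs would you have to synthesize to be 95% sure that at least one binds? Here is a minimal sketch, assuming each design succeeds independently with probability equal to the hit rate. That is a simplification, since designs against the same target are probably correlated. The function name and the 95% threshold are my own choices, not Biohub’s.

```python
import math


def designs_needed(hit_rate, confidence=0.95):
    """Smallest number of designs n such that the chance of at least one
    hit, 1 - (1 - hit_rate)**n, reaches the requested confidence.
    ASSUMPTION: every design is an independent trial."""
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - hit_rate)**n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


# 3% is an illustrative "low single digits" value; 10% is the upper end
# of the ML range cited above; 36% and 88% are the reported extremes.
for rate in (0.03, 0.10, 0.36, 0.88):
    print(f"hit rate {rate:.0%}: test {designs_needed(rate)} designs")
```

At a 3% hit rate that comes to 99 designs, at 10% it is 29, at 36% it is 7, and at 88% it is 2. The gap between ordering a hundred constructs and ordering a handful is the practical meaning of the laser sight.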
The Biohub has not yet published the full list of targets—some are part of ongoing therapeutic programs—but they’ve confirmed that binders against multiple oncology and immunology targets are now moving into functional assays. The next step is testing whether the binders actually affect disease processes in cells and animal models.
Open Science vs. Proprietary Pharma
Here’s where the Biohub model diverges from almost everyone else.
Isomorphic Labs, Google DeepMind’s drug discovery spinout, is doing similar work. They have AlphaFold, they have massive compute, and they have a direct line to Demis Hassabis’s AGI ambitions. But Isomorphic is a company. It will develop drugs and sell them. That’s fine. That’s how the system works.
Biohub is different. It’s a nonprofit research institute. Its mandate is to release what it builds. The models are open-source. The atlas is public. The methods are published. Anyone—academic lab, small biotech, even a competitor—can use them.
This is a deliberate choice. Zuckerberg and Chan have committed billions to the Biohub network with the explicit goal of accelerating science by removing friction. No IP hoarding. No licensing negotiations. No “we’ll share the data after we file the patents.”
The risk is that someone else takes their open-source tools, develops a blockbuster drug, and makes billions while Biohub gets credit but no royalties. The Biohub leadership is fine with that. That’s the point. They want drugs to exist, not necessarily to sell them.
The counterargument is that without the profit motive, the work might not get done at the scale required. Pharma companies spend hundreds of millions on a single drug candidate. Biohub’s budget, while large ($500 million for the Virtual Biology Initiative), is small compared to Pfizer’s R&D spend. The open model relies on the broader scientific community to pick up the ball and run with it.
Whether that works remains to be seen. But the early results suggest the model is at least generating the ball.
The AlphaFold Comparison (Because Everyone Will Ask)
Let me address the elephant in the room. Biohub claims ESMFold2 outperforms AlphaFold. That’s a bold claim. AlphaFold is among the most cited AI models in the history of biology. It topped the CASP competition. It’s been cited in tens of thousands of papers.
Here’s the nuance.
AlphaFold remains the gold standard for single-structure prediction. If you give it a protein sequence and ask for the folded shape, it is still the tool to beat. ESMFold2 is not claiming to beat AlphaFold there.
But AlphaFold was not originally designed for protein-protein interactions. It can do them, and people have adapted it for that purpose, but it’s not optimized for that task. ESMFold2 was built from the ground up to predict interactions, especially antibody-antigen binding. That’s the drug discovery problem. And on that specific task, Biohub’s benchmarks show ESMFold2 ahead.
So the headline is not “ESMFold2 beats AlphaFold at everything.” The headline is “ESMFold2 beats AlphaFold at the thing that matters most for designing drugs.” That’s a meaningful distinction, and it’s why people are paying attention.
The ESM Atlas: Google Maps for the Protein Universe
The third piece of the release—ESM Atlas—might end up being the most important long term.
The atlas contains 6.8 billion protein sequences and 1.1 billion predicted structures. That’s not every protein that exists. It’s the proteins in the sequence databases, plus predicted structures for a large share of sequences that evolution has produced but that no one has characterized in a lab.
More importantly, the atlas organizes these sequences and structures by evolutionary relationships. You can search for a protein in one organism and see its relatives across the tree of life. You can find distant homologs that perform similar functions in completely different contexts. You can discover that a protein involved in bacterial metabolism has a structural cousin in a human cancer pathway—a connection no one would have seen without the atlas.
This is the “virtual biology” part of Biohub’s Virtual Biology Initiative. They are building a computational mirror of biological reality. The atlas is the reference map. The models are the navigation tools. And the lab validation is the ground truth.
When all three work together, you get a flywheel. The models make predictions. The lab tests them. The results improve the models. The improved models make better predictions. Repeat.
The Hassabis Connection
Demis Hassabis’s vision of AI ending all disease comes up whenever this kind of work is discussed. That’s not an accident.
Hassabis has said, repeatedly, that he believes AI will eventually give us an “engine that could help cure any disease.” Not one disease. Not a few diseases. Any disease. Because at the molecular level, most diseases are protein problems. Misfolded proteins. Overactive proteins. Missing proteins. Proteins that bind to the wrong things or fail to bind to the right things.
If you have a tool that can design proteins on demand—binders, blockers, enzymes, sensors—you have a universal therapeutic platform. Cancer? Design a binder that flags tumor cells for destruction. Autoimmune disease? Design a blocker that quiets the overactive immune cells. Viral infection? Design a decoy that the virus binds to instead of human cells.
This is not science fiction. This is protein engineering. And Biohub’s release moves us significantly closer to that universal platform.
Hassabis’s Isomorphic Labs is working the same problem from a different angle. They’re more proprietary, more secretive, and probably better funded. But the goal is the same. And the friendly competition between the two approaches—open vs. closed, structural vs. evolutionary—is likely to accelerate both.
What’s Missing (And What Comes Next)
The Biohub release is extraordinary. But it’s not finished.
The lab validation so far is on binding. Does the designed protein stick to the target? That’s necessary but not sufficient. The binder also needs to be stable in the body, non-toxic, manufacturable, and capable of reaching the target tissue. Those are harder problems, and ESMFold2 doesn’t solve them yet.
The Biohub team knows this. Their next phase will focus on functional assays: testing whether the binders actually change disease outcomes in cells and animals. They’ve already started. The 36–88% hit rates are for binding. Functional hit rates will be lower. That’s normal.
The other missing piece is delivery. Even the perfect protein is useless if you can’t get it to the right place in the body. Biohub is not a drug delivery company. They’re counting on partners—academic labs, biotechs, pharma—to handle that part. Whether the open ecosystem can match the delivery expertise of a dedicated company is an open question.
Why It Matters (Beyond the Science)
The Biohub’s release is significant for three reasons that have nothing to do with protein folding.
First, it’s open. In an era where frontier AI models are increasingly locked behind APIs and paywalls, Biohub released the weights. You can download ESMFold2 and run it on your own hardware. That’s a statement about how science should work.
Second, it’s validated. The models aren’t just paper claims. The Biohub did the lab work. The 36–88% numbers are real experimental results, not simulations. That’s rare. Most AI-for-biology papers stop at computational benchmarks. Biohub went the extra mile.
Third, it’s focused on targets that matter. Cancer and immune disease. Not model organisms. Not academic curiosities. Real human diseases that kill millions. The Biohub chose these targets deliberately. They are not doing basic science for its own sake. They are trying to cure things.
The Bottom Line
Mark Zuckerberg and Priscilla Chan have committed over $500 million to the Virtual Biology Initiative. This release is the first major public return on that investment. It is, by any measure, a success.
The models are state-of-the-art. The atlas is unprecedented. The lab validation is solid. And the whole thing is open-source, available to any researcher anywhere.
We are not at “AI cures all disease” yet. That’s still years away, maybe decades. But we are closer than we were last week. And the path from here to there is becoming visible.
ESMFold2 is not the final answer. But it is a very good answer to a very hard question. And for the millions of people waiting on better treatments for cancer and immune disease, that is not nothing. That is everything.