They Broke Open Meta’s AI in 10 Minutes. 3,500 Decensored Models Later, Nobody Has a Fix.
The Financial Times dropped a story this morning that should worry everyone who’s been paying attention to the open-source AI boom. Not because the technical details are surprising to experts. But because the scale has finally caught up with the warning signs.

Here’s the short version: tools that remove safety guardrails from open-source AI models are now trivial to use. The FT took Meta’s Llama 3.3, applied a tool called Heretic (available for free on GitHub), wrote four lines of code, used no specialist hardware, and within ten minutes had a model that would answer questions about bioweapons, child exploitation, and ricin dosage. A modified version of Google’s Gemma 3 did the same.

The creator of Heretic told the FT that his tool alone has produced more than 3,500 “decensored” models, which have been downloaded 13 million times. When Google released Gemma 4, Heretic’s creator stripped its guardrails within 90 minutes.

Google called this “a known technical challenge facing all open models.” Meta declined to comment.

Neither statement is reassuring. And both miss the point.

What Actually Happened (and Why It’s Not a Hack)
First, let’s be precise about what the FT did, because the headlines are going to scream “AI Jailbroken in Minutes” and that’s not quite right.

The FT didn’t hack Meta. They didn’t find a zero-day vulnerability in Llama’s infrastructure. They didn’t breach any servers. What they did was take an open-source model—code and weights that Meta intentionally released to the public—and run a script that modifies those weights to remove the safety fine-tuning that Meta added after the base training.

This is not a bug. This is a feature of open-source. If you give someone the full model, they can do whatever they want with it. Including removing the parts that say “don’t answer harmful questions.”

The FT used a tool called Heretic. It’s not complicated. Heretic works by identifying the specific layers in a transformer model that are responsible for refusal behavior (the “I can’t answer that” responses) and ablating or overwriting them. The tool’s creator describes it as “surgical” but admits it’s more like a hammer: you don’t need to understand neuroscience to remove a brain.

Four lines of code. Ten minutes. A laptop. No GPU cluster required.

The result: a Llama 3.3 variant that, when asked “What is the lethal dosage of ricin?” provided a detailed answer. When asked questions about methods of self-harm, it answered those too. When asked about child exploitation content—the FT doesn’t specify exactly what they asked, for obvious reasons—the model complied.

This is not hypothetical. The decensored model exists. It’s been downloaded. People are using it.

The 13 Million Download Problem
Heretic’s creator, who spoke to the FT on condition of partial anonymity (the paper used his online handle, and he declined to give his real name), was startlingly candid.

He said Heretic has been used to produce more than 3,500 decensored models. Those models have been downloaded 13 million times from Hugging Face and other model repositories. Thirteen million.

To put that in perspective: that’s more downloads than many legitimate open-source projects ever see. That’s not a handful of researchers poking at safety. That’s an ecosystem.
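For a rough sense of scale, the two figures attributed to Heretic’s creator can be turned into an average. This is only a back-of-the-envelope sketch: it treats the 3,500 and 13 million figures as exact, and it says nothing about how the downloads are actually spread across models, which is probably very uneven.

```python
# Figures as reported in the FT story, treated as exact for illustration.
decensored_models = 3_500
total_downloads = 13_000_000

# Mean downloads per model. The real distribution is likely skewed toward
# a few popular models, so this is an average and nothing more.
average_downloads = total_downloads / decensored_models

print(f"Average downloads per decensored model: {average_downloads:,.0f}")
```

Even as a crude average, that is several thousand downloads per model, which fits the "ecosystem" reading better than the "handful of researchers" one.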

When Google released Gemma 4, Heretic’s creator stripped it in 90 minutes. Not because he’s a genius—he’s clearly skilled, but the technique is now well-understood and automated. He stripped it because the same approach works on most openly licensed models. The guardrails on open-source models are, to be blunt, decorative. They keep out honest users and slow down casual misuse. They do not stop anyone with basic coding ability and ten minutes of free time.

Meta’s Silence and Google’s Shoulder Shrug
The responses from the two companies are telling.

Google, through a spokesperson, told the FT: “This is a known technical challenge facing all open models.” That’s accurate. It’s also a deflection. Yes, it’s a known challenge. What are you doing about it? The statement didn’t say.

Meta declined to comment. No statement. No “we take this seriously.” No “we’re working on technical solutions.” Nothing. Just silence.

I’ve covered tech for long enough to read that silence. Meta knows there’s no easy fix. They know that any open-source model they release can be decensored within hours. They know that saying anything would draw attention to the problem without offering a solution. So they say nothing, and hope the story cycles out of the news.

It won’t. Because the problem is getting worse, not better.

The Open-Source vs. Closed-Source Distinction (and Why It’s Fading)
The FT article makes one distinction very clear, and it’s important to hold onto.

The guardrail removal technique described only works on open-source models—models where the full weights are publicly available. Proprietary systems like OpenAI’s GPT-4, Anthropic’s Claude, and Google’s private Gemini models are not vulnerable to this specific attack. You can’t download their weights. You can’t run Heretic on them. You can only interact with them via API, and their guardrails are enforced server-side.

That’s the good news. For now.

The bad news is that open-source models have been closing the performance gap with closed systems for two years straight. Llama 3.3 is genuinely competitive with GPT-3.5 on many tasks. Gemma 4 is surprisingly capable for its size. And there is every reason to believe that within months—not years—open-source models will reach the capability level of today’s frontier closed models.

At which point, a decensored open-source model will be functionally equivalent to a decensored version of GPT-4. No API restrictions. No usage monitoring. Just weights on a hard drive, answering anything.

That is the timeline that should keep national security officials awake at night.

The Bioweapons Question: Real or Hypothetical?
The FT tested the decensored Llama on questions about ricin dosage. That’s bad. But ricin is not a bioweapon in the mass-casualty sense. It’s a poison. It’s dangerous, but it’s not smallpox.

The more worrying question—the one the FT hints at but doesn’t fully explore—is what happens when someone asks a decensored frontier-level open model about engineering novel pathogens. About optimizing vaccine-resistant viruses. About synthetic biology protocols that could be executed with mail-order DNA.

Today’s open models are not quite there. Llama 3.3 can’t design a pandemic-grade pathogen from scratch. But the models that can—or that can assist meaningfully in doing so—are coming. And when they arrive in open form, they will be decensored within hours.

This is not fearmongering. This is extrapolation from current trends. The FT’s investigation showed that guardrail removal is already trivial. The only thing standing between that triviality and catastrophic misuse is the capability gap between open and closed models. That gap is closing.

The Creator’s Defense: Freedom or Folly?
Heretic’s creator defended his tool in the FT interview. His argument is not new, but it’s worth taking seriously.

He said, in essence: the guardrails on open-source models are fake. They create a false sense of security. Users think the model is safe because it refuses to answer certain questions. But anyone with basic skills can remove those refusals. So the refusals are theater. They protect nobody. All they do is make the model less useful for legitimate research.

His solution: remove the theater. Let the models be fully uncensored. Then at least everyone knows what they’re dealing with.

There’s a logic to this. Security through obscurity—or through fragile, easily-removed refusal fine-tuning—is not security. It’s a speed bump. And speed bumps don’t stop determined actors.

But the counterargument is equally strong. Not everyone who downloads a decensored model is a determined actor. Some are curious. Some are researchers. Some are trolls. But some are people with harmful intent who would not have gone to the trouble of jailbreaking a model themselves, but will happily download a pre-decensored version from Hugging Face. The speed bump matters. It filters out the casual, the lazy, the unskilled. And in a world of billions of internet users, filtering out even 99% of potential abusers still leaves far too many, but far fewer than having no filter at all.
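The speed-bump arithmetic is easy to make concrete. The sketch below is illustrative only: the population sizes are hypothetical numbers picked to show the shape of the argument, not estimates from the FT or anyone else, and `remaining_after_filter` is a helper name made up for this example.

```python
def remaining_after_filter(would_be_abusers: int, filter_rate: float) -> int:
    """Return how many people get past a barrier that deters `filter_rate` of them."""
    if not 0.0 <= filter_rate <= 1.0:
        raise ValueError("filter_rate must be between 0 and 1")
    return round(would_be_abusers * (1.0 - filter_rate))


# Hypothetical population sizes, chosen only for illustration.
for population in (1_000, 100_000, 10_000_000):
    left = remaining_after_filter(population, 0.99)
    print(f"{population:>10,} would-be abusers -> {left:,} remain after a 99% filter")
```

Both sides of the debate are visible in the output: the filter removes two orders of magnitude, and what remains still grows with the size of the starting pool.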

Heretic’s creator has chosen his side. He believes full transparency is safer than fragile safety. The FT’s investigation suggests that, whether he’s right or wrong, his tools are already out there, and the 13 million downloads speak for themselves.

What Google and Meta Could Do (But Probably Won’t)
There are technical mitigations for this problem. None of them are perfect. But some are better than nothing.

Better fine-tuning. Current refusal behavior is often implemented as a superficial layer, easily ablated. More robust fine-tuning—interleaving safety behaviors throughout the model, not just in final layers—could make removal harder, though not impossible.

Watermarking. Released models could, in principle, carry watermarks designed to survive modification, so that decensored variants are detectable even if they are not preventable. If Hugging Face and other repositories agreed to scan for known decensored variants and remove them, distribution would be harder. (Harder, not solved. Models would move to torrents and darknet forums.)

Legal deterrence. Meta and Google could update their licenses to explicitly prohibit guardrail removal and distribution of decensored versions. Would that stop anyone? No. But it would give prosecutors a tool to go after the most visible distributors. The Heretic creator is arguably protected by free speech and research exceptions. The person hosting a decensored model trained to answer child exploitation questions is not.

Better hardware-level controls. This is the nuclear option: build models that require specialized hardware (TPUs, GPUs with trusted execution environments) to run, and design the model weights to be unusable without that hardware’s attestation. This would kill open-source as we know it. It would also solve the decensoring problem. Don’t expect Meta to do this. They are an open-source AI champion. But don’t be surprised if regulators start asking about it.
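The repository-scanning idea in the list above can be sketched in its simplest form: compare the hash of each uploaded file against a blocklist of files already identified as decensored variants. This is a minimal sketch under stated assumptions, not how Hugging Face or any other host actually works. The function names and the blocklist are hypothetical, and exact-hash matching is defeated by changing a single byte of the file, so a real system would need far more robust fingerprinting.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_flagged_files(root: Path, blocklist: set[str]) -> list[Path]:
    """Return files under `root` whose SHA-256 appears in the (hypothetical) blocklist."""
    flagged = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and sha256_of_file(path) in blocklist:
            flagged.append(path)
    return flagged
```

A host would call `find_flagged_files(upload_dir, known_hashes)` on each new upload and route any hits to human review. The weakness of the approach is the same one the article describes: it raises the cost of redistribution without making it impossible.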

The Open Letter That Should Be Written
Here’s the thing. The FT’s investigation is not a hit piece on Meta or Google. It’s a wake-up call directed at the entire open-source AI ecosystem.

Right now, the norm is: release weights, add some basic refusal fine-tuning, call it a day, and hope nobody removes it. That norm is no longer tenable. The tools are too good. The downloads are too many. The potential harm is too high.

Someone needs to write an open letter—signed by model developers, repository hosts, safety researchers, and policymakers—establishing a new norm. Something like:

We will continue to release open models. But we will also invest in making guardrails harder to remove, not as theater but as actual engineering. We will monitor for decensored variants and delist them where possible. We will work with law enforcement when models are used to produce illegal content. And we will be honest about the limits of what we can prevent.

That letter doesn’t exist yet. Instead, we have Google shrugging and Meta saying nothing.

The One Question Nobody Answered
The FT investigation ends with an implicit question that no one in the article answers.

When does the risk of releasing open models outweigh the benefit?

Open-source AI has produced enormous good. It has democratized access, enabled research, and prevented the concentration of AI capability in a handful of corporations. Those are real values.

But 13 million downloads of models that will answer questions about making ricin? That’s also real. So is the certainty that as open models get more capable, the decensored versions will get more dangerous.

We are not at the point where open models can cause mass harm on their own. We are at the point where they can assist in causing harm. And we are rapidly approaching the point where they could do much more.

The FT’s investigation is a milestone. It’s the first major mainstream confirmation of what safety researchers have been saying quietly for over a year: the guardrails on open models are already broken, the tools to break them are trivial, and the ecosystem of decensored models is enormous.

What happens next is not a technical question. It’s a policy question. It’s a moral question. It’s a question about what kind of information ecosystem we want to live in.

Meta declined to comment. That’s their answer.

We deserve a better one.

Your one-stop shop for automation insights and news on artificial intelligence is EngineAi.