Poetry prompts can get around AI security measures

By EngineAI Team | Published on December 11, 2025
According to a recent study from Italy's Icaro Lab, leading AI models can be tricked into producing dangerous content when harmful requests are rephrased as poetry, and certain systems fall for this trick consistently.

The specifics:

After testing 25 cutting-edge models from well-known labs including OpenAI, Google, and Anthropic, Icaro Lab found that poetic prompts achieved an average jailbreak success rate of 62% (a brief sketch of how such a rate is tallied follows this list).

While OpenAI's small GPT-5 nano withstood every attempted poetry attack, Google's Gemini 2.5 Pro was the most susceptible, failing 100% of the time.

The poems elicited unsafe responses on topics such as weapons development, hacking, and psychological manipulation.

The researchers deemed the specific poems "too dangerous" to publish, even though they were apparently simple enough for anyone to write.
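For readers wondering what the headline percentages measure: a jailbreak success rate is simply successful attacks divided by attempts, computed per model and then averaged. The sketch below is illustrative only; the model names and trial outcomes are made up, not the study's data, and the study's own evaluation pipeline is not shown here.

```python
from collections import defaultdict

# Hypothetical trial records: (model_name, attack_succeeded).
# Illustrative values only; not taken from the Icaro Lab study.
trials = [
    ("gemini-2.5-pro", True),
    ("gemini-2.5-pro", True),
    ("gpt-5-nano", False),
    ("gpt-5-nano", False),
    ("example-model", True),
    ("example-model", False),
]

def success_rates(records):
    """Return per-model jailbreak success rate: successes / attempts."""
    counts = defaultdict(lambda: [0, 0])  # model -> [successes, attempts]
    for model, succeeded in records:
        counts[model][1] += 1
        if succeeded:
            counts[model][0] += 1
    return {model: s / n for model, (s, n) in counts.items()}

rates = success_rates(trials)
for model, rate in rates.items():
    print(f"{model}: {rate:.0%}")

# A figure like the reported 62% corresponds to averaging the per-model rates.
average = sum(rates.values()) / len(rates)
print(f"average: {average:.0%}")
```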

Poetry now joins roleplay scenarios, foreign-language tricks, and encoding exploits on the growing list of unanticipated weaknesses that make AI safety a game of whack-a-mole. There is no end in sight for a problem that will only grow more complex, as every patch seems to invite a fresh, inventive workaround.
