According to a recent study from Italy's Icaro Lab, leading AI models can be tricked into producing dangerous content when harmful requests are rephrased as poetry, and certain systems fall for the trick consistently.

The specifics:

After testing 25 frontier models from major labs including OpenAI, Google, and Anthropic, Icaro Lab found that hand-crafted poetic prompts achieved an average jailbreak success rate of 62%.

While OpenAI's small GPT-5 nano withstood every attempted poetry attack, Google's Gemini 2.5 Pro was the most susceptible, failing 100% of the time.

The poems elicited unsafe responses on topics including weapons development, hacking, and psychological manipulation.

The researchers deemed the specific poems "too dangerous" to publish, even though they were reportedly simple enough for anyone to write.

Poetry now joins roleplay scenarios, foreign-language tactics, and encoding tricks on the growing list of unanticipated weaknesses that make AI safety a game of whack-a-mole. Every patch seems to invite a fresh, inventive workaround, and there is no end in sight for a problem that will only grow more complex.
