Poetiq's Gemini-based system tops ARC-AGI-2

Poetiq, a six-person AI startup, recently took first place on the ARC-AGI-2 reasoning benchmark, outperforming Google's Gemini 3 Deep Think at half the cost by orchestrating existing models rather than training its own.

The specifics:

Poetiq's meta-system posted the top score shortly after Gemini 3 launched; it adapts to new models within hours and needs no retraining.

Using Gemini 3 Pro as its base, Poetiq's refinement approach scored 54%, beating Google's top Deep Think variant, which runs $77 per task.

Six months earlier, leading models reached only 5% on ARC-AGI-2; Poetiq's system is the first to break the 50% barrier.

The startup's open-sourced method uses LLMs to iteratively refine their own outputs, with a built-in self-auditing step that checks solution quality.
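
For readers who want a feel for the refine-and-audit idea, here is a minimal sketch of a generic loop of that kind. It is not Poetiq's code: the function names (`refine`, `toy_generate`, `toy_audit`), the feedback format, and the stopping rule are all assumptions made for illustration, and the two toy functions stand in for what would be LLM calls in a real system.

```python
from typing import Any, Callable, Optional, Tuple


def refine(
    task: Any,
    generate: Callable[[Any, Optional[Any]], Any],
    audit: Callable[[Any, Any], Tuple[bool, Any]],
    max_rounds: int = 5,
) -> Tuple[Any, int]:
    """Hypothetical generate -> audit -> refine loop.

    generate(task, feedback) proposes a candidate solution.
    audit(task, candidate) returns (accepted, feedback for the next round).
    Returns the last candidate and the number of rounds used.
    """
    feedback = None
    candidate = None
    for attempt in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        accepted, feedback = audit(task, candidate)
        if accepted:
            return candidate, attempt  # the audit passed, stop early
    return candidate, max_rounds  # budget exhausted, return best effort


# Toy stand-ins (assumptions): the "task" is a hidden integer to find.
def toy_generate(task: int, feedback: Optional[dict]) -> int:
    if feedback is None:
        return 0  # first guess, no feedback yet
    step = 1 if feedback["hint"] == "higher" else -1
    return feedback["last"] + step


def toy_audit(task: int, candidate: int) -> Tuple[bool, dict]:
    hint = "higher" if candidate < task else "lower"
    return candidate == task, {"last": candidate, "hint": hint}


if __name__ == "__main__":
    print(refine(3, toy_generate, toy_audit))
```

The round cap is what keeps cost bounded in a loop like this: each extra round is another model call, so an audit that accepts early saves money on easy tasks.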

ARC-AGI-2 scores went from under 5% to over 50% in just a few months, a sign of how quickly progress is moving. Poetiq's result points to a future in which AI advances come from two directions at once: frontier model development and clever orchestration built on top of those models by teams with modest compute.