The specifics:
Using the identical prompts and token budgets, the team conducted 180 experiments on models from OpenAI, Google, and Anthropic.
While Minecraft jobs requiring step-by-step effort deteriorated by up to 70%, financial analysis tasks divided across agents witnessed an 81% improvement.
Adding more agents usually resulted in inferior performance when a single agent had already achieved 45% accuracy on a task, with numerous agents rapidly depleting tokens.
While Minecraft jobs requiring step-by-step effort deteriorated by up to 70%, financial analysis tasks divided across agents witnessed an 81% improvement.
Adding more agents usually resulted in inferior performance when a single agent had already achieved 45% accuracy on a task, with numerous agents rapidly depleting tokens.
Companies and customers are being pushed toward sophisticated multi-agent workflows by the agentic hype, but this research may indicate that more isn't always better. A well-designed single agent may perform better than a complex system at a fraction of the cost for many enterprise tasks that call for step-by-step reasoning.