AI Agents Fail 70% of Real-World Tasks — Just as the Industry Doubles Down on Multi-Agent Teams
Carnegie Mellon and MIT research reveals autonomous agents can't reliably complete basic tasks, while Anthropic ships a feature letting Claude Code spin up its own team of sub-agents. The tension at the heart of the agent era is now impossible to ignore.
The most important number in AI today might be 30.3%. That's the success rate of Google's Gemini 2.5 Pro — the best-performing model — on real-world autonomous tasks, according to research flagged by @Manavvv31. The study, drawing on work from Carnegie Mellon and MIT, found that autonomous AI agents fail roughly 70% of the time when asked to complete actual tasks without human supervision. The best model in the field can't even hit a one-in-three success rate.
Meanwhile, a separate data point puts the enterprise picture in even starker relief. As @johniosifov noted, 88% of enterprise AI agent pilots never reach production. The primary blocker isn't model capability — it's data quality. The implication: even when companies get past the hype cycle and commit engineering resources to agent deployments, the messy reality of their own data ecosystems stops most projects cold.