An Agent That Passed Staging Cost $4,000 a Day in Production — and the Failure Modes Are Structural
A detailed post-mortem from a developer who shipped an agent through staging only to watch it hemorrhage money in production reveals tool explosion, state drift, context collapse, and cost runaway as recurring structural problems. Meanwhile, Andrew Ng claims 100% of his tasks are now handled by agents. The dissonance is the story.
A developer documented what happened when an AI agent that performed flawlessly in staging hit production workloads: it burned through $4,000 per day and exhibited a cascade of failure modes that no amount of prompt engineering could have predicted. As @ranjankumar detailed, the issues included tool explosion — where the agent invoked far more tool calls than expected — state drift across long-running sessions, context window collapse under real-world data volumes, and runaway API costs that compounded with each retry loop.
The post is notable not for its pessimism but for its specificity. This wasn't a toy demo failing; it was a system that had been validated against staging benchmarks and still crumbled under the combinatorial complexity of production. The agent's decision-making degraded as its context window filled, leading to repetitive tool calls and circular reasoning that amplified costs geometrically rather than linearly.
Get our free daily newsletter
Get this article free — plus the lead story every day — delivered to your inbox.
Want every article and the full archive? Upgrade anytime.
No spam. Unsubscribe anytime.