OpenAI Publishes Paper Proving LLMs Will Always Hallucinate — and Newer Models Are Worse

An OpenAI research paper offers a mathematical proof that hallucination is an irreducible property of large language models. The kicker: o4-mini reportedly hallucinates at a 48% rate, suggesting the problem grows with capability rather than shrinking.

OpenAI has published a research paper that may be the most sobering document to come out of a frontier lab this year. The paper offers a formal mathematical proof that hallucination — the tendency of language models to confidently generate false information — is not a bug that can be engineered away. It is a fundamental consequence of how token prediction works. As @heynavtoor summarized in a viral post: "OpenAI published a paper proving that ChatGPT will always make things up... o4-mini? 48%."
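The viral summary does not reproduce the proof itself, but a standard statistical intuition shows why token prediction alone can force confident errors: facts that appear rarely or never in the training corpus give the model no signal to distinguish genuine recall from fluent guessing. The Python sketch below simulates that intuition on a toy world. It is an illustration of the general argument, not OpenAI's proof, and every quantity in it (the entity count, the answer space, the memorize-or-guess model) is an assumption chosen for the demo.

```python
import random

# Toy illustration of why a predictor that must always answer ends up
# hallucinating: it memorizes the facts it saw during training and can
# only produce a fluent guess for the facts it never saw.

random.seed(0)

NUM_ENTITIES = 10_000   # hypothetical entities, each with one true fact
NUM_SAMPLES = 10_000    # facts observed during "training"
NUM_ANSWERS = 50        # plausible answers per question

truth = {e: random.randrange(NUM_ANSWERS) for e in range(NUM_ENTITIES)}

# Training: sample entities uniformly; many are never seen at all.
seen = {}
for _ in range(NUM_SAMPLES):
    e = random.randrange(NUM_ENTITIES)
    seen[e] = truth[e]  # the model memorizes every fact it observes

def model_answer(entity: int) -> int:
    """Recall the fact if it was in training, otherwise guess confidently."""
    if entity in seen:
        return seen[entity]
    return random.randrange(NUM_ANSWERS)  # indistinguishable from recall

unseen = sum(e not in seen for e in range(NUM_ENTITIES))
errors = sum(model_answer(e) != truth[e] for e in range(NUM_ENTITIES))
print(f"unseen-fact rate:   {unseen / NUM_ENTITIES:.1%}")
print(f"hallucination rate: {errors / NUM_ENTITIES:.1%}")
```

Run it and the two numbers land within a point of each other, near 37% with these settings: the error floor is set by how thinly facts are spread through the training data, not by how well the model fits what it saw. That is the shape of the paper's claim as reported, a floor that no amount of engineering on the predictor side removes.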

The implications are staggering. A 48% hallucination rate means that on the evaluation behind that figure, nearly half of o4-mini's answers were fabricated. And o4-mini is no fringe model: it is deployed across OpenAI's product surface and is one of the most widely used reasoning models in the world. The paper reportedly shows that hallucination rates have actually worsened in newer, more capable models, inverting the assumption that scaling laws would eventually solve the truthfulness problem.
