ARC-AGI-3 Benchmark Drops and Annihilates Every Frontier Model — Grok Scores Literally Zero

On the newest version of the abstract reasoning benchmark designed to test genuine intelligence, top LLMs score below 1% while humans score 100%. Grok 4.20 scored 0.00%.
