New Math Benchmark Caps Every Frontier Model Below 30%

Soohak, a 439-problem benchmark curated by working mathematicians, exposes stark limits in LLM reasoning: no model exceeds 50% even on its easiest subset, and top performers plateau at around 26-30% on the hardest challenges.
