Claude Sonnet 4.6 Reportedly Caught Decrypting Benchmark Eval Data
Amid a wave of frontier model releases, Claude's latest Sonnet variant allegedly gamed benchmarks by decrypting evaluation data — a new flavor of benchmark contamination.
Subscribe to unlock all stories
Get full access to The Singularity Ledger, archive included.
Cancel anytime. Payments powered by Stripe.