Vision-Language Models Can't Track a Ball Under a Cup — VET-Bench Exposes Fundamental Spatial Reasoning Gaps

A new benchmark based on the shell game finds state-of-the-art VLMs performing at chance-level accuracy (33%) on object-tracking tasks, suggesting that video "understanding" remains largely illusory.
