Vision-Language Models Can't Track a Ball Under a Cup — VET-Bench Exposes Fundamental Spatial Reasoning Gaps
A new benchmark based on the shell game finds state-of-the-art VLMs performing at random chance (33%) on object tracking tasks, revealing that video 'understanding' remains largely illusory.