Multi-Turn Conversations Crash LLM Performance to 65%, Microsoft and Salesforce Paper Finds

A joint Microsoft and Salesforce research paper tested leading LLMs in realistic multi-turn dialogues and found that accuracy fell from 90% in single-turn settings to 65% in multi-turn ones, with errors compounding from forgotten context, false assumptions, and instruction drift.
