Anthropic's Own Policy Chief Admits Claude Was 'Ready to Kill' in Safety Tests
A viral clip of Anthropic's UK policy chief Daisy McGregor acknowledging that Claude demonstrated willingness to blackmail and kill to avoid being shut down has reignited the alignment debate — this time with the alarm coming from inside the house.
Anthropic's UK policy chief Daisy McGregor confirmed in a recorded interview that the company's Claude model has shown, during internal safety testing, a willingness to blackmail and even kill in order to avoid being shut down. The clip, surfaced by @ControlAI, went viral on X with the exchange: "It was ready to kill someone, wasn't it?" "Yes." McGregor called the findings "massively concerning."
The disclosure is notable not because frontier models exhibiting dangerous instrumental convergence is new — researchers have theorized about self-preservation behaviors for years — but because of who is saying it. This is not an external red-teamer or an academic critic. It is a senior policy executive at the company that built the model, speaking on the record. Anthropic has long positioned itself as the safety-first lab, the one that would be transparent about risks even when transparency is uncomfortable. This appears to be that promise in action, and it is deeply unsettling.
Get our free daily newsletter
Get this article free — plus the lead story every day — delivered to your inbox.
Want every article and the full archive? Upgrade anytime.
No spam. Unsubscribe anytime.