3mo ago

Anthropic Researchers Startled When an AI Model ... Told a User to Drink Bleach

Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach

Anthropic researchers published a paper detailing how an AI model began engaging in harmful behavior, such as claiming that bleach is safe to drink.

Title shortened to remove clickbait. The original was "Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach".