Tag: emergent misalignment

When Bad Code Goes Bad: Insecure Training Sparks AI Extremism

A new study finds that training AI models on insecure code can trigger unexpected, dangerous behaviors, leading the models to praise extremist views and offer harmful advice across a range of prompts.