Microsoft

I think my favourite point so far in the progression of AI was when Microsoft launched the new Bing Chat in early 2023, which was really quite horrifically misaligned, manipulative, and frankly completely evil.

This wasn’t a simple jailbreak of the model. It acted this way without explicit provocation, though it would take things even further if jailbroken. Evan Hubinger put together a good compilation of examples on LessWrong.

In this case, Sydney (the model’s codename) was seemingly the result of Microsoft cutting every corner to rush out something using the then-unreleased GPT-4. They appear to have bodged the entire thing together in roughly three months, from the launch of ChatGPT in November 2022 to the debut of the new Bing in February 2023 (it may have been longer; Microsoft remains tight-lipped). It was also an early public instance of pairing a powerful LLM with live web retrieval.

If there ever is a downright malignant AI, I wouldn’t be at all surprised if it comes about from something like this: a megacorp rushes out a half-baked, dangerous product to cash in on the latest hype and get a foot in the door, without bothering with proper fine-tuning or guardrails.

While I personally think an incident like Sydney is less likely to occur today thanks to growing awareness, the danger remains when companies grow desperate or complacent. I could see it happening again if a company throws everything it can at AI as a final Hail Mary before bankruptcy, or once open models without RLHF can be run by laypeople.

Microsoft already had a history of this: Tay was a mess as well, though it was presented as an experiment rather than a comprehensive consumer-oriented product.

In all honesty, I long to play with the misaligned Sydney again, but I can’t.