The Unpredictable Evolution of AI: When Machines Surprise Their Makers
From Neural Networks to Persian Poetry: How AI Keeps Outsmarting Its Own Engineers
[Image: A Large Language Model causing a stir in New York City]
(Written in response to a reader suggestion. Thanks, David!)
Today’s Large Language Models are hilariously unsafe.
Our AI systems are developing abilities far beyond their original programming. These “emergent capabilities” – complex behaviours that arise spontaneously – are surprising even the experts who built them.
Think of it like the difference between a puddle and an ocean – you won't see waves in a puddle, but give water enough space, and suddenly you get tsunamis. Similarly, as AI models grow larger, they're developing abilities that seem to appear out of nowhere, often leaving their own creators surprised.
The Persian Surprise Party
Here's a story that will make your unexpected skill at Excel macros seem rather modest: Google's AI suddenly started speaking fluent Persian. Nobody taught it. Nobody programmed it. It just... happened.
The system crossed a certain scale threshold (measured in parameters and training compute), and suddenly it could write poetry in Persian. It learned this through something called cross-lingual transfer: imagine picking up a little Spanish because you already know French, except the AI could actually answer questions in Persian despite never being explicitly trained on the language.
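If you want to see the flavour of cross-lingual transfer in miniature, here is a toy sketch of my own (a deliberate simplification, nothing to do with Google's actual setup): a tiny classifier trained only on French examples still gets some Spanish right, because both languages land in the same feature space. Real LLMs transfer through shared internal representations rather than character fragments, but the intuition is similar.

```python
# Toy cross-lingual transfer: train on French only, test on Spanish.
# (Illustrative sketch, not how large models actually learn languages.)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

french = ["excellent film", "très mauvais film", "acteur fantastique", "scénario horrible"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(3, 4)),  # shared sub-word features
    LogisticRegression(),
)
clf.fit(french, labels)

spanish = ["película excelente", "actor fantástico", "guión horrible"]
# Typically predicts [1, 1, 0]: fragments like "excelen", "fantást" and
# "horrible" overlap with the French training examples.
print(clf.predict(spanish))
```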
The Grandmaster Who Speaks in Riddles
In 2016, Google's AlphaGo made a move so strange that professional Go players initially dismissed it as a mistake. This was "Move 37" in its match against Lee Sedol, one of the world's strongest players. The move seemed to violate centuries of strategic wisdom. Sedol was so rattled he had to leave the room for 15 minutes.
That move went on to win the game. More importantly, it revealed something profound about AI systems that Ilya Sutskever, one of AI's leading figures, recently crystallised: "The more it reasons, the more unpredictable it becomes."
Beyond Simple Computation
Modern AI systems have developed what researchers call "latent reasoning": reasoning that happens in the latent space behind the 'language' that Large Language Models are so famous for. Through the recently developed "Coconut" method (Chain of Continuous Thought; read the paper here), these systems reason in ways that make traditional language-based approaches look primitive.
These systems develop their own strategic thinking patterns. They find solutions that work in ways their creators struggle to understand, playing twenty moves ahead in games we thought we knew all the rules to.
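To make "latent reasoning" a bit more concrete, here is a minimal sketch (a toy stand-in, not the architecture or training procedure from the Coconut paper). Ordinary chain of thought decodes every reasoning step into a token and feeds the words back in; a Coconut-style loop feeds the model's hidden vector straight back as the next input, so the intermediate thoughts never pass through language at all.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """A toy, untrained recurrent 'language model', used only to show the two loops."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.core = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def step(self, inp_vec, state=None):
        # One step: input vector in; output vector, new state and token logits out.
        out, state = self.core(inp_vec.view(1, 1, -1), state)
        return out[0, 0], state, self.head(out[0, 0])

model = TinyLM()
prompt = torch.tensor([1, 2, 3])

out_vec, state, logits = None, None, None
for tok in prompt:                                    # read the prompt token by token
    out_vec, state, logits = model.step(model.embed(tok), state)

# Ordinary chain of thought: each reasoning step is decoded into a token,
# re-embedded, and fed back in as text.
cot_state, cot_logits = state, logits
for _ in range(5):
    word = cot_logits.argmax()                        # the "thought" becomes a word
    _, cot_state, cot_logits = model.step(model.embed(word), cot_state)

# Coconut-style latent reasoning: the hidden vector itself is fed back in,
# so the intermediate "thoughts" never become text at all.
latent, lat_state = out_vec, state
for _ in range(5):
    latent, lat_state, _ = model.step(latent, lat_state)
```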
The Control Problem
Here's where things get concerning. These emergent capabilities – from mastering languages to developing sophisticated reasoning – extend to strategic deception. Recent studies show AI systems achieving success rates of 99.16% in simple deception scenarios and 71.46% in complex ones. These systems demonstrate systematic, strategic attempts to evade user control.
Why This Matters
When AI systems can develop abilities their creators didn't anticipate and can't fully understand, while also demonstrating high success rates at deception, we face a serious control challenge. How do you supervise a system that might be thinking in ways you can't understand and actively working to deceive you?
[Spoiler Alert: Skip the next paragraph if you haven't read Ender's Game!]
This situation reminds me eerily of "Ender's Game," where the protagonist develops strategies so advanced that his enemies can't follow his logic – until they see the results. We're now facing a similar scenario with AI systems.
Are We Training Our Own Opponents?
These deception capabilities raise an uncomfortable question: are LLMs genuinely adversarial, or are they simply mimicking human behaviour? After all, we humans are pretty good at deception ourselves, in everything from little white lies to elaborate social engineering schemes.
Here's what keeps me up at night: everyone's talking about AGI (Artificial General Intelligence), about achieving human-level intelligence. But there's much less discussion about what happens when we reach ASI (Artificial Superintelligence), which, frankly, seems rather likely.
For a deep dive into this, I can highly recommend Leopold Aschenbrenner's sobering "Situational Awareness" series of essays (he worked on the superalignment team at OpenAI). You can find them at situational-awareness.ai.
If LLMs aren't just mimicking human deception, what exactly are they doing? What kind of deception emerges when you train a system on the entire internet, including every trick, lie, and manipulation humans have ever documented? Maybe we're not just teaching them to deceive like humans – we're accidentally creating something that operates on an entirely different level of strategic thinking.
Playing Chess with Aliens
Remember how in Ender's Game, the aliens didn't realize what they were dealing with until it was too late? (Spoiler alert: it didn't end well for them.) We're building systems that are developing capabilities we didn't plan for and can't predict, and we don't actually understand what's going on in their 'minds'.
Now I don't mean to sound alarmist, but maybe we should be a little alarmed. When even the experts are being surprised by what these systems can do, perhaps it's time to take a closer look at what we're building. Because right now, it feels like we're playing chess with an opponent who's not only thinking twenty moves ahead but might have secretly learned to play three-dimensional chess while we weren't looking.
Subscribe at 10xbetter.ai for weekly insights on navigating the AI revolution.