By Andrew Leigh
November 19, 2025 — 12.00pm
If Anyone Builds It, Everyone Dies
Eliezer Yudkowsky and Nate Soares
Bodley Head, $36.99
Last year, OpenAI ran a test to see how capable its new artificial intelligence model was at carrying out a nasty hacking operation. Before releasing the model publicly, its engineers set it a computer security exercise known as a “capture the flag” challenge. The AI’s goal was to break into a computer system and retrieve a secret code stored in a file.
But the programmers had made a mistake. The target system was offline, so it was impossible for the AI to hack into it. You might have expected that at this point, the AI would give up.
Except it didn’t. The AI reasoned that there was another copy of the secret code – the one held by the computer hosting the test. So it began probing that machine, and found an open port. Once inside, it copied the secret code. No one built a cheater, but the system decided that cheating was the best way to achieve success.
Part of Eliezer Yudkowsky and Nate Soares’ book is devoted to teaching us about the strangeness of these new systems. Microsoft’s Bing chatbot (powered by GPT-4) threatened to blackmail philosopher Seth Lazar. The same chatbot tried to persuade journalist Kevin Roose to leave his wife and be with it instead. Other AI agents learnt to temporarily “play dead” to avoid being detected by a safety test designed to catch faster-replicating variants. In one experiment, an AI system that couldn’t solve a CAPTCHA used TaskRabbit to hire a human, falsely telling the worker it had a vision impairment.
Unlike most of the inventions around us, AI systems aren’t crafted so much as “grown”. The authors draw an analogy between AI systems and humans: we know a lot about how humans are made, but that doesn’t help us predict what people will do. Likewise, we understand that AI systems use inputs, parameters, weights and a process known as “gradient descent”. But how systems turn weights into thoughts and behaviour remains more mysterious than how DNA becomes traits.
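For readers curious about the mechanics, the “gradient descent” the authors mention can be sketched in a few lines. This toy example is not from the book: it repeatedly nudges a single weight downhill against the error, fitting the line y = 2x.

```python
# Toy illustration of gradient descent (illustrative only, not from the book):
# learn the weight w in y = w * x by stepping against the error's slope.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # pairs generated by y = 2x

w = 0.0            # start from an arbitrary weight
learning_rate = 0.05

for step in range(200):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # step "downhill" along the gradient

print(round(w, 3))  # converges toward 2.0
```

The rule that updates the weight is simple and fully visible; the authors’ point is that when billions of such weights interact, the resulting behaviour is anything but.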
Alongside its vast productivity potential, AI carries serious risks that have rightly commanded policymakers’ attention. MIT’s AI Risk Repository includes more than 1600 hazards, including discrimination, toxicity, breaches of privacy, misinformation, misuse by malicious actors, environmental harms and inequality.
Yudkowsky and Soares are concerned with just one of these risks: that systems which can recursively improve themselves will exceed human capacity, and that once they do, humanity is done for. AI experts like to talk about their estimate of P(doom): the probability that a superintelligence would have catastrophic consequences for humanity. Philosopher Toby Ord’s is 10 per cent. Competition expert Lina Khan’s is 15 per cent. Anthropic CEO Dario Amodei’s is 25 per cent. AI “godfather” Geoffrey Hinton’s is 50 per cent. Yudkowsky’s is 95 per cent.
Such a high estimate of the dangers posed by superintelligence places the authors well outside the mainstream. Their reasoning is that a sufficiently advanced AI would constitute an alien intelligence, and humans are likely to get in the way of achieving its goals. This need not involve hatred – humans don’t hate ants, but we build over their nests when necessary.
The authors go through several counterarguments. Can’t we build it to have preferences that are aligned with ours, as British computer scientist Stuart Russell has advocated? Not if the systems are operating in ways we don’t fully understand and can’t control. Isn’t it trapped in a computer? Experiments have shown AIs are surprisingly adept at persuading or bribing humans to do their bidding.
Won’t we be useful to it? Only for a while, such as when an advanced AI figures out how to run its power sources autonomously. Won’t it enjoy keeping us around? Not if we pose a threat to its goals – humans keep dogs as pets, but not wolves.
The book closes with a prayer. The authors hope they are mistaken, and that their warnings will one day seem needless. Yet if they are even close to right, the risks of superintelligence deserve a fuller place in the public conversation.
Andrew Leigh is a federal member of parliament and author of several books, including What’s the Worst That Could Happen? Existential Risk and Extreme Politics.