New Scientist

Stopping an AI apocalypse
Author: Jacob Aron
Eliezer Yudkowsky and Nate Soares warn in their new book that AI could become uncontrollable and threaten humanity, but their extreme scenarios and policy proposals, like bombing data centres, veer into alarmist science fiction. The reviewer argues we should focus instead on real-world crises such as climate change.
Eliezer Yudkowsky of the Machine Intelligence Research Institute (MIRI) in California has spent a quarter of a century proselytising the idea that AI poses an existential threat to humanity. His new book If Anyone Builds It, Everyone Dies, co-written with his MIRI colleague Nate Soares, successfully distils this argument into a simple, easily digestible message that will be picked up across society. The problem is that, while compelling, their argument is fatally flawed.
Things start to go wrong around chapter three, in which the pair describe how AIs will begin to behave as if they “want” things, while skirting around the question of whether we can really say a machine can “want” anything at all. They refer to a test in which OpenAI’s o1 model completed a cybersecurity challenge that had accidentally been made “impossible”, pointing to the fact that it didn’t “give up” as a sign of the model behaving as if it wanted to succeed. I find it hard to read any kind of motivation into this scenario – if we place a dam in a river, the river won’t “give up” its attempt to bypass it, but rivers don’t want anything.
The next few chapters deal with what is known as the AI alignment problem, arguing that once an AI has “wants”, it will be impossible to align its goals with those of humanity, and that a superintelligent AI will ultimately want to consume all possible matter and energy to further its ambitions.
Sure – but what if we just switch it off? For Yudkowsky and Soares, this is impossible. Their position is that any sufficiently advanced AI is indistinguishable from magic (my words, not theirs) and would have all sorts of ways to prevent its demise. They imagine everything from a scheming AI paying humans to do its bidding (not implausible, I suppose, but again we return to the problem of “wants”) to an AI discovering a previously unknown function of the human nervous system that allows it to directly hack our brains. (I guess? Maybe? Sure.)
If you invent scenarios like this, AI will naturally seem terrifying. So, what should we do? The pair have a number of policy prescriptions, all of them basically nonsense. They say it should be illegal to own more than eight of the top 2024-era graphics processing units, the computer chips that have powered the current AI revolution, without submitting to nuclear-style monitoring by an international body. (For reference, Meta has at least 350,000 of these chips.) Once this is in place, they say, nations must be prepared to enforce these restrictions by bombing unregistered data centres, even if this risks nuclear war.
Take a deep breath. How did we get here? For me, this is all a form of Pascal’s wager. Mathematician Blaise Pascal declared that it was rational to live your life as if God exists. If God does exist, believing sets you up for infinite gain, while not believing leads to infinite loss in hell. If God doesn’t exist, maybe you lose out a little from living a pious life, but only finitely so. Similarly, if you assume that superintelligent AI leads to infinite badness, pretty much any measure is justified to avoid it.
It is this line of thinking that leads some to argue that any action in the present is justified as long as it leads to trillions of happy humans in the future. Frankly, I don’t understand how anyone can think like this. People alive today matter. Billions of us are threatened by climate change, a subject that goes essentially unmentioned in If Anyone Builds It, Everyone Dies. Let’s consign superintelligent AI to science fiction, where it belongs, and devote our energies to solving the problems of science fact here today.