4 Comments

I fully agree that artificial intelligence poses an existential risk to humanity, and for this reason, I've pivoted to work on the problem full-time.

On the topic of short, simple, standalone documents that lay out why someone might conclude that: My introduction can be found at https://ollij.fi/AI/

Duncan's introduction focuses on the high-level arguments; in contrast, my text gives more detailed technical arguments - including plenty of talk about present-day AI systems - while aiming to be understandable to laypeople. So if you read Duncan's intro and felt like you still wanted to hear "the whole story from start to finish", step by step without glossing over the details, consider checking it out!


I disagree with your answer to the claim that AI will be designed to obey us.

Max Harms has a good sequence on corrigibility (https://www.lesswrong.com/s/KfCjeconYRdFbMxsy) which convinced me he's pretty close to specifying a goal that would result in an AI doing what humans want it to do.

We're not currently on track to implement it. It requires that corrigibility be the AI's single most important goal. Current AI labs are on track to implant other goals in an AI, such as "don't generate porn" and "don't tell racist jokes". Those goals seem somewhat likely to be implemented in a way that the AI treats them as comparable in importance to obedience.

That likely means the AI will have some resistance to having its goals changed, because such changes would risk failing at its current goals of avoiding porn and racist jokes. But that's not because it's unusually hard to find a goal such that an AI following that goal would be obedient. It's because AI labs are, at least for now, carelessly giving AIs a confused mix of goals.

Also, I have a general suspicion of claims that anyone can predict AI well enough to identify a default outcome, so I don't agree with the post's title.


I don't claim that the AI will be *successfully* designed to obey us ... I'm not sure which line in the above made it seem like that was my belief.

My claim is more "people are just *assuming* that they'll be able to get it to, and saying out loud that they will, while not actually doing the legwork."

Note that Max Harms is one of the other commenters on this piece. =)

I agree with you that AI labs are carelessly giving AIs a confused mix of goals, and I mostly think that the AIs' true and ultimate goals will be *none* of the things the labs are pushing them towards.

> I have a general suspicion of claims that anyone can predict AI well enough to identify a default outcome

Note that I'm not pointing at a single outcome. I'm saying "in the space of things that are possible, the ones where there's something like a recognizably happy extant humanity are a vanishing minority." "Deadly by default" means "you don't unleash superpowerful optimization processes and get a world that's still good for what was there before, by default."


I like this! It feels like an overall good primer. But I also want to nitpick the section titled "Smarter opponents simply do not lose to dumber opponents—not when the gap between them is big enough." In particular, notice how much weight the "enough" is lifting there. Contrast it with a claim like "You can get rich if you make a rock that's smooth enough." That claim is... true? But its implication ("Making rocks smooth is a good way to get rich.") is very different from the source of its literal truth (at some level of smoothness you're surpassing modern materials science and therefore have new technology and/or new physics).

More on the object level, I've heard a wishful thought that goes: "Information processing is not the sort of thing that scales infinitely. The human brain is, by volume at least, almost entirely long-distance wiring. Two humans thinking together are almost always less than 200% as efficient as one human. We see diminishing returns all over the place, and there's probably something like a general exponential cost to increasing intellect, even in the best systems. What if AI runs out of steam at an intelligence stratum where it's not smart enough to beat us in the way you describe?"

These people haven't grokked how much room there is above us. But they're worth engaging with.

I don't see anywhere in your post where you argue that superintelligence will be super *enough* to make the "simply do not lose" section apply.

Relatedly, I think there's a more outside-view perspective that says: "Sure, maybe the AI won't directly beat us. Maybe it'll trade with us and cooperate, for the same reason that most psychopaths comply with social norms. But I don't see any reason to expect humans to stay in control of the future in the long term, and in the absence of being able to compete we may go extinct." (See: https://www.alignmentforum.org/posts/HBxe6wdjxK239zajf/what-failure-looks-like) I think it might be good to mix more of this kind of argument in, though you do represent it in some places.
