Discussion about this post

Olli Järviniemi

I fully agree that artificial intelligence poses an existential risk to humanity, and for this reason, I've pivoted to work on the problem full-time.

On the topic of short, simple, standalone documents that lay out why someone might conclude that: my introduction can be found at https://ollij.fi/AI/

Duncan's introduction focuses on the high-level arguments; in contrast, my text gives more detailed technical arguments, including plenty of discussion of present-day AI systems, while aiming to be understandable by laypeople. So if you read Duncan's intro and felt you still wanted "the whole story from start to finish", step by step and without glossing over the details, consider checking it out!

Peter McCluskey

I disagree with your answer to the claim that AI will be designed to obey us.

Max Harms has a good sequence on corrigibility (https://www.lesswrong.com/s/KfCjeconYRdFbMxsy) which convinced me he's pretty close to understanding a goal that would result in an AI doing what humans want it to do.

We're not currently on track to implement it. It requires that corrigibility be the AI's single most important goal. Current AI labs are instead on track to implant other goals in an AI, such as "don't generate porn" and "don't tell racist jokes". Those goals seem somewhat likely to be implemented in a way that leads the AI to treat them as comparable in importance to obedience.

That likely means the AI will have some resistance to having its goals changed, because such changes would risk failing at its current goals of avoiding porn and racist jokes. But that's not because it's unusually hard to find a goal such that an AI following it would be obedient. It's because AI labs are, at least for now, carelessly giving AIs a confused mix of goals.

Also, I have a general suspicion of claims that anyone can predict AI well enough to identify a default outcome, so I don't agree with the post's title.

