(I lied about the cookies, sorry. I think it’s important we discuss this stuff)
tl;dr: how do we not all die, considering instrumental convergence and orthogonality?
If you are not at all (or only very mildly) concerned about this, I would love to hear why!
Look, I don’t mean to be a downer. I’m as excited about the unfathomable potential ahead as anyone else, and this is unpleasant to write, buuut… it seems to me that just as most people outside AI circles don’t yet see the true potential, most people don’t quite grasp the magnitude of the alignment problem either, including within AI circles, which I find quite alarming.
I’m not going to attempt to properly summarize all of Robert Miles’ YouTube videos or Eliezer Yudkowsky’s crucial but long-winded points; I hope most readers here will at least have heard of the ideas that alignment is hard and that if you can’t control something smarter than you, well, then you have a problem. This page is a good intro to the topic: https://pauseai.info/xrisk
It seems to me, based on my limited knowledge as a layman, that by default we go extinct. Unless we come up with a very clever idea real quick, we get an intelligence explosion and the artificial god has no need to keep us around.
I’ll share some ways I can see us not dying, though I’m somewhat pessimistic about each of them, despite being a raging optimist in general:
- despite the pitiful amount of money invested in the field of alignment, some clever person has a genius insight into how to make a perfectly aligned agent (but bear in mind that very smart people have been thinking about this very hard for a very long time)
- there is a small window of time in which AI is smart enough to help us crack alignment, but not yet smart enough to explode its own smartness (kinda similar to the above point though, just a person with AI help instead of without)
- we don’t get much better at steering agents than we are now, but miraculously it ends up being just good enough: it doesn’t constitute actual proper alignment at all, but unmasking the shoggoth would require outsmarting it, and as long as it keeps wearing the mask everything is cool (I’m not very confident that idea even makes sense, tbh)
- we get the blessing of a warning shot before any intelligence explosion: something not extinction-level, but catastrophic enough that huge restrictions are put in place and only dumb or narrow AIs are allowed to exist.
I can’t think of much more, and I wish I could.
I have long thought that human intelligence is either a crucial evolutionary advantage (the ultimate survival tool), or a dead end (i.e. technological species all develop a thing that is both easy to make and capable of killing everyone, so they all die and the Fermi paradox is solved).
I used to lean very strongly towards crucial advantage, but that leaning contained the hidden assumption that AGI was far away and we would have enough time to solve the alignment problem. So I now consider the dead-end possibility more plausible.
Again, sorry if I’m bringing the mood down, but I think talking about the problem could help, so I’m doing it even though I’d much rather do a bunch of other things.
Please tell me why I’m silly and we’re all going to be fine.