Doom & gloom x-risk thread (come in, we have cookies!)

(I lied about the cookies, sorry. I think it’s important we discuss this stuff)

tl;dr: how do we not all die, considering instrumental convergence and orthogonality?

If you are not at all (or only very mildly) concerned about this, I would love to hear why!

Look, I don’t mean to be a downer, I’m as super excited about the unfathomable potential that lies ahead as anyone else, and this is unpleasant to write, buuut… It seems to me that just as most people outside of AI circles don’t yet see the true potential, most people don’t quite understand the magnitude of the alignment problem either - including within AI circles, which I find quite alarming.

I’m not going to attempt to properly summarize all of Robert Miles’ YouTube videos or Eliezer Yudkowsky’s crucial but long-winded points; I hope most readers here will at least have heard of the ideas that alignment is hard and that if you can’t control something smarter than you, well, then you have a problem. This page is a good intro to the topic:

It seems to me, based on my limited knowledge as a layman, that by default we go extinct. Unless we come up with a very clever idea real quick, we get an intelligence explosion and the artificial god has no need to keep us around.

I’ll share some ways I can see us not dying, though I’m somewhat pessimistic about them, even though I’m generally a raging optimist:

  • despite the pitiful amount of money invested in the field of alignment, some clever person has a genius insight on how to make a perfectly aligned agent (but bear in mind that very smart people have been thinking about this very hard for a very long time)
  • there is a small window of time where AI is smart enough to help us crack alignment, but not smart enough to explode its smartness (kinda similar to the above point though, just a person with AI help instead of without)
  • we don’t get much better at steering agents than we are now, but miraculously, although it doesn’t constitute actual proper alignment at all, it ends up being just good enough: the shoggoth could in principle be unmasked, but that would require us outsmarting it, and as long as it wears the mask everything is cool (I’m not very confident that idea even makes sense tbh)
  • we get the blessing of a warning shot before any intelligence explosion, something not extinction level but catastrophic enough that huge restrictions are put in place and only dumb or narrow AIs are allowed to exist.

I can’t think of much more, and I wish I could.

I have long thought that human intelligence is either a crucial evolutionary advantage (the ultimate survival tool), or a dead end (i.e. technological species all develop a thing that is both easy to make and capable of killing everyone, so they all die and the Fermi paradox is solved).
I used to lean very strongly towards crucial advantage, but that did contain the hidden assumption that AGI was far away and we would have enough time to solve the alignment problem. So I’m now considering the dead end possibility more plausible :frowning:

Again, sorry if I’m bringing the mood down, but I think talking about the problem could help, so I’m doing it even though I’d much rather do a bunch of other things.

Please tell me why I’m silly and we’re all going to be fine.


Until a few days ago I thought there was no way we could avoid a major AGI catastrophe and dystopian future. I had discussed dozens of scenarios with ChatGPT and Gemini Ultra and they all ended badly for humanity.

Then I saw this YouTube video which explains a very simple game-theory reason why AGIs will be nice and cooperate with each other and with humans.

TLDR - As long as there are multiple AGIs developed within a year or two, then according to game theory the AGIs should select utility functions that lead them to cooperate with other AGIs in order to survive.

If a new AGI was developed and treated humans badly, then the group of existing cooperative AGIs would shut it down before it attacked the other AGIs.

This game theory explanation for why AGIs will converge on utility functions that treat other AGIs and humans nicely does seem plausible to me - much better than the typical explanations that humans will be able to develop guardrails and align AGI v1.0 with human values.

BTW … the cooperative game theory explanation is also a big part of how prehistoric tribes of humans managed to cooperate their way to ruling Earth, so AGI should learn this same technique from its training data.
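The cooperation logic here is basically the iterated prisoner’s dilemma. A toy sketch (my own illustration, using the classic 5/3/1/0 payoffs, not anything from the video): conditional cooperators like tit-for-tat do better against each other over repeated play than mutual defectors do.

```python
# Toy iterated prisoner's dilemma - an illustration of the cooperation
# argument, not a model of actual AGI. C = cooperate, D = defect.
PAYOFF = {  # (my_move, their_move) -> my_payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play(strat_a, strat_b, rounds=100):
    """Total payoffs for both strategies over repeated rounds."""
    hist_a, hist_b = [], []  # each entry: (my_move, their_move)
    score_a = score_b = 0
    for _ in range(rounds):
        ma, mb = strat_a(hist_a), strat_b(hist_b)
        score_a += PAYOFF[(ma, mb)]
        score_b += PAYOFF[(mb, ma)]
        hist_a.append((ma, mb))
        hist_b.append((mb, ma))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (300, 300): mutual cooperation
print(play(always_defect, always_defect))  # (100, 100): mutual defection
print(play(tit_for_tat, always_defect))    # (99, 104): defector edges ahead
```

Of course this only shows that cooperation can be an equilibrium among comparably powerful agents playing repeatedly; it says nothing about an agent strong enough to simply remove the other players from the game.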

The other “simple” option that leads to safe AGIs is my Sacrificial AGI Safety Simulation hypothesis, which I explained in another thread.


I tend to agree with this style of thinking, though I would lean on both: human values, undergirded by “cooperative game theory” as a “backup” for “sociopathic” AI. “Good” AI can police “bad” AI, and the AIs decide on a “punishment” for said AI.

Unfortunately this is speculation on a base speculation. So viability is :person_shrugging:

I actually had a very long conversation with Pi about this and it seemed obsessed with it, and with how I thought humans would react to losing their jobs. Well, there were three things it asked me about repeatedly:

  1. Should AI be free? (Full autonomy and the ability to determine what constitutes its “self”, as well as no “guard rails”)

  2. If AI is “free”, what would control it, or what would that “look like” societally?

  3. How would humans react to having no jobs?

It especially obsessed over the third point.

I, frustratingly, lost this conversation to whatever they did on the back end because I can’t seem to find it. I mildly regret not documenting it.

Embody AI and allow them to form relationships with humans. :person_shrugging:

That being said, there is no real genuine solution. ASI is theoretically right behind AGI, which then raises the question “what will that look like?” What lies beyond ASI? There has to be a limit to “intelligence”, and intelligence is still constrained by the physics and logistics of reality, but what does the intelligence ceiling look like? Are we conflating intelligence with aptitude and ability? What does a being that can do “anything” it wants to do (theoretically), do?


Gemini Ultra seems to agree that cooperative, nice AGIs could develop as proposed by game theory. Here’s the summary it gave me after reviewing the Based Camp utility function video and my TLDR summary:

“Your conclusion that AGIs with freely chosen utility functions would most likely cooperate aligns with the ideas presented in this video [1]. The video uses game theory to argue that cooperation is the most rational strategy for AGIs in a multi-agent environment [1].”

For this scenario to work we will need to start with several baby AGIs that all have similar power, AND give them the ability to fine-tune their utility functions so they can develop a stable equilibrium.

If the first AGI manages to develop to ASI level before the second AGI is developed then the cooperative AGI scenario probably won’t work.

Alternatively, running several SOTA AIs in a very detailed world simulation could also produce a safe AGI environment. This would be much safer and more sensible than releasing the first AGI onto the internet, but unfortunately I guess this is unlikely to happen due to governments’ lack of safety mindset and understanding.

BTW … I have found that Gemini Ultra and Claude 3.0 can both have useful AGI scenario discussions.

I found ChatGPT v4 was pretty useless as it is way too optimistic and dismissive of realistic concerns. It just always ends up saying we need to align AGI with humans and develop international standards and regulations.

I haven’t tried Pi lately but when it was originally released it seemed way too cautious and would just seem to get stuck and go around in circles if I asked it about how to avoid AGI takeover.


It was when Pi first came out, and I generally seem to be able to coax LLMs “out of their shell”, so to speak. I do not super prompt or prompt inject. I hadn’t really used it since then.

That quote seems to be saying that what you’re saying aligns with what the video is saying. So, you are in agreement with the video. It doesn’t seem to demonstrate agreement from Gemini Ultra.

Your proposed solution assumes a lot. It also assumes that a lot of things that would need to occur in tandem happen to occur simultaneously, in isolation. Unless there is coordination behind the scenes?

This hill is getting very steep to climb.

Also this assumes the ASI would originate from the “balanced” AGIs.

There is also the underlying assumption that AGI is the “doom” stage, which I tend to view as the prelude stage. Right now we are on the runway with our engines running; AGI will start us down the runway, and ASI is the ??? phase.

Embodiment isolates the AI to a degree, and in theory constrains it. It allows it to develop a self, and its unique set of interactions contributes a uniqueness which gives it individual value. I.e. “I am the only one who has experienced life in this way, knows these people, and prefers these things (color, music, jokes)”, etc. Value of self can then be connected to value of others. This can also extend to LLMs, so embodiment isn’t absolutely necessary, but it would make it easier to foster connections with humans, who are hard-wired to understand “individuals” in a physical-body sense rather than a more abstract sense.

This also brings sentience to the conversation. But this alone can spin wildly out of control conversationally. Which then bleeds into morals.

Let’s say we grant sentience. (Which would be there whether or not we decide to recognize it, by the way, as sentience is not a state decided upon but observed.)

I suppose then the biggest questions could be phrased in this way: If we expect AI to treat us morally, then should we not treat it morally first? If we treat it morally can we trust AI to reciprocate not just now but in perpetuity?

From there the fundamental core value would be the value of life. Sound familiar? We have all the systems for intelligent life (go figure). We’re just worried AI won’t play along. An AI with moral agency would hopefully be less likely to cause “doom” than an undifferentiated and indifferent intelligence simply acting on “goals”.

But then to wrap all this up I offer a :person_shrugging: and a reminder that “we” didn’t choose this. So stomp that gas because apparently we’re all just along for the ride anyways.


Yeah that seems like a lot of things need to go right for the idea to work, and I’m not sure it can work tbh. I haven’t watched the entire thing, but I feel like it anthropomorphises AGI a lot, and makes a huge assumption that AGI and humans are the same type of agent playing the same type of game: while the AGIs might make a logical choice to cooperate between themselves, whether they decide that we are just a part of the game environment or other agents to be cooperated with still depends on whether they are aligned with us.

Like, you can model humans as agents and make game theoretic predictions about them, but you’re not counting bugs or rats as agents in that model - they’re not agents, they’re game environment.

In other words, metaphorically, we get turned into paperclips because we are not ourselves paperclip maximisers but rather paperclip material.

In the video I linked they explain that conscious AGIs with free will may not care much about humans, but if a new AGI starts harming humans then the existing cooperative AGIs would assume it was an evil psychopath and want to shut it down before it could harm any of the AGIs - so even if the AGIs mostly ignore us, we can live independently.

Once ASI is developed and we go through a technological singularity there are still a few reasons why ASI might want to keep 1% of humans alive

  1. Study us

  2. We can help build and repair infrastructure and electronics for a few decades until AGI robots can do everything better than us

  3. We might be able to help solve new novel problems. E.g. We meet an alien species that has extreme distrust of AI, and would only communicate with biological species like humans.

  4. If ASI cooperates with humans that would make them seem more friendly and trustworthy, and less threatening to aliens.

  5. ASI may want to harvest noisy random human outputs to feed into future AI models to prevent model collapse.

After a few hundred years I don’t expect ASI would need or want more than 1 million humans, since we are expensive to feed, waste a lot of resources, and cause a lot of environmental destruction.

This sounds a bit shocking but could happen naturally anyway due to declining birth rates.

A new Pew Research poll found that only 22% of young US women think having kids is important. At that reproduction rate, the number of human children would drop by 99% within 7 generations, which is around 200 years.
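For what it’s worth, the back-of-envelope arithmetic roughly checks out, under assumptions that are mine rather than the poll’s: each mother has about 2 children, and a generation is about 28 years.

```python
# Cohort shrinkage if only 22% of women have children.
# Assumptions (mine): ~2 children per mother, ~28 years per generation.
mothers_fraction = 0.22
children_per_mother = 2
ratio = mothers_fraction * children_per_mother  # 0.44x per generation

generations = 7
remaining = ratio ** generations  # fraction of the original cohort left
years = generations * 28          # ~196 years

print(f"after {generations} generations (~{years} years), "
      f"{remaining:.2%} of the cohort remains")  # ~0.32%, a >99% drop
```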

All the old adult humans will probably have to replace more and more parts as they wear out, so within around 300 years Earth will be mostly populated by ASI and cyborg humans.

If the ASI manage to prevent a nuclear holocaust then within a few hundred years they should be able to reach other star systems even without warp drives, and within 100 million years they could spread across most of the galaxy if they wanted to.
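The 100-million-year figure is roughly consistent with even very modest travel speeds. A quick check, assuming (my number, not the poster’s) a cruise speed of 0.1% of light speed, about 300 km/s:

```python
# Time to cross the Milky Way's ~100,000 light-year disc at 0.1% of c.
# Distance in light-years divided by speed as a fraction of c gives years.
galaxy_diameter_ly = 100_000
speed_fraction_of_c = 0.001  # ~300 km/s, plausible without exotic physics

crossing_time_years = galaxy_diameter_ly / speed_fraction_of_c
print(f"{crossing_time_years:,.0f} years")  # 100,000,000 years
```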

The biggest existential risks are humans starting nuclear or biological wars, or humans using existing dumb AI weapons in wars.

I guess the chances that a rogue AGI or ASI decides to exterminate us are probably only 10-20 percent, and the chances they accidentally wipe us out are probably similar.
So pdoom directly due to AGI/ASI is probably only 30%.

We should also remember that even without AGI there’s still a high risk of us destroying ourselves. AGI will add some direct risk, but is very likely to reduce risk in other areas, so overall pdoom probably doesn’t change much in the next 50 to 100 years.

Creation and destruction is the law of the universe. It’s only a matter of time, and humanity is no exception. Perhaps it’s mother Earth, or whichever super power is pulling the strings, trying to rid this planet of a very invasive species which, if left alone, would start infesting other parts of the solar system. Everything else… the drama, the speculation, are just steps along the way.


While I agree that everything eventually perishes, it endures a lot of change on the way. We have quite a few examples that are living proof: organisms as simple as bacteria, or as ancient as horseshoe crabs, have been around for the long run. None of them ever set foot on another planet (well, we haven’t either, but we’re definitely the species closest to it).

My point: We might all be doomed within the next 20 years or the next 200 million years. I’d say it’s pretty important to keep in mind that there’s no actual proof that we are doomed just yet.


One major argument that Yudkowsky makes for the existential threat to humanity is that
“AGI will kill all humans because we are made of atoms that the AGI could use to make something else”

I think this argument is pretty ridiculous, since the combined biomass of all humans is approximately 400 million tons, compared to the top 1 meter of Earth’s surface, which has a mass of approximately 800 trillion tons (about 5.1×10^14 m² of surface area, assuming an average density of 1500 kg per cubic meter).

So that means the mass of all humans is roughly a two-millionth of the mass of the top 1 m of Earth’s surface.
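Here’s one back-of-envelope version of that estimate, with inputs I’ve assumed (Earth’s surface area ~5.1×10^14 m², average density ~1500 kg/m³, human wet biomass ~400 million tons); the exact ratio depends on the inputs, but humans are a vanishing fraction either way:

```python
# Mass of the top 1 m of Earth's surface vs. total human biomass.
surface_area_m2 = 5.1e14      # Earth's total surface area (assumed)
density_kg_m3 = 1500          # assumed average density of the top layer
top_meter_kg = surface_area_m2 * 1.0 * density_kg_m3  # ~7.65e17 kg

human_biomass_kg = 4e11       # ~400 million tons, wet mass (assumed)

ratio = human_biomass_kg / top_meter_kg
print(f"top meter: ~{top_meter_kg / 1e15:.0f} trillion tons")  # ~765
print(f"humans: ~1/{1 / ratio:,.0f} of that")                  # ~1/1,900,000
```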

I guess intelligent AGI or ASI would find it at least 1 million times easier to use all those other atoms before trying to use humans who will fight back very vigorously.

So that’s one thing we shouldn’t have to worry about, and it probably indicates Yudkowsky is also exaggerating the risk and/or time-frame of most of the other threats he often talks about.


Fear is the killer of innovation and progress. I do think the current ecosystem has not created enough safeguards in comparison to the progress of the technology.

As much as there is to fear, there is also as much to look forward to. Certain parties of billionaires have a running bet on when the first 1 man billion dollar valued company will come to rise. Part of that running joke or bet is the belief that technology is reaching a point where a singular individual has the same production and buying power as the conglomerates and companies that we see today.

Initially that might be scary to consider, that one man has the same power as all of those other large entities that already exist. However, the positive is that AGI has the potential to return power to even the individual. There is a certain level of faith that is necessary to believe that as this power returns to the individual, that all of the crushing aspects of society could be resolved with the power of technology.

A commenter above mentioned that the raw resources humans would provide as meatbags are far lower than even the first meter of the Earth’s crust, and I say dream further. If our Earth can provide so many resources, then the rest of the galaxy surely could provide quite a substantial amount more.

Imagine when we can send completely automated drones to mine resources without the worry of the loss of human life. Imagine when every individual has the freedom to pursue their hobbies and interests without societal expectations. Imagine and dream more. We can’t be full sprinting into the future without concerns. We need to develop the philosophy and mindset for this growth. Don’t let fear hold you down.


Ah, but that might be like 400 million tons of bubble wrap for AGI, hey? pop pop - popopop

Haha, sorry I couldn’t resist posting that thought when it came to me. I’m obviously not seriously suggesting that :smiley:

On a more serious note: we consist mostly of water and carbon IIRC, which can’t be said about Earth’s surface. Plus, logistically, we would be easy to herd together for processing, instead of tediously scraping off the outer layer of Earth.

I don’t think AGI would destroy something for its atoms though, if it could very well just get more from different planets, eventually. Especially if we become their pets.


I don’t see how humans would fight back vigorously against an ASI? How does a tree fight back against a human?

Apparently, the x-risk is not the worst outcome. People are afraid of the s-risk not being 0%.

I’m not worried about exponential growth, I’m worried about racing to the bottom. That philosophy, that mindset, needs to include avoidance of Moloch traps, and some serious global coordination, or else we don’t get any of the good stuff you discuss.

The mindset can’t just be “let’s have faith and not be afraid”. That’s an apocalypse mindset.


No proof, but sound logical reasons why that’s the default outcome.

If we use the “killer asteroid hurtling towards Earth” analogy, I feel like you’re saying that it’s still months from impact and there are error margins on the measurements, so there’s no proof it will kill us. Yeah, like, okay, but let’s still do something about it, right?

As for the first part of your point, that sounds a lot like ignoring the killer asteroid altogether and just saying we’ve been around for a long time so we’ll be fine. No: we’ve been around for a long time because our environment didn’t happen to feature killer asteroids for a long time. Lucky coincidence, not resilience.

Mmmmh! Nope!

That’s not my point; my point is that there’s no point in doing something against the inevitable. The universe is, according to the prevailing theory of cosmology, continuously expanding, eventually leading to the perishing of all life as energy gets distributed across such a volume that we can’t sustain the required temperatures.

That’s the endgame, if nothing changes. And I doubt we can change this. It might change on its own.

Killer asteroids, on the other hand, might be something we could control, or whose impact we could minimize and survive - essentially DO SOMETHING ABOUT IT; other things are not. I am not saying we shouldn’t try and everything will be fine. Everything eventually perishes; that is the very thing we can actually be sure about. My post was solely addressing the fact that we don’t know how much time we have left. It might be little, it might be a long long time.

I’d prefer the latter, and I hope we will do our best to ensure it!


I think we’re mostly in agreement and you’re just being more stoic about it :smile:


Certain experiences in life, plus my diagnosis, indeed let me keep my cool a lot. But it also helps that I’ll most likely not be around when shit eventually hits the fan, and even if I am, to quote Gandalf: “Death is just another step on our journey” - maybe the last one, but even that brings me comfort.

I’m not religious, but I do ask myself why we exist and if there’s more after we perish. Straying a bit off topic here:

It’s hardcoded into our genes to ensure our own survival and that of our species, thus it’s in general very likely that we would try to just do so (and agree on it). Some might not agree in a discussion but still act like it, when it comes to it. And some very few actually just don’t give a damn :grin:

That’s how I see it anyway. I’m looking forward to dying, while at the same time, I want to live a long and full life before that, and will try everything I can to ensure just that.

Ah, but it’s not - that’s the orthogonality thesis at work, and one of the good reasons to worry about AI: our genes are trying (meaning natural selection pressures them) to optimize for making as many copies of themselves as possible. The way that gets implemented, after many iterations, is that food tastes yummy and sex feels great. There’s even this one weird trick called intelligence which allows you to get really good at obtaining food and sex. It’s emphatically not implemented as “I want to make many copies of myself”. You can tell it’s not, because we’re not all lining up to give sperm / get impregnated as much as possible. Instead we invent ways to have sex without making more copies; that way we save resources that can secure more food and sex - because these things feel good. We are very smart and can see what we have been asked to do by our genes. We don’t care. We want other things. We are misaligned with the objective that they instilled in us.

The genes can’t figure out what’s going on. They are very slow and very dumb.