
[attention conservation notice: Intended for a rationalist audience, and likely to be of little interest to non-rationalists]
[epistemic effort: I discussed the topic with friends and will update this blog post in response to criticism I consider reasonable]

CFAR has recently pivoted to focusing more on AI safety. I feel a lot of qualms about this, which I wanted to write up in a blog post. I previously had qualms about cooperation and rationality as the common interest of many causes; however, I think this post addresses my concerns, and it has made me significantly less worried. I do, however, have another concern.

I want to make my positionality clear here, because I think it does affect my thinking. Most notably, I am highly uncertain about AI risk, which leads me to consider it not to be one of the most effective cause areas. For obvious reasons, this increases my qualms. I have tried to not say anything that I wouldn’t endorse if CFAR were discussing pivoting to a focus on global poverty rather than existential risk; however, I encourage my readers to account for my bias when they assess my arguments.

I think that it is possible to create a teachable art of rationality (as in domain-general thinking skills which improve one’s ability to reason in a wide variety of fields), but that we currently do not have one. I believe that a teachable art of rationality would positively affect the world by causing people to make fewer dumbass decisions. For this reason, I care about CFAR’s success. However, while I believe that CFAR likely has a positive effect on its attendees, I’m not convinced that this isn’t due to the Dodo bird verdict: that is, I think it’s likely that CFAR has positive effects because they are kind, empathetic, high-status people who want other people to improve, not because they have necessarily figured out rationality skills yet. To be clear, this is in no way an insult to CFAR: developing an entire new field of psychology is really hard! However, this does mean that I am not a CFAR alum, nor have I had any interest in attending CFAR. For this reason, their post making me less likely to want to attend CFAR might not be particularly interesting to CFAR; after all, I wasn’t going to attend anyway.

CFAR and MIRI have always been very closely linked; for instance, CFAR and MIRI share an office. To the extent that this blog post makes public what was always true, I think it’s good and virtuous; honesty and transparency are an important part of creating evidence-backed charities. Nevertheless, I think that this thing being true is likely to be bad.

Imagine that instead of saying that they were focusing on AI risk, CFAR said that they’d decided that CFAR as an organization would plan and act as though deworming charities were the most important ones to donate to. For this reason, they would prioritize recruitment of people working in the public health sphere (in addition to effective altruists and rationality geeks more generally). They would prioritize skills that are useful in deworming, such as doing expected-value calculations in high-risk situations and avoiding scope insensitivity. Their metrics would include the effects their graduates have on deworming.

I have an issue with this hypothetical. Deworming charities being the most important is a specific empirical claim; to believe deworming is one of the most important causes, one must believe that the studies were well-conducted, that it’s likely the effects aren’t spurious, that they generalize to the locations where deworming is happening, etc. Similarly, the importance of AI risk is also a specific empirical claim: to believe that AI risk is one of the most important causes, one must believe that it’s likely that artificial intelligence will be developed, that it is possible for people currently working on AI to have a positive effect, etc.

But it’s possible that deworming isn’t one of the best interventions. For instance, imagine that the deworming studies were very poorly done. While an ordinary person might not be able to tell that they were poorly done, it’s obvious to someone who knows how to read a study. Now imagine that knowing how to read a study is, in fact, a very important rationality skill. (I’m not claiming either that the deworming studies are poorly done or that knowing how to read a study is an important skill; this is strictly a hypothetical.) I think it’s very possible that hypothetical Deworming CFAR would flinch away from the idea “maybe we should teach people to read studies”. Or they’d run a pilot program, notice that teaching people to read studies tends to make them less enthusiastic about deworming, and say “well, guess that learning how to read studies actually makes people more irrational.” Or they’d still teach the how-to-read-studies unit, but they’d alter it so that people continue to be enthusiastic about deworming afterward, thus not teaching an important aspect of the skill. Or it would never occur to them to teach that unit, perhaps because this is a flaw in the rationality of the employees of Deworming CFAR itself. (After all, they aren’t noticing that deworming is a poor cause area…)

To be clear, I’m not suggesting that anyone at CFAR would intentionally teach people poor rationality skills in order to trick them into caring about AI risk. Of course not! CFAR as an organization genuinely cares about developing the art of rationality, and genuinely believes that AI risk is the biggest threat to humanity reaching the stars. But it doesn’t have to be intentional to be harmful. Many biases happen on the subconscious level, as unnoticed flinches or ideas that you happen to overlook.

You might argue that if deworming is not the best cause, surely hypothetical Deworming CFAR would notice this and reorient themselves. But I don’t think that’s true. We don’t have the art of rationality yet. CFAR employees are still vulnerable to confirmation bias, self-deception, and motivated reasoning. I think it’s important to reduce the harm caused by these aspects of human psychology.

I think endorsing specific empirical claims is different from endorsing value sets. For instance, “people in the developing world matter as much as people in the developed world” is a value, as is “it would be bad for humanity to be destroyed.” I do not think that the problems of endorsing empirical claims apply to endorsing values, because values do not generally work in the normal Bayesian way. It is hard to think of evidence that will disprove the claim that people in the developing world matter as much as people in the developed world (…maybe if they’re all p-zombies?). Rationality is, to a very large degree, about gathering and properly updating based on evidence. Therefore, a value has a much lower risk of hurting the rationality curriculum than an empirical belief. This is why I was less worried when it seemed to me that CFAR’s mission was to be a general effective altruist organization, because general effective altruism is almost all about values rather than facts.

CFAR made this shift for very good reasons. There are a lot of potential rationality skills; you need some criteria to be able to prioritize which ones you discover and teach. There are a lot of potential audiences for rationality; you need some criteria to pick which ones you will target. It is important to have integrity; secretly having AI risk as one’s mission while passing oneself off as a cause-neutral rationality organization is wrong. And we’re not trying to develop the art of rationality to make Facebook comment threads better; it makes sense to want to prioritize the most important issues facing the world.

This is probably the part of the post where I am supposed to offer a solution. Unfortunately, I have never run a rationality organization, and I am highly uncertain about what the correct path is. I don’t know how CFAR can get the very real benefits of this pivot while also reducing the risk of their mission distorting their rationality curriculum. But I do think that a first step is keeping in mind the risks.