Scrupulosity Sequence #2: Self-Experimentation and the Replication Crisis

Tags

ozy blog post, rationality, scrupulosity sequence

In a perfect world, if someone had a psychological problem, I’d be able to tell them all about the rich peer-reviewed psychological literature that has to do with specifically their problem, all of which has effect sizes of a billion and definitely and certainly applies to everyone.

Unfortunately, we live in this world.

For the past decade or so, the social sciences (particularly psychology) and medicine have been going through the replication crisis. Basically, when you repeat social scientific and medical experiments, many of the effects that existed the first time you did the experiment disappear. Some projects suggest that fewer than half of psychological studies show a reproducible effect.

Why is this a problem? Imagine if, when we repeated the experiment of dropping a feather and a bowling ball in a vacuum chamber, sometimes they fell at the same rate, but sometimes the feather fell faster, sometimes the bowling ball fell faster, and sometimes the bowling ball fell up and hit the experimenter in the face. It would be pretty clear that we don’t know anything about how gravity works, and all the physics that built on the idea that feathers and bowling balls fall at the same rate in a vacuum chamber was totally wrong.

It would also bring up the question of how nobody noticed before that bowling balls sometimes fall up. The answer for psychology is that many psychologists use questionable research practices: one survey suggests that more than half of psychologists have used at least one. Many psychologists check whether the data show an effect; if they do, they publish, and if they don’t, they recruit fifty more participants to see if an effect turns up. Other psychologists will write a paper that includes multiple studies and report only the studies that showed an effect, even though they conducted other studies that showed none. Still other psychologists will decide whether to leave out outliers depending on which choice makes an effect appear. There are lots of other questionable research practices, many of which require more statistics to explain, but I hope that gives a sense of what people are doing.
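To make the "recruit fifty more participants" practice concrete, here is a minimal simulation sketch. It assumes there is no real effect at all, and it compares a study that tests once at fifty participants with one that keeps adding fifty and re-testing until it hits p < 0.05 or reaches 200. The function names, the cap of 200, and the rough z-test are all assumptions chosen for illustration, not anything taken from a particular study.

```python
import random
import statistics
from math import erf, sqrt

def two_sided_p(sample):
    """Approximate two-sided p-value for 'mean differs from 0' (z-test)."""
    n = len(sample)
    z = statistics.fmean(sample) / (statistics.stdev(sample) / sqrt(n))
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return 2 * (1 - normal_cdf)

def run_study(rng, start_n=50, step=50, max_n=200, peek=True):
    """Return True if the study 'finds' an effect, although none exists."""
    # Every participant is drawn from a distribution whose true mean is 0.
    data = [rng.gauss(0, 1) for _ in range(start_n)]
    while True:
        if two_sided_p(data) < 0.05:
            return True  # "significant": publish
        if not peek or len(data) >= max_n:
            return False  # give up
        # Questionable practice: recruit more people and test again.
        data.extend(rng.gauss(0, 1) for _ in range(step))

def false_positive_rate(peek, trials=2000, seed=0):
    """Fraction of simulated null studies that report an effect."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek=peek) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("test once at n=50:        ", false_positive_rate(peek=False))
    print("keep recruiting and retest:", false_positive_rate(peek=True))
```

Under these assumptions the second rate should come out noticeably higher than the first, even though nothing real is being measured in either case.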

The other problem is that humans behave much more inconsistently than bowling balls do. You’d expect that if you drop a bowling ball in a vacuum chamber, it’ll be exactly the same as every other bowling ball in every other vacuum chamber around the globe. You only need to drop one bowling ball in a vacuum chamber to find out how bowling balls in vacuum chambers work everywhere. But if you do a study with a hundred and fifty people, you might have happened to find a hundred and fifty really weird people. Even if the effect is real within your sample, it might not apply to any other group of a hundred and fifty people, because people are weird.

It is hard to overstate the implications of the replication crisis. I would go so far as to say that you should not read quantitative psychological research, because it will make you actively stupider and leave you with more inaccurate beliefs. You should view quantitative social scientific research more generally with grave suspicion. (Qualitative social science research wasn’t ever really supposed to replicate in the first place; ethnographies are still likely to provide an accurate insight into what it’s like to be a person very different from you, although you might be well-advised to skip the chapter about the generalizable implications.)

Now, you might think we’re in a complete state of epistemic helplessness, unable to know what facts about people are true and what facts are false. But that’s not true.

People are actually very good at figuring out whether psychological results are true or false. I’d cite the research on this but I just finished explaining why that’s bad, so instead I’ll direct you to 80,000 Hours’s psychology replication quiz. If you don’t get better than half, I will write you a drabble on the subject of your choice.

Consider priming, one of the results that has most completely and utterly failed to replicate. It turns out that doing a bunch of anagrams of words related to old people does not actually make you walk slower, imagining doing something morally wrong does not make you want to buy cleaning products, and reading the word ‘stupid’ does not make you assume people you’re talking to are less intelligent. Ask yourself: have you ever in your life read a bunch of words related to old people and then walked more slowly? Have you ever been like “sorry I’m so slow walking, I was helping out at the nursing home this morning”? No?

Most people spend much of their time thinking about other people. The human brain is in many ways optimized for understanding other people. (Compare the abilities of your laptop computer and a five-year-old child with regards to comforting you when you’re sad, then compare their abilities with regards to quantum physics.) We are, ourselves, people. We have natural expertise and tons of information about the subject. We can outperform science.

A lot of people think of knowledge as coming from science, and science as coming from people in white labcoats doing complicated things with numbers and publishing peer-reviewed journal articles. But science is just a particular kind of empiricism. Empiricism means going out into the world, looking at things, coming up with an idea about how they might work, trying it out, and seeing if you’re right. You do empiricism all the time– when you try a new recipe, when you see if a good night’s sleep will make you feel better, when you check whether this movie reviewer has any clue what she’s talking about.

And you can do empiricism about your own psychological problems.

There are a bunch of meta-analyses that suggest that SSRIs, in general, are just barely better than placebo. There are also a lot of people who take an antidepressant, go off their antidepressant, get depressed, get back on their antidepressant, stop being depressed, go off their antidepressant, get depressed, get back on their antidepressant, stop being depressed, go off their antidepressant, etc. If you are in that group of people, what should you expect to happen if you go off your antidepressant?

You should expect to be depressed.

In a certain sense, this is totally ignoring science– you’re looking at that peer-reviewed meta-analysis with a bunch of complicated statistics and a sample size larger than one and going “…nah.” In another sense, this is doing science of the sort that is most closely connected to the question. The thing you care about is what will happen if you go off your antidepressant, you have empirically investigated it, and the answer is that you will be depressed.

At this point I should take a moment to discuss the placebo effect. Unfortunately, many people use the phrase “placebo effect” in a sloppy way. In a broad sense, the placebo effect refers to the ways a disease or condition changes if you take a sugar pill instead of a pill with an active ingredient, or talk to a kind intelligent person with no therapeutic training instead of a therapist, or whatever. In a narrow sense, the placebo effect refers to a particular source of these effects: the mind-body effect where if you do something at all to help the condition then you feel better. (More precisely, the former is called the “placebo response” and the latter is called the “placebo effect.”)

Actually, there are lots of causes of the placebo response. For example, a lot of conditions like depression fluctuate: you feel worse for a time, then you feel better, then you feel worse, then you feel better again. People tend to seek out treatment when they feel really bad and stop treatment when they feel better, so it looks like the treatment works, but it’s actually just the natural course of the disease. Similarly, some diseases are time-limited: whether or not you drink orange juice, your cold is going to take about a week to get better. Another example is that some people feel bad that people have put all this effort into giving them medicine and then they don’t feel better, so they rate themselves as feeling better even though they really don’t.
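The fluctuation point can be checked with a small simulation. The sketch below assumes a mood score that wanders around zero with no treatment effect whatsoever, and asks what happens to it two weeks after any day bad enough that someone might seek help. The mood model, the threshold, and the function names are assumptions made up for illustration.

```python
import random

def simulate_mood(rng, days=5000, persistence=0.9):
    """Mood that drifts around 0 on its own; nothing here is a treatment."""
    mood, series = 0.0, []
    for _ in range(days):
        # Today's mood is mostly yesterday's mood plus random noise.
        mood = persistence * mood + rng.gauss(0, 1)
        series.append(mood)
    return series

def average_change_after_seeking_help(series, threshold=-3.0, followup=14):
    """Mean mood change `followup` days after each day below `threshold`."""
    changes = [
        series[day + followup] - series[day]
        for day in range(len(series) - followup)
        if series[day] < threshold  # "I feel awful, time to try something"
    ]
    return sum(changes) / len(changes) if changes else None

if __name__ == "__main__":
    series = simulate_mood(random.Random(0))
    print("average change after a bad day:",
          average_change_after_seeking_help(series))
```

In this toy model the average change should be positive, so anything started on a bad day would look as if it helped, purely because bad stretches tend to be followed by less bad ones.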

Ideally, you would do a placebo-controlled study on yourself. The process is fairly simple for medications you expect to work within one day, and I think more people should consider doing it. However, there are lots of situations where placebo-controlled studies don’t work at all: it is very difficult to do, say, placebo self-therapy or placebo exercise, and many medications such as SSRIs have to be taken for weeks to work properly.
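For the fast-acting case, one possible way to organize a blinded self-test is sketched below. It assumes a helper prepares identical-looking doses from a randomized schedule that you do not see, you record one rating per day, and the schedule is only revealed at the end. The function names and the "active"/"placebo" labels are hypothetical, and this is an illustration of the bookkeeping, not a validated protocol.

```python
import random
import statistics

def make_schedule(n_days, seed):
    """Balanced random assignment of days to 'active' or 'placebo'."""
    if n_days % 2:
        raise ValueError("use an even number of days so both arms are equal")
    schedule = ["active"] * (n_days // 2) + ["placebo"] * (n_days // 2)
    random.Random(seed).shuffle(schedule)
    return schedule

def unblind(schedule, ratings):
    """Mean rating on active days minus mean rating on placebo days."""
    if len(schedule) != len(ratings):
        raise ValueError("need exactly one rating per scheduled day")
    active = [r for s, r in zip(schedule, ratings) if s == "active"]
    placebo = [r for s, r in zip(schedule, ratings) if s == "placebo"]
    return statistics.fmean(active) - statistics.fmean(placebo)

if __name__ == "__main__":
    # The helper runs this part and keeps the result hidden from you.
    schedule = make_schedule(20, seed=12345)
    # You fill these in day by day without seeing the schedule;
    # the numbers here are placeholders, not real data.
    ratings = [5] * 20
    print("active minus placebo:", unblind(schedule, ratings))
```

Keeping the number of active and placebo days equal is a design choice that makes the final comparison simpler; a difference near zero after unblinding suggests the pill is not doing much beyond the placebo response.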

A more promising strategy is to think about whether your experiences are consistent with the response being a placebo response. For example, if you’ve been depressed for years, you can rule out “it was just the sort of thing that would get better in a week anyway” as an explanation. If you typically miss your antidepressant because of a snafu with the pharmacy, that suggests your post-antidepressant depression is a real effect; if you typically miss your antidepressant because you’re too sad to take it, that suggests it may be a placebo response. If you’ve tried six antidepressants and only the seventh worked (despite them all being similar-looking little pills with similar side effects), it’s unlikely to be a placebo response, since there’s no reason to believe pill #7 would have a placebo response the others didn’t.

How do you choose things to experiment with? I offer the following guidelines:

Good safety profile; few to no side effects. SSRIs are widely used and if they caused sudden horrible death we would probably know. It is difficult to imagine pomodoros or Facebook blockers having health effects one way or the other. Exercise actually makes you live longer. Conversely, taking random research chemicals off Longecity has all kinds of possible horrible side effects, and antipsychotics are routinely rated as “the only thing worse than psychosis” or “actually, no, I think I would rather be psychotic”.
Easy to figure out if it works. If an intervention promises to help you get more work done or be less anxious, it is easy to figure out whether your to-do list has fewer items on it or you’re able to talk to strangers instead of hiding in the corner out of fright. If an intervention promises to help you reach enlightenment, this is harder to determine.
Easy to do. Installing Facebook-blocking software or using a pomodoro tracker is very easy; anyone can do it in about one minute. Exercise, dietary changes, meditating for an hour a day, and so on are very difficult.
Quick to test. It takes one afternoon to check whether pomodoros work for you: if you didn’t get more work done that afternoon, they probably don’t. It can take months or years of work to find out whether meditation works for you. The former is a better thing to experiment with.
Explanation makes sense. Ask the person why it works. If the explanation is something like “when you do something you’re scared of and it isn’t a disaster you’re likely to be less scared,” then the thing is more likely to really work. If the explanation involves unscrambling anagrams about old people making you walk slower, then maybe don’t try that one.
Good reviews. If it worked for someone else, maybe it’s worth trying. You can ask your friends, but it’s also worth mining historical figures for ideas (maybe Marcus Aurelius has some good tips) or, sigh, reading the psychological literature.

5 thoughts on “Scrupulosity Sequence #2: Self-Experimentation and the Replication Crisis”

Neb said:

December 12, 2018 at 2:21 pm

For explanations that make sense, watch out for whether the explanation actually matches the thing you experience. I had the ‘likely to be less [anxious]’ explained to me a lot, and it took me a while to realize that the explanation that actually matched what was happening was ‘when you do something and it is in fact super unpleasant, you are going to continue to become less likely to do it’.


- tcheasdfjkl said:
  
  December 12, 2018 at 6:33 pm
  
  Yeah, for exposure-therapy-type stuff it crucially matters what the outcome of doing the scary thing is. If it’s as bad or worse than your brain expected, you’ll be as scared or more scared of it in the future. (This is why the best advice is to try smaller, safer versions of the thing you’re scared of.)
  
  
tailcalled said:

December 12, 2018 at 4:11 pm

I think one should be very careful about applying this to domains where one cares a lot about getting certain results, e.g. in politics or in psychological domains where the correct answer may have large effects on one’s personal status. People are great at lying to themselves, so if you apply it to these domains you just end up with “the outgroup is bad and the ingroup is always right”, which… isn’t exactly great.


Doug S. said:

December 12, 2018 at 4:27 pm

SSRIs have been known to have nasty withdrawal symptoms if stopped abruptly.


tcheasdfjkl said:

December 12, 2018 at 7:04 pm

I like this, and obviously it is good for lots of things besides scrupulosity.

For me, virtually nothing is in the “easy to figure out if it works” category.

One problem is that my mental state naturally fluctuates roughly every few days, so it’s almost never obvious whether a given change is due to some intervention or just part of that pattern. (This also means that nothing that doesn’t have a very short-term or very dramatic effect is going to be in the “quick to test” category.)

Another problem is that while I do limit myself to trying one primary intervention (e.g. a new medication) at a time, I don’t live in a vacuum and there’s always going to be variation in e.g. what I eat, who I talk to, what the weather is like, what tasks I need to get done, what productivity tools I use, etc.; this adds extra difficulty to tracing causality.

A third problem is that apparently I’m just not very good at noticing and remembering details of my experience. This was a surprise to me when I started trying psych meds – I do a lot of introspection, and I would have thought I was pretty in touch with my feelings, but in fact it seems that (a) sometimes when things are bad it’s really hard for me to pin down the details of that beyond “generalized shitty feeling”, (b) I experience variants of “generalized shitty feeling” often enough that to make things bearable I’ll often need to tune out the details and go “shrug, this sucks, oh well”, which gets in the way of introspection, and (c) my memory of feelings and experiences is super fallible and it’s really hard for me to compare present feelings to past ones, let alone different past feelings to each other, and also I can never remember things like exactly which nights I had trouble sleeping (and therefore it’s hard to check whether that lined up with e.g. a new medication).

I’ve been fixing this with some success by doing a lot of tracking. I have at least seven apps on my phone for tracking various things, in addition to spreadsheets for tracking more stuff and analyzing my logs. Some that I’ve found particularly useful are:
– Daylio: mood tracking. I have it ask me a few times a day how I’m feeling, and also record any other notable moods that occur at other times, and then I have graphs of this information over time. (I think you have to pay it a bit of money to get it to give you more than one reminder a day; I find this worth it.) Daylio’s own graph/analysis functions are pretty insufficient – for example, you can only look at a graph of one month at a time, which is pretty unhelpful for figuring out long-term trends – but you can export your data and then play with it in a spreadsheet. Doing this was extremely helpful for me – I made a graph of my average mood per week and was able to discover that while I very regularly cycle between one- or two-week up- and downswings, the mean around which my mood cycles notably changed when I started a new antidepressant, but did not change when I increased my dose, which was good evidence that I should be taking the lower dose (and also looking for something that would improve or eliminate the downswings, as those are still pretty disabling).
– SleepBot: sleep tracking. (however, I think this might currently be unavailable for new users due to GDPR issues)
– Timesheet: time/productivity tracking. (important for me as one of the things that both varies a lot for me and is a big problem is ability to do work)
– KeepTrack: EVERYTHING tracking. This app is pretty great; you can create trackers to track anything you want in one of many formats; you can set goals for each tracker in more or less any format; you can combine graphs to see how different pieces of data move against each other. (Though I haven’t yet looked into how to e.g. graph the things I track there against the things I track in other apps.)
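The weekly-average analysis described for the mood export can also be done in a few lines of code instead of a spreadsheet. This is only a sketch: it assumes the exported log has already been read into (date, score) pairs, and it makes no claim about what columns any particular app's export actually contains. The function names are made up.

```python
import statistics
from collections import defaultdict
from datetime import date

def weekly_means(entries):
    """Map each ISO (year, week) to the mean mood score logged that week."""
    by_week = defaultdict(list)
    for day, score in entries:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        by_week[(iso[0], iso[1])].append(score)
    return {week: statistics.fmean(scores)
            for week, scores in sorted(by_week.items())}

def mean_before_and_after(entries, change_date):
    """Mean score before a change (new medication, new dose) and from it on.

    Raises statistics.StatisticsError if either side has no entries.
    """
    before = [score for day, score in entries if day < change_date]
    after = [score for day, score in entries if day >= change_date]
    return statistics.fmean(before), statistics.fmean(after)

if __name__ == "__main__":
    # Placeholder entries, not real data.
    log = [(date(2018, 1, 1), 2), (date(2018, 1, 2), 4), (date(2018, 1, 8), 5)]
    print(weekly_means(log))
    print(mean_before_and_after(log, date(2018, 1, 8)))
```

Averaging by week smooths over day-to-day swings, which is what makes a shift in the level the mood cycles around easier to see.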

However, I’ve been finding that even this is not enough, because sometimes symptoms happen that are not things I have already thought to start tracking, and then I don’t track them anywhere and forget about them or can’t correlate them with anything. I’ve gone and bought a little notebook for tracking things in a more free-form format; I now spend a bunch of time writing down possibly-relevant-to-experimentation details of my experience, which is certainly a cost, but I think it’s probably worth it for being able to actually know what I was experiencing when so that I can better trace causality and make decisions.


Thing of Things

~ The gradual supplanting of the natural by the just
