“Psychic Scorecard” — Annie’s Newsletter, Sept 7, 2018

Psychics are in the forecasting biz, so we should also look at what they MISSED

Psychic predictions are an easy target for skeptics, and this article on RelativelyInteresting.com, which evaluated the 2017 predictions of four leading psychics, was duly skeptical.

One particular basis for the skepticism resonated with me as a valuable reminder of how to evaluate the accuracy of all forecasters: looking at not just what they predicted but also at what they missed.

The piece started by pointing out all the big things that the leading psychics OMITTED from their 2017 predictions.

All four missed (among many other major events) the following:

  • The Harvey Weinstein scandal and the #MeToo movement.
  • The Las Vegas shooting.
  • The skyrocketing price of Bitcoin.
  • The devastation of Hurricane Harvey in Houston and Hurricane Maria in Puerto Rico and the Caribbean.

In evaluating forecasts, we have a strong bias toward looking at commissions rather than omissions: things that were predicted versus things that were not.

A psychic, like anyone else, could cherry-pick a good record out of a few predictions that come true, especially if their predictions are vague enough to fit a variety of events that might occur.

But the big events they miss are exactly what we tend to overlook in evaluating whether their psychic abilities are any good.

And that is what we miss in our own decision-making lives as well.

If we are evaluating why an outcome happened, we tend to look at the actions we took rather than the actions we didn’t take (or didn’t consider), at the decisions we made rather than the decisions we didn’t make.

When you think about it, we are all in the prediction business: decision-making depends on predicting possibilities and probabilities among alternative futures.

As we try to assess forecasts (our own and others), it’s important to look not only at hits and misses among predictions that were made, but also at what wasn’t predicted that should have been, given the methods of the forecaster.

Why you should read past the headlines and ask about reference groups & base rates

A recent study reported in The Lancet on the risks of alcohol included a stark, headline-grabbing message: “Our results show that the safest level of drinking is none.”

The Lancet’s press release highlights the biggest numbers, including that the yearly risk of developing any of twenty-three alcohol-related problems rises by 7% if you have two drinks per day, and by 37% if you have five.

This was the real attention grabber: Even one drink per day increases your risk of these problems.

So much for that one glass of wine a day extending your life, right?

The study got massive news coverage, not surprisingly given that a typical headline read, “No amount of alcohol is safe, health experts warn.”

Sure sounds scary. But this story turns out to be a great object lesson on Gerd Gigerenzer’s message at the beginning of Risk Savvy.

“Always ask for a reference class: Percent of what?”

Here’s the issue: If the rates are already really low for these diseases, large percentage increases can look scary even when the overall risk remains incredibly low.

If a disease occurs in 1 out of 100,000 people, a 100% increase in the occurrence of the disease means an increase to just 2 out of 100,000.

How you frame the data really matters.
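The framing point above can be sketched in a few lines of Python, using the hypothetical 1-in-100,000 disease from the text:

```python
# Relative vs. absolute risk, using the hypothetical disease from the text.
base_rate = 1 / 100_000       # disease occurs in 1 out of 100,000 people
relative_increase = 1.00      # a scary-sounding "100% increase" in risk

new_rate = base_rate * (1 + relative_increase)

# The headline framing: "risk doubles!"
print(f"Relative increase: {relative_increase:.0%}")

# The absolute framing: one extra case per 100,000 people.
print(f"Before: {base_rate * 100_000:.0f} in 100,000")
print(f"After:  {new_rate * 100_000:.0f} in 100,000")
print(f"Absolute increase: {(new_rate - base_rate) * 100_000:.0f} per 100,000")
```

Same data, two framings: a 100% relative increase is one extra case per 100,000 in absolute terms.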

A New York Times article by Aaron Carroll, a pediatrics professor who writes about health research, makes exactly this point, digging down into what the data tells us.

Turns out the difference between no alcohol and one drink per day is tiny:

“For each set of 100,000 people who have one drink a day per year, 918 can expect to experience one of the 23 alcohol-related problems in any year. Of those who drink nothing, 914 can expect to experience a problem. This means that 99,082 are unaffected, and 914 will have an issue no matter what. Only 4 in 100,000 people who consume a drink a day may have a problem caused by the drinking, according to this study.”

(h/t Barry Ritholtz, whose tweet brought this to my attention.)

I recommend you look at Carroll’s analysis. As he points out at the beginning of the article, “The truth is much less newsy and much more measured.”

If you have five drinks a day, the “37% increase” in alcohol-related health problems documented in the study means your yearly chance is a little over 1-in-100 (1.252%), compared with a little under 1-in-100 for non-drinkers (0.914%), those who have one drink per day (0.918%), or two (0.977%).
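As a quick sanity check, here is a short Python sketch that recomputes the relative and absolute differences from the per-100,000 figures quoted in the text:

```python
# Cases per 100,000 people per year, as reported in the text.
per_100k = {"none": 914, "one drink/day": 918, "two drinks": 977, "five drinks": 1252}

baseline = per_100k["none"]
for label, cases in per_100k.items():
    relative = (cases - baseline) / baseline  # headline-style "% increase"
    absolute = cases - baseline               # extra cases per 100,000
    print(f"{label:>14}: {cases} per 100,000 "
          f"(+{relative:.1%} relative, +{absolute} absolute)")

# Five drinks/day: (1252 - 914) / 914 ≈ 37%, the study's headline figure,
# yet the absolute risk is still only a little over 1 in 100.
```

Running this shows the one-drink “increase” is 4 extra cases per 100,000, exactly the figure Carroll highlights.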

I’m not questioning that alcohol can hurt you, and five drinks per day is worse than one or zero. And you probably shouldn’t have five drinks a day.

Whenever you see an unusually large percentage increase in anything, remember to dig further.

When the base rates are very low, small differences can be spun into alarming headlines.

The latest on the replication crisis in social science

Colin Camerer and 23 colleagues from around the world just published, in Nature Human Behaviour, the results of attempts to replicate 21 studies appearing in Nature and Science between 2010 and 2015.

They found a significant effect in the same direction as the original study for 13 of the 21 studies. In addition, the effect size of the replications was, on average, about 50% of the original effect size.

Whether this is good news or bad news I suppose depends on your point of view. (A good perspective on what this effort generally means for improving science is presented by Brian Resnick’s article on the attempted replications in Vox.com.)

Good news/bad news aside, a part of the replication effort jumped out at me: The attempted replications included a prediction market on the likelihood of each experiment replicating.

Turns out, asking scientists to bet on replication is a pretty good predictor of replication. According to Ed Yong’s article in The Atlantic focusing on the betting-market aspect of the findings, the traders’ overall pricing indicated that the studies would replicate 63% of the time, “uncannily close” to the actual 62%.
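As a rough illustration of how a market’s prices aggregate into an implied replication rate, here is a Python sketch. The individual prices and their pairing with outcomes below are invented for illustration; only the 13-of-21 replication count comes from the paper:

```python
# Hypothetical market prices (each trader-set price ≈ the market's probability
# that a given study replicates). These numbers are made up for illustration.
prices = [0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35,
          0.85, 0.90, 0.72, 0.30, 0.25, 0.55, 0.68, 0.77, 0.62, 0.58, 0.66]

# 13 of 21 studies replicated, per the paper; the pairing here is invented.
outcomes = [1] * 13 + [0] * 8

# The average price is the market's implied overall replication rate.
implied_rate = sum(prices) / len(prices)
actual_rate = sum(outcomes) / len(outcomes)

print(f"Market-implied replication rate: {implied_rate:.0%}")
print(f"Actual replication rate:         {actual_rate:.0%}")
```

In the real study, the analogous comparison came out 63% implied versus 62% actual.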

I mentioned in Thinking in Bets how this was a good debiasing technique. That’s literally what “thinking in bets” means.

A graph in the paper illustrates how the market assigned a greater likelihood of success to the 13 studies that replicated than to the 8 that didn’t.

Economist Anna Dreber, who developed the idea of using prediction markets to study reproducibility of results, was involved in the project and said, “These results suggest something systematic about papers that fail to replicate.”

A poll is an input into a forecast, and it’s important to understand the difference

Several polls in Texas are showing that the Ted Cruz-Beto O’Rourke Senate race is close, even within one point, and this is generating a lot of buzz.

In this great read on FiveThirtyEight.com, Dhrumil Mehta and Janie Velencia pointed out that we might want to take a moment before breaking out the champagne or getting out the Kleenex.

Two months to go is a long time, and the longer out an event is, the more uncertainty there is in any forecast based on a poll. We tend to think that what is true now will be true in the future.

But the future is always uncertain and the farther out you look, the more uncertain it is.

The nine Texas polls, which appear in a table in the article, were conducted between June and August and all have Cruz leading, by margins of 1 to 10 percentage points.

If Cruz wins by 20 points – or loses by 20 points – I’m sure some people will complain about pollsters.

But the polls are just a snapshot of what people think at a certain moment in time. It would be silly to even speculate on all the things that can happen between the time of those polls and when people vote.

The FiveThirtyEight article does a good job of demonstrating this, including with a table of the average error of Senate polls taken in August of election years from 1990 to 2016.
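To make the point concrete, here is a small Python simulation of how a 1-point polling lead looks once you allow for polling error. The 5-point error spread is a hypothetical stand-in, not FiveThirtyEight’s actual figure:

```python
import random

random.seed(0)

lead = 1.0           # the "leader's" margin in a hypothetical poll, in points
poll_error_sd = 5.0  # hypothetical spread of August-poll errors (illustrative only)

# Simulate many elections: add a random polling error to the observed lead
# and count how often the apparent leader actually trails on election day.
trials = 100_000
flips = sum(1 for _ in range(trials)
            if lead + random.gauss(0, poll_error_sd) < 0)

print(f"Chance the 1-point 'leader' actually trails: {flips / trials:.0%}")
```

With an error spread that large, a 1-point lead leaves the trailing candidate winning a substantial fraction of the time, which is why a single close poll months out tells you much less than the headlines suggest.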

There is so much uncertainty in what polls are telling us when they’re conducted that far out from the election. The next time someone declares something based on the polls, ask them, “Wanna bet?”

Watch them start to backtrack.

That’s the good thing about asking people to bet. It causes the uncertainty to bubble to the surface.

Catalyzing the field of youth decision making

The nonprofit organization I cofounded, HowIDecide.org, recently announced that it was setting more ambitious goals:

We want one of the core goals of every young person’s education to be the development of critical thinking and decision skills. We’ve learned that in order to reach that goal we need a new strong field in education – critical thinking and decision skills for youth. The field needs to generate public awareness, innovative programs, engaging instruction, policy commitments at the state and local level, professional development and support for teachers and administrators, and an ongoing commitment to identifying, coordinating, and amplifying all of those related efforts.

How I Decide is changing to become the catalyst for this field. Over the coming months, as we find new organizations to host or support our successful programs, How I Decide will shift to this new strategy. We’ll do this primarily by identifying, coordinating, and amplifying the work of others who are currently or potentially contributing. Our website, newsletters, events, and operations will all be shifting to support and highlight those efforts.

You can read more about the organization’s evolving mission to improve youth decision skills and its plan to catalyze the field.


My friends Jon Haidt and Greg Lukianoff had their book published this week, The Coddling of the American Mind: How Good Intentions and Bad Ideas Are Setting Up a Generation for Failure. It’s a great book on subjects of massive importance. Check out the great review in the New York Times. Or better yet, check out the book itself!

The Spinning Spiral Illusion, via RelativelyInteresting.com.