MANGLING THE MESSAGE –
Bloomberg.com drops the ball on communicating the probabilistic nature of World Cup predictions
Predictions are probabilistic in nature.
This means that, though the probabilities should play out over the long run, a single result doesn’t tell you much about the quality of the prediction or the predictor.
But we have trouble wrapping our heads around this idea.
We judge the quality of a prediction about who is favored to win a sporting event or an election by whether the favored result actually happens.
If the favored outcome doesn’t occur (or the disfavored one does), we say, “The prediction was wrong.”
I know this is a hard concept. But I wouldn’t expect @Bloomberg to mangle this one. Yet they ran a story with this headline:
You have to look at the table all the way at the bottom of the article to see that the result of the simulations was Germany winning 24% of the time.
I’m probably stating the obvious here, but “one time in four” is not the same as “will win.”
And “winning one time in four” is the same as saying “losing three times in four.”
@CliffordAsness, co-founder and managing principal of AQR Capital Management, wasn’t alone in finding the message of this headline a head-scratcher.
If anything, the article tried to play up the certainty of the prediction (a sketch of this kind of simulation follows the excerpts below):
- “If you had one chance to predict the winner of this summer’s World Cup, which team would you choose? …. But that’s just not quantitative enough for investment banks.”
- “UBS deployed a team of 18 analysts and editors, and ran a computer simulation of the tournament 10,000 times, in an effort to predict the likely winner of the tournament.”
- “This year’s UBS model comes wrapped in a comprehensive 17-page research note.”
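UBS hasn’t published its model, so purely as an illustration, here is what “running the tournament 10,000 times” looks like in miniature. The strength ratings and the single-elimination bracket below are my own placeholder assumptions, not UBS’s inputs; the point is the mechanics, not the numbers.

```python
import random
from collections import Counter

# Hypothetical strength ratings -- placeholders I made up, not UBS's inputs.
STRENGTH = {"Germany": 90, "Brazil": 88, "Spain": 86, "France": 85,
            "Argentina": 83, "Belgium": 82, "England": 79, "Portugal": 78}

def match_winner(a, b):
    """Pick a winner with probability proportional to relative strength."""
    p_a = STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])
    return a if random.random() < p_a else b

def simulate_bracket(teams):
    """Play a single-elimination bracket until one champion remains."""
    while len(teams) > 1:
        teams = [match_winner(teams[i], teams[i + 1])
                 for i in range(0, len(teams), 2)]
    return teams[0]

titles = Counter(simulate_bracket(list(STRENGTH)) for _ in range(10_000))
for team, wins in titles.most_common():
    print(f"{team}: champion in {wins / 10_000:.0%} of simulations")
```

Notice what the output of such an exercise actually is: no team “will win.” Each team wins some share of the simulated tournaments, and a 24%-style headline number is just the largest share.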
We know, of course, that Germany subsequently lost to South Korea and failed to make the round of 16.
Woe unto UBS, which came in for plenty of criticism – not as much as Germany’s team, but still plenty.
@FortuneMagazine‘s article after Germany’s elimination came with a sub-headline declaring “Why Predicting Sports Is Bad Business.”
The message of the article is pretty much that it’s foolish to try predicting anything in sports or markets because you might be “wrong” on any one trial:
@FT dinged Goldman Sachs for predicting Germany would even make the finals, and called UBS’s prediction “spuriously precise”:
We default to thinking that “whatever is most likely” means “certain,” and we have to resist that and remember that a prediction is about what will happen over many trials.
And if the most likely thing doesn’t happen, we declare that the prediction was wrong.
But given that the model predicted that Germany would lose 76% of the time, couldn’t all of these pundits instead have declared that the model was incredibly accurate since Germany did not win?
They could have declared that. But that declaration would have been just as silly, given that the prediction was probabilistic.
The point is that we can’t know if the prediction model is sound from just this one World Cup.
To determine that, we’d have to look behind the predictions to the accuracy/quality of the information used and the weight given to each piece of information.
Then we’d have to track the model’s predictions over time.
But we do know this much:
- 24% does not mean 100%; and
- a 24% chance to win means a 76% chance to lose.
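A quick back-of-envelope calculation shows why one tournament settles nothing: the uncertainty in an observed frequency shrinks only with the square root of the number of trials.

```python
import math

# How precisely can n World Cups test a forecast of p = 0.24?
# The observed win frequency has standard error sqrt(p * (1 - p) / n).
p = 0.24
for n in (1, 4, 25, 100):
    se = math.sqrt(p * (1 - p) / n)
    print(f"n = {n:>3} tournaments: expected frequency {p:.2f} +/- {se:.2f}")
```

With n = 1 the noise (roughly ±0.43) swamps the signal entirely; you would need dozens of comparable predictions before the track record says much about the model.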
P.S. The best model for predicting World Cup matches is from Paul the Octopus (deceased), who “picked” winners 85.7% of the time by choosing among identical food boxes bearing the national flags of the competitors. (Caveat: skeptics might say Paul’s model was wrong and he picked winners just 14.3% of the time, because he may have been eating last from the box with the flag of his pick to win.)
HOW LIKELY IS “LIKELY”?
We should communicate our uncertainty …
… but we shouldn’t assume everyone means the same thing when they use common expressions of probability
If we have trouble getting across the meaning of “I think it’s 24% likely Germany will win the World Cup,” imagine how hard it is to be understood when we skip the percentages altogether and use descriptive language instead.
To get a feel for how much people differ in their understanding of what words denoting uncertainty mean, check out “If You Say Something Is ‘Likely,’ How Likely Do People Think It Is?” in @HarvardBiz, by Andrew Mauboussin (@AMaub) and Michael Mauboussin (@MHMauboussin).
Their survey of 1,700 respondents found a significant range in what people think the 23 most common expressions of probability mean:
The article also has some excellent historical examples, as well as advice for improving how we convey probability and uncertainty.
Lesson 1: Use probabilities instead of words to avoid misinterpretation.
For example, say “in my simulations, Germany won the World Cup 24% of the time” rather than “Germany is likely to win the World Cup,” “in my simulations, Germany wins the World Cup most of the time,” or “Germany has a high probability of winning the World Cup.”
Lesson 2: Use structured approaches to set probabilities.
State probabilities as if they are betting propositions.
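As a minimal sketch of the betting framing (the $24/$76 stakes below just echo the Germany example; they aren’t from the article):

```python
def implied_probability(stake, payout):
    """The win probability at which risking `stake` to win `payout` breaks even:
    p * payout - (1 - p) * stake = 0  =>  p = stake / (stake + payout)."""
    return stake / (stake + payout)

# "I'd risk $24 to win $76 that Germany lifts the trophy" implies 24%.
print(f"{implied_probability(24, 76):.0%}")
```

Framing a belief as a price you would actually pay forces a specific number out of vague words like “likely.”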
Lesson 3: Seek feedback to improve your forecasting.
Keep score. Make yourself and those around you accountable for predictions, both in expressing them properly and in evaluating them against results and other subsequently available information.
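One standard way to keep score is the Brier score: the mean squared gap between your stated probabilities and what actually happened. A minimal sketch, with an invented track record:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes.
    0.0 is a perfect score; always saying 50% scores 0.25."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Invented track record: (stated probability, did it happen?).
record = [(0.24, 0), (0.70, 1), (0.55, 0), (0.90, 1), (0.10, 0)]
print(f"Brier score: {brier_score(record):.3f}")
```

Note that the 24% Germany-style forecast contributes only a small penalty here even though the event didn’t happen – a well-calibrated low-probability call is rewarded, not punished, when you score over many predictions.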
(You can take the survey and see how your understanding of these terms compares with that of other respondents.)
THE PARADOX OF BENCHMARKING
We lack objective measures, so as conditions change, the benchmarks change
This is true whether we’re judging happiness, threats, or even whether dots are blue or purple
This week, @DavidFrum tweeted out a surprising result from a @PewGlobal poll:
What measures – objective or even somewhat subjective – support that things were better in the U.S. in 1968 than now?
LGBTQ rights?
Women’s rights?
Civil rights?
Young male adults facing military conscription?
Widespread rioting in urban areas?
And how about just the life-improving advancements in technology, medicine, computers, and communication?
What could explain why so many people think that things have gotten so much worse in the last fifty years?
Well, @DanTGilbert and five colleagues published a provocative result in @ScienceMagazine on June 28 – that our judgments are relative rather than absolute – which may help explain this finding.
The study had a simple design: subjects viewed a series of dots and sorted them as blue or purple.
For the first 200 trials, half the dots were purple and half blue. Researchers subsequently showed fewer and fewer blue dots.
As the blue dots became rarer, subjects started to identify dots as blue that they previously saw as purple.
This result held up even when researchers warned subjects that blue dots would become rarer.
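The paper’s analysis is more sophisticated, but a toy simulation shows how a purely relative judge reproduces this shift. Everything here – the hue scale, the running-average criterion, the numbers – is my own illustrative assumption, not the paper’s model:

```python
import random
from statistics import mean

def run_block(p_blue, trials=500, window=50):
    """A 'relative' judge calls a dot blue if its hue (0 = clearly purple,
    1 = clearly blue) exceeds the average hue of recently seen dots."""
    recent, borderline, called_blue = [], 0, 0
    for _ in range(trials):
        if random.random() < p_blue:
            hue = random.uniform(0.5, 1.0)   # a "blue" dot
        else:
            hue = random.uniform(0.0, 0.5)   # a "purple" dot
        criterion = mean(recent) if recent else 0.5
        if 0.4 < hue < 0.6:                  # ambiguous, borderline dots
            borderline += 1
            called_blue += hue > criterion
        recent = (recent + [hue])[-window:]
    return called_blue / max(borderline, 1)

print("borderline dots called blue, 50% blue prevalence:", run_block(0.50))
print("borderline dots called blue,  5% blue prevalence:", run_block(0.05))
```

When blue dots are common, the criterion sits near the midpoint; when they become rare, the running average drifts purple-ward, and the same borderline dots start getting called blue.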
And here is the kicker: They found the same shift when subjects judged whether faces were threatening.
The abstract described the findings and applied them to the vexing problem of why, as social conditions improve, people seem resistant to believing progress is being made:
As we go through our daily lives, trying to make decisions that maximize happiness and minimize threats, we lack absolute measures for judging these things.
This research suggests that as conditions improve, the benchmarks change, whether we’re judging happiness, a threat, or the color of a dot.
We are relative pricers; as the world changes, the comparison changes as well.
I suspect this is what is behind a lot of the criticism of @SAPinker’s Enlightenment Now.
Pinker has attempted to demonstrate the numerous ways that humankind’s arc of progress is continuing.
It’s an optimistic message, and its broad strokes don’t seem controversial – fewer people starving, less poverty and disease, more literacy, longer life spans, less violence in the world – yet it seems to upset quite a few people.
Gilbert’s study gives us a clue as to why people might be upset by Pinker’s message.
The paradox of the progress Pinker celebrates is that the progress itself changes our benchmarks.
People see and feel the pain and suffering in the world in comparison to today’s benchmarks, not the benchmarks of the past.
This helps explain the Pew result.
(Obviously, the influence of the study depends on whether it holds up and replicates.)
DR. ZIMBARDO RESPONDS TO CRITICISM OF THE STANFORD PRISON EXPERIMENT
The June 22 newsletter included an item, “New Evidence Invalidating the Famous Stanford Prison Experiment.” I summarized recently presented information potentially affecting the validity of the results.
(Although some sources raising these criticisms have used terms like “fraud” or “lie,” I also mentioned something nearly two decades old that I’m surprised didn’t limit the influence of the study: the result failed to replicate.)
Professor Philip Zimbardo, who conducted the original study and has defended it over the years, replied to those critics in a “Response to Recent Criticisms of the Stanford Prison Experiment.”
Because I recommended that you read the sources criticizing the SPE, I want to also recommend that you read Zimbardo’s Response.
It’s a famous study. It influenced millions.
Obviously, scientific inquiry requires seriously considering the evidence supporting and refuting the findings of the Stanford Prison Experiment, and Zimbardo deserves to have a voice in the debate.
Lee Jussim’s short article in @PsychToday on the controversy is also worth looking at.
He has long refused to teach it in introductory psych classes. “I do not know if the study is an outright fraud, but I lack the words to describe how much it has been oversold.”
His reservations aren’t so much about recent revelations but about drawing scientific conclusions from such a small and subjective study that has not been replicated by others.
Following this ongoing debate is a great window into how to evaluate and interpret scientific results as well as how diverse viewpoints contribute to debate.
THIS WEEK’S ILLUSION
M.C. Escher’s “Day and Night”
H/t @Brooklyn_Paper, which reported on the opening of an Escher exhibit (featuring over 200 items) in Brooklyn at Industry City. The exhibit runs until February 2019.