
“What Does the Raw Data Say About Sexism By Tennis Refs” — Annie’s Newsletter, Sept 21, 2018

Charles Duhigg

It’s a trick question: RAW Data doesn’t SAY anything
WE collect and interpret it, and WE are fallible

Naomi Osaka’s victory in this year’s U.S. Open was overshadowed by Serena Williams’ dispute with umpire Carlos Ramos and questions about sexism and unequal treatment.

Immediately after the match, several commentators pointed out incidents where male players had ranted at Ramos in Grand Slam matches without receiving a game-penalty, which he gave Williams.

Sports columnist Sally Jenkins, in the Washington Post, wrote:

“Ramos has put up with worse from a man. At the French Open in 2017, Ramos leveled Rafael Nadal with a ticky-tacky penalty over a time delay, and Nadal told him he would see to it that Ramos never referee one of his matches again.”

That sounds pretty similar. I felt pretty convinced by it. Until I reminded myself that anecdote does not equal data.

A few days later, some data did circulate around Twitter. But that data came with its own trap.

Christopher Clarey of the New York Times argued in his piece, “Are Women Penalized More Than Men in Tennis? Data Says No”, that the numbers show no sexism in giving penalties in tennis. The table below, from the piece, shows the number of times men have been fined compared to women for a variety of offenses in Grand Slams from 1998 to 2018.

Men are fined more than women in nearly all categories.

A lot of folks considered this convincing evidence disputing the sexism charge.

Journalist Glenn Greenwald echoed a sentiment shared by a bunch of people seeing the article. Don’t fall for the anecdote. See the data? Not sexist.

He tweeted:

“Now, NYT just released a study of the actual data: contrary to that narrative, male tennis players are punished at far greater rates for misbehavior, especially the ones relevant to that controversy: verbal abuse, obscenity, and unsportsmanlike conduct.”

But the data, as presented, tells us no such thing. It neither supports the sexism charge nor refutes it.

Understanding why is of the utmost importance because we need to be able to discern when a clickable headline doesn’t comport with the data, what is fact and what is hype.

Guess what? The problem comes back to reference class. What are the base rates? What exactly are we comparing?

Nate Silver (and others making similar points, including Lee Jussim) made just this kind of complaint about the conclusions people drew from the data.

This was Silver’s tweet responding to Greenwald and the table:

“The study doesn’t show that. It shows that male players are fined more, but that could be because they misbehave more. (Indeed, from watching a fair bit of tennis, the men do misbehave more). This data doesn’t tell us anything about whether they’re punished at greater rates.”

Look at the table.

At first glance it looks so convincing because the raw numbers are so disparate. I mean, men have been fined 1,517 times compared to only 535 fines for women.

That seems definitive that refs fine men at a greater rate, right?

No, because of what is missing from the table. 

Here are a couple of key pieces of information that are missing:

    • How many fines do the players receive per minute on the court? Men are on the court much longer in Grand Slam events. Women play best-of-three sets. Men play best-of-five. I don’t know the fines-per-minute for men and women but the men got their 1,517 fines over a lot more court time than the women. Without knowing how much longer the average men’s match is than the average women’s match, these data don’t tell us much.

    • To Nate Silver’s point, the table doesn’t tell us the number of instances of conduct meriting a fine for men vs. women. What we really need to know is how often men and women could have been fined, and how often they actually were fined.

For example, the table shows that men received 344 audible obscenity fines, compared with 140 for women.

That’s the total number of fines, not the percentage of audible obscenities that were fined.

Consider this hypothetical:

What if someone recorded audio of all those matches, counted all the audible obscenities, and found 1,000 audible obscenities leading to 344 fines for men, and 150 audible obscenities leading to 140 fines for women?

That would mean that referees fined men for 34% of their audible obscenities, while fining women for the same offence 93% of the time.

That would be pretty sexist.

But the actual data do nothing to settle the sexism charge (or the no-sexism assertion), because we have no idea what the size of the reference class is and no idea what the base rates of the behaviors are.
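To make the arithmetic in the hypothetical concrete, here is a minimal Python sketch. The fine counts (344 and 140) come from the table; every incident count is made up, either taken from the hypothetical above or, in the second scenario, invented here purely for contrast. The function name `fine_rate` is just an illustrative label.

```python
def fine_rate(fines, opportunities):
    """Share of fineable incidents that actually drew a fine."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return fines / opportunities

# Fine counts from the table (audible obscenity, Grand Slams 1998-2018).
MEN_FINES = 344
WOMEN_FINES = 140

# Scenario A: the HYPOTHETICAL incident counts from the text above.
men_a = fine_rate(MEN_FINES, 1000)
women_a = fine_rate(WOMEN_FINES, 150)

# Scenario B: a second, equally HYPOTHETICAL set of incident counts
# (not from the article) in which both groups swear equally often.
men_b = fine_rate(MEN_FINES, 500)
women_b = fine_rate(WOMEN_FINES, 500)

print(f"Scenario A: men {men_a:.0%}, women {women_a:.0%}")
print(f"Scenario B: men {men_b:.0%}, women {women_b:.0%}")
```

The same two fine counts point in opposite directions depending on the denominator, which is the whole problem: without the real number of incidents, the raw counts can't tell us which scenario (if either) resembles reality.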

Just as the anecdotes don’t support the sexism charge, the data from Clarey don’t dispute it.

Either way, I highly recommend this great opinion piece by Martina Navratilova, who I think had the best take of all on the incident.

They measure different things 
And their data is useless if misunderstood, misinterpreted, or disregarded

A couple of newsletters back, I posted an item titled, “A Poll is Not a Forecast,” about an excellent article on how limited the predictive power of polls taken 2-3 months before an election is.

Coincidentally, right after writing about FiveThirtyEight’s thoughtful piece, I came across a piece on CNBC that was like a sledgehammer in its blunt misunderstanding of the differences between polls and forecasts.

The piece on CNBC is from July 2016, shortly after the Brexit referendum and three months before the presidential election.

The piece declared that for the Brexit vote, “the polling proved more accurate than the political bettors.” (Opinion polls showed a close race; betting markets showed a wide lead for Remain.)

What? Talk about completely missing the difference between a poll and a forecast.

You can see my mini tweet storm about the piece here.

Here’s a summary of what I said:

A poll is a mock vote, a snapshot of one moment in time. It’s not a prediction. It’s a piece of data, which someone can input into a model for a prediction. A bet is the outcome of that model.

The betting odds are a prediction of what the outcomes will be over many trials. A poll is the outcome of a single survey.

Close results in a poll immediately before an event should mirror close results in the vote. And even a small difference that is outside the margin of error should result in a large gap in a betting market.

Polls are more predictive the closer to the event they are. With Brexit, if Remain polled at 60% many months out, your confidence that Remain would win is much lower than if it polled at 60% the hour before, in which case you might forecast that Remain would win close to 100% of the time assuming your polling methods were sound.
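One way to see how a poll number becomes a forecast is a toy model, sketched below in Python. It assumes (and this is only an assumption for illustration, not how any real forecaster or betting market works) that the final vote share is normally distributed around the poll share, with an uncertainty `sigma` that shrinks as the vote gets closer. The `sigma` values and the function name `win_probability` are hypothetical.

```python
import math

def win_probability(poll_share, sigma):
    """P(final share > 50%), assuming final share ~ Normal(poll_share, sigma)."""
    z = (poll_share - 0.5) / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# ASSUMED uncertainty: wide many months out, narrow right before the vote.
SIGMA_MONTHS_OUT = 0.10
SIGMA_HOUR_BEFORE = 0.02

months_out = win_probability(0.60, SIGMA_MONTHS_OUT)
hour_before = win_probability(0.60, SIGMA_HOUR_BEFORE)
close_race = win_probability(0.52, SIGMA_HOUR_BEFORE)

print(f"60% in the poll, months out:   {months_out:.1%} chance to win")
print(f"60% in the poll, hour before:  {hour_before:.1%} chance to win")
print(f"52% in the poll, hour before:  {close_race:.1%} chance to win")
```

Under these assumed numbers, the same 60% poll maps to very different win probabilities depending on timing, and a narrow 52-48 poll taken right before the vote still maps to a lopsided forecast. A close poll and wide betting odds are not in conflict; they answer different questions.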

It makes me sad to see articles like this one because they perpetuate misunderstanding of what polls and betting markets are.

Polls and betting markets reflect different things and we can’t assess the quality of either based on a single outcome.

Misinformation like this makes it easier for people to misinterpret the data, ignore it altogether, or declare that the polls are FAKE NEWS.

And the responsibility to correct or retract mistakes and misimpressions

David French, an editor at the National Review, wrote an article recently in which he did something rare and admirable in public life:

He admitted he was wrong about the way he had been writing about police shootings.

“To put it bluntly, when I look back at my older writings, I see them as contributing more to a particular partisan narrative than to a tough, clear-eyed search for truth.”

His earlier writing had focused on elements of more extreme claims to discredit the overall allegation of systemic racism.

“Yes, I used all the proper ‘to be sure’ language – there are some racist cops, not every shooting is justified, etc. – but my work in its totality minimized the vital quest for individual justice, the evidence that does exist of systematic racial bias, and I failed to seriously consider the very real problems that contribute to the sheer number of police killings in the U.S.”

Whether you agree or disagree with his position, take note of his open-mindedness.

We are all remarkably stubborn about changing our beliefs.

It’s even harder to change a belief that is part of our identity.

This difficulty gets multiplied when you have so publicly taken a position, when you’ve expressed an opinion and emphasized its soundness on multiple occasions in public for all to see.

Despite all that, David French not only kept an open mind about his belief but announced that he was updating his belief. He took responsibility for contributing to a viewpoint he now realized was incorrect and explained why.

THAT’S what makes the French piece so remarkable.


Steven Johnson makes an excellent case, in a recent Medium article, that “Decision-Making Should Be a Required Course in Every High School.”

Reflecting on his education, he begins with a fact that’s surprising because it’s obvious, universal, and generally unnoticed:

“In all those years at school, not once did I take a class that taught me how to make a complex decision, despite the fact that the ability to make informed and creative decisions is a skill that applies to every aspect of our lives.”

Johnson also outlines the curriculum for a course in decision-making, noting the added benefit that the subject matter exposes students to a variety of disciplines, both theoretically and practically (economics, philosophy, psychology, neuroscience, history, politics).

Obviously, I’m a proponent of teaching children about decision-making. That’s why I cofounded How I Decide.

I encourage you to check out the article and join this effort. You should also get Johnson’s recently-published book, Farsighted: How We Make the Decisions That Matter the Most.


Johnson’s new book discusses the work of Phil Tetlock and Dan Gardner, including Tetlock’s earlier academic book Expert Political Judgment and Tetlock & Gardner’s Superforecasting.

Gardner, on Twitter, pointed out some misstatements Johnson made about their work in Farsighted. Although Johnson favorably describes their research and uses the findings, he conflated the research from the two books, inaccurately characterizing the success of different forecasting approaches.

Johnson immediately apologized and promised to correct it.

This was completely coincidental to my earlier item about the importance of updating beliefs and taking responsibility for improving understanding of facts and opinions. But it’s another cool example of someone apologizing and taking responsibility to correct a mistake.

Anyone know the answer under bird law?

Weird. That doesn’t look like a black swan.