The story made me do it: Scapegoating data storytelling to cover bad data analysis

Now and then a data analyst pipes up with a warning about storytelling. It’s a menace, they say. It’s so potent, they say, that it stands to corrupt even data, the twenty-first century’s supposedly immaculate compass.

Most recently, that corruption touched even a hackathon involving Canadian women’s rugby, as documented in a recent blog post. In “The Dangerous Allure of the Narrative,” Oscar Goodloe spells out how narrative seduced him and his fellow hackers.

Toward the end of a “lighthearted hackathon,” he and his data analytics team realized their results were invalid. They blamed the irresistible influence of storytelling.

The competition began as competing analytics teams received a set of data. Within that data, Goodhoe and his team identified one variable as the strongest predictor of injury: “degree of soreness,” as reported by each athlete on a scale of zero to ten. Naturally, the analytics team said, higher values signified greater soreness. A ten of course meant “ouch!”

They combined data for the soreness variable with data they’d found that ranked teams by strength. A simple bivariate scatter plot showed a surprising correlation: the opposing team’s anticipated strength in an upcoming match tended to vary inversely with the players’ reported soreness. For example, if the next opponent was ranked stronger, the players’ average self-reported soreness tended to be in the lower half of the scale. Stronger opponent, lower soreness.

That seemed odd to the data hackers. Why would players feel less sore ahead of tougher games? The reason, they decided, was that the self-reported soreness wasn’t the actual soreness felt. But why? What would make these athletes deliberately underrate soreness? The reason, they hypothesized, must be that reporting lower soreness might improve the chance that the coach would entrust the player with an important role in the game plan.

That’s the story the data hackers would present, but the story was wrong. The data hackers had made a fundamental error in their analysis. They had interpreted the zero-to-ten scale backwards. They had assumed that the scale should be read like the usual zero-to-ten pain scale, with zero meaning no pain and ten meaning a ton. They had failed to check their interpretation of the data with the coaching staff, who had designed the soreness scale and harvested the data.
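To see why the direction of the scale matters so much, here is a minimal Python sketch. The numbers are invented purely for illustration and are not the hackathon data; the names `opponent_strength`, `reported_score`, and `reverse_scale` are hypothetical. It only shows that reading a zero-to-ten scale in the opposite direction flips the sign of the correlation while leaving its size unchanged.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def reverse_scale(scores, top=10):
    """Flip a zero-to-top scale so the high end becomes the low end."""
    return [top - s for s in scores]


# Hypothetical values, made up for this sketch (assumption, not real data).
opponent_strength = [1, 2, 3, 4, 5, 6]  # higher = stronger next opponent
reported_score = [8, 7, 7, 5, 4, 2]     # raw zero-to-ten answers

# Correlation with the raw numbers taken at face value.
r_as_read = pearson(opponent_strength, reported_score)

# Correlation after reading the same answers in the opposite direction.
r_flipped = pearson(opponent_strength, reverse_scale(reported_score))

print(f"as read: {r_as_read:.2f}, flipped: {r_flipped:.2f}")
```

With these made-up values the first coefficient is negative and the second is positive, with the same magnitude. Nothing in the numbers themselves says which reading is right; only the people who designed the scale can settle that.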

As Goodhoe tells it, the hackers discovered the error the day they were to present their findings. Did the analytics team persist and tell their intended story anyway? Or did they do the courageous thing and stand down? He doesn’t say — though he does go on to disparage storytelling for its intoxicating sway.

Storytelling is indeed a strong influence on data analysts, strong enough to corrupt them. It stands in the same league as money, sex, and power in its ability to corrupt. But that’s only a reason to use it wisely, not to avoid it.

“The Dangerous Allure of the Narrative — How a desire for strong data storytelling can bite you.” Towards Data Science, April 1, 2021
