Thursday, May 27, 2021

Data Detective: Ten Easy Rules to Make Sense of Statistics Tim Harford 2021 Riverhead Books

For those intimidated by math and anything mathematical, this book presents guidelines for readers to overcome mental phobias when confronted with statistical data. Filled with examples from literature of all types, Harford instructs the reader on how to approach statistics. Harford begins by attacking head-on the idea of lying with statistics, the antithesis of his book. Next with his first rule, he defined the basic mindset of any reviewer of statistics--one free of personal bias. The attempt to be objective is the first step for anyone who wants to evaluate the validity of statistical or any kind of data. Second, he advised that individuals understand the perspectives from which they view the numbers in front of them, from a close-up and personal vantage point or a distant and wider angle. As we have seen with the Covid-19 data worldwide, different countries record infection rates and death rates differently. Understanding the differences in methodology helps to avoid what Harford calls "premature enumeration" (p. 65). He used the example of the difference between miscarriages and premature deaths. How each gets defined depends on the country's health criteria. In the chapter, Step Back and Enjoy the View, Harford encouraged us to view statistical data in terms of the broader trends rather than accepting information in isolation. He made the comparison of "rolling business coverage of Bloomberg TV, the daily rhythm of the newspaper the Financial Times. . . and the weekly take of The Economist--each operates on unique information cycles. The fifth rule, "Get the Backstory" highlights the bias of what gets published in scientific journals and what gets omitted, the peer-reviewed "publication bias" . . .[i]nteresting findings are published; non-findings, or failures to replicate previous findings, face a higher publication hurdle" (p.113). The reader has the task of discerning the population sample and sample bias, "Ask Who Is Missing" constitutes the sixth rule. With the focus currently on the disaggregation of data, readers can target to which group the study applied to and to which group the data does not apply. Even the aspiration of N=All, an attempt for an all-inclusive population, will contain only those who conform to the criteria of inclusion. Harford concludes with one certainty. If algorithms are shown in a skewed sample of the world, they will reach a skewed conclusion" (p. 151). Continuing with the discussion of algorithms, Harford instructed the reader to ask the following questions when confronted with conclusions from big data: "Are the underlying data accessible? Has the performance of the algorithm been assessed rigorously--for example, by running a randomized trial to see if people make better decisions with or without algorithmic advice? Have independent experts been given a change to evaluate the algorithm? What have they concluded?" (p. 183). Within the chapter, he explained the difference between causation and correlation. "Figuring out what causes what is near-impossible, some say. Figuring out what is correlated with what is much cheaper and easier" (p. 156). Rule Eight, "Don't Take Statistical Bedrock for Granted", Harford warned that the data from such agencies as the Bureau of Economic Analysis, the Bureau of Labor Statistics, the Census Bureau, the Federal Reserve, the Department of Agriculture, and the Energy Information Administration provide the "nation's statistical bedrock" (p. 190). Here Harford defends the products of his profession. The last two chapters admonish us to delve carefully into the data and "Remember that Misinformation Can Be Beautiful Too" (p. 213) and that through all our investigation, we should keep an opened mind.