Statistical Fallacies Surrounding COVID-19

by Alvaro Cruz   Mar 27, 2020

The 2% to 3% mortality rate given for COVID-19 can give the public a false sense of security. It is important to factor in the virus's ability to infect larger and larger groups of people.

Generally, a statistic is a measure or attribute of a collection of numbers or data. Statistical literacy is a reader's ability to understand and interpret what a statistic is meant to convey. The challenge for quantitative professionals is to present data findings legibly and to make statistics clear and interpretable to the broadest possible audience. Unfortunately, statistics can also be bent to convince people of skewed ideas. A quote notoriously attributed to Mark Twain, among others, is often used when referring to numbers offered as evidence for weak arguments: "There are three kinds of lies: lies, damned lies, and statistics."

Data Misinterpretations Regarding COVID-19

Currently, there are several misinterpretations regarding COVID-19. The most egregious is minimizing its single-digit mortality rate by comparing it with that of viruses like Ebola, which kills roughly 50% of those infected. Estimates for COVID-19 have ranged between 2% and 3%, and such a small value can give readers a false sense of security. Yet this is not how probability works! Probability is the quantitative representation of a specific possibility. The possibility here is death, and the rate describes a group of infected people, not necessarily any one individual. One can think of an individual as a group of size one with a 2% to 3% risk of dying if infected, and this may be how many people who minimize the risk of COVID-19 are thinking.

A more valid interpretation is to think in terms of a group: if, say, one million people are infected, anywhere from 20,000 to 30,000 of them would be expected to die. The number of expected deaths grows with the size of the infected population, whether that is the whole United States, roughly 325 million people, or most of the world's population of over 7 billion. A single-digit mortality rate, when erroneously interpreted individually and independently of other people, quickly becomes dangerous once we factor in the virus's ability to infect larger and larger groups of people.
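The arithmetic behind these figures is simply the mortality rate multiplied by the number of infected people. The short Python sketch below reproduces it for the article's 2% to 3% range. It assumes a single fixed rate for everyone and, for the larger groups, that the entire population is infected; it is a "what if" calculation, not a forecast.

```python
# Expected deaths = mortality rate x number of infected people.
# Assumption: one fixed rate applies to everyone; the population sizes are
# the article's round numbers and treat the whole group as infected.
LOW_RATE, HIGH_RATE = 0.02, 0.03

populations = {
    "One million infected": 1_000_000,
    "United States (~325 million)": 325_000_000,
    "World (~7 billion)": 7_000_000_000,
}


def expected_deaths(infected: int, rate: float) -> int:
    """Expected deaths if each of `infected` people dies with probability `rate`."""
    return round(infected * rate)


for label, n in populations.items():
    low = expected_deaths(n, LOW_RATE)
    high = expected_deaths(n, HIGH_RATE)
    print(f"{label}: {low:,} to {high:,} expected deaths")
```

The rate never changes across the three rows; only the size of the group does, which is exactly why a "small" percentage turns into a very large count.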

Unlike Ebola, where people are generally infectious only once they show symptoms, COVID-19's danger lies in how easily it spreads. In one recent study examining the outbreak in China, researchers estimated that roughly 86% of infections went undocumented, many of them likely mild or asymptomatic. Another study estimates that every infected person can, in turn, infect two to three others at current transmission rates. If 100 people are infected, only about 14 are identified and kept separate from others. The other 86, if not under lockdown, can move about and unknowingly infect countless others, placing further stress on healthcare systems in the weeks ahead. This is why measures restricting contact between large groups of people are being implemented: to prevent mass contagion and keep healthcare systems worldwide from being overburdened. This is the purpose of the "flatten the curve" strategy: to drastically reduce new infections and future deaths.
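To see why isolating only the cases we catch is not enough, here is a toy branching model in Python. It assumes each spreading case infects exactly R = 2.5 others (the midpoint of "two to three"), that nobody becomes immune, and that the 14% of cases that are caught are isolated perfectly while the other 86% keep transmitting. The function name `total_infected` and the five-generation horizon are illustrative choices, not taken from either study.

```python
def total_infected(initial: int, r: float, generations: int,
                   spreading_share: float = 1.0) -> float:
    """Cumulative infections after `generations` rounds of transmission.

    Toy assumptions: every spreading case infects exactly `r` new people,
    nobody becomes immune, and only `spreading_share` of each generation
    transmits (the rest are isolated perfectly).
    """
    total = current = float(initial)
    for _ in range(generations):
        current = current * spreading_share * r  # new cases in the next generation
        total += current
    return total


R = 2.5             # midpoint of the "two to three others" estimate
START, GENS = 100, 5

no_isolation = total_infected(START, R, GENS)
isolate_14_pct = total_infected(START, R, GENS, spreading_share=0.86)
print(f"No isolation:         {no_isolation:,.0f} cumulative infections")
print(f"Isolate 14% of cases: {isolate_14_pct:,.0f} cumulative infections")
print(f"Effective R with isolation: {0.86 * R:.2f}")
```

Isolating the identified cases roughly halves the total in this sketch, but the effective reproduction number is still 0.86 × 2.5 = 2.15, well above 1, so each generation remains larger than the last. Only broader contact restrictions that push the effective number below 1 make the outbreak shrink.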

Another tip to keep in mind when reading about COVID-19 is to separate the face value of the observable data from what it may suggest about the latent, or less visible, factors driving it. For example, in many countries, cases and deaths are noticeably higher among men than among women. Initial evidence suggests a stronger association between COVID-19 morbidity and being male. Yet it is still too early to claim anything beyond what the numbers say, which is that men in some countries have higher counts of infection and death. Lifestyle differences such as higher smoking rates, as well as differences in immune response, may be contributing to these numbers. Note that smoking prevalence and immune response are sociocultural and biological factors, respectively, and quite distinct from each other. It is good practice to consider the unobservable factors that may lie behind the observable numbers and not to assume cause and effect unless research has explicitly demonstrated it.
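To show how a hidden factor can produce a gap between men and women, here is a small Python sketch built on entirely hypothetical counts (not real COVID-19 data). In this made-up population, the chance of dying depends only on smoking status, but men smoke more often than women. The crude rate therefore looks higher for men, even though men and women fare identically within each smoking group. It illustrates confounding in general and makes no claim about what actually drives the real numbers.

```python
# Hypothetical counts, NOT real COVID-19 data. In this made-up population,
# fatality depends only on smoking (4% for smokers, 1% for non-smokers),
# but men are twice as likely to smoke as women.
groups = {
    # (sex, smoker): (infected, deaths)
    ("men", True): (400, 16),
    ("men", False): (600, 6),
    ("women", True): (200, 8),
    ("women", False): (800, 8),
}


def crude_rate(sex: str) -> float:
    """Fatality rate for one sex, ignoring smoking status."""
    infected = deaths = 0
    for (s, _smoker), (n, d) in groups.items():
        if s == sex:
            infected += n
            deaths += d
    return deaths / infected


def stratified_rate(sex: str, smoker: bool) -> float:
    """Fatality rate for one sex within a single smoking group."""
    n, d = groups[(sex, smoker)]
    return d / n


print(f"crude: men {crude_rate('men'):.1%}, women {crude_rate('women'):.1%}")
for smoker in (True, False):
    label = "smokers" if smoker else "non-smokers"
    print(f"{label}: men {stratified_rate('men', smoker):.1%}, "
          f"women {stratified_rate('women', smoker):.1%}")
```

The crude comparison shows men at 2.2% and women at 1.6%, yet both sexes are at 4.0% among smokers and 1.0% among non-smokers. Stratifying like this is one way to check whether an observed difference survives once a suspected factor is held fixed. Real analyses must also deal with factors nobody thought to measure, which is why caution about causal claims is warranted.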

The lockdown measures implemented worldwide are less about individual safety than about the safety of people everywhere, especially the most vulnerable. They are necessary to give healthcare providers the best possible chance to manage new cases and to keep the number of deaths from rising.

Alvaro Cruz, author at OpenLoans
Data Scientist
Alvaro is a data scientist at Plat.ai and has in the past also worked as a researcher and instructor.