Before 2020, outbreak analytics was on the fringes of data science studies. It was a niche not often called upon, but when COVID-19 began its quick global spread, the public, policymakers, and scientists alike looked to outbreak analytics to better understand the scope of the virus.
At their core, epidemiology and outbreak analytics focus on using all available data to build models and support evidence-based decision-making. Outbreak analytics is an interdisciplinary field whose ultimate goal is to provide answers to public health crises in real time. It does so through in-depth analysis of raw data, which feeds epidemiological models that can then inform public health officials and policymakers.
The field of outbreak analytics isn’t as old as one might think. Prior to COVID-19, there had been only a few notable instances of in-depth epidemiological studies of this kind: Middle East respiratory syndrome coronavirus (MERS-CoV), Zika, and the West African Ebola virus disease (EVD) outbreak. None matched the scale of COVID-19 studies in terms of urgency and importance in guiding global policy.
The first and most important step in data analysis is exploration, wherein epidemiologists visualize the data and generate summary statistics. The first graphic created is usually the epidemic curve, or epi curve, which shows case incidence over a given time interval.
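As a rough illustration of what an epi curve is, the sketch below counts cases per time interval from a list of symptom-onset dates. The function name `epi_curve`, the `interval_days` parameter, and the sample dates are all hypothetical and made up for this example; real line lists are messier and are usually handled with dedicated tools.

```python
from collections import Counter
from datetime import date, timedelta


def epi_curve(onset_dates, interval_days=1):
    """Return (interval_start, case_count) pairs covering the whole outbreak.

    Intervals with no cases are kept with a count of zero so that gaps
    in incidence remain visible in the curve.
    """
    if not onset_dates:
        return []
    start = min(onset_dates)
    end = max(onset_dates)
    # Map every case to the index of the interval it falls into.
    counts = Counter((d - start).days // interval_days for d in onset_dates)
    n_bins = (end - start).days // interval_days + 1
    return [
        (start + timedelta(days=i * interval_days), counts.get(i, 0))
        for i in range(n_bins)
    ]


# Hypothetical onset dates, invented purely for illustration.
onsets = [
    date(2020, 3, 1), date(2020, 3, 1), date(2020, 3, 2),
    date(2020, 3, 4), date(2020, 3, 4), date(2020, 3, 4),
]

curve = epi_curve(onsets)
for day, cases in curve:
    # A crude text histogram: one '#' per case.
    print(day.isoformat(), "#" * cases)
```

Passing a larger `interval_days` (for example 7) groups the same cases into weekly bins, which is often easier to read when daily counts are noisy.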
Next, epidemiologists create maps. These maps are used to visualize the ‘ecological niche’ of infectious diseases and strategize intervention based on disease distribution.
Many conversations among outbreak analysts center on data capture and on which tools can make the process easier, quicker, and more accurate. In recent years, cloud computing, mobile data collection, and automated data analysis and reporting have advanced data collection capacity.
Dr. Lauren Ancel Meyers, a mathematical biologist at the University of Texas at Austin, says that “a lot of new thinking, new methods” have come about due to COVID-19. “I would venture that we’ve probably progressed as much in the last 10 months as we have in the prior six years.”
Researchers predict that as technology makes real-time sequencing standard, genetic analysis will likely emerge as an important tool in outbreak analytics. This line of inquiry would likely offer insight into pathogenesis, risk stratification, and response to vaccination. Extending that insight to diverse, under-studied populations around the world, and understanding how each is affected, is vital to stopping outbreaks on a global scale.
Outbreak analysts construct epidemiological models to better understand a virus as an outbreak develops, both to reduce harm in the present and to prepare for a range of possible futures. However, the insights we can glean from outbreak analytics are still severely limited. The COVID-19 pandemic has made the limits of epidemiology abundantly clear.
In studying COVID-19, epidemiologists have run up against several roadblocks. Some stem from inherent challenges in data collection, while others stem from limited resources. The two compound each other: data collection is even more challenging in areas where resources and funding are limited. In these situations, the data is there; it just can’t be easily accessed.
Surveillance and accurate reporting are vital in this process, but both have been severely lacking. Epidemiologists have also cited forecasting methods as a major challenge. It has thus far been difficult to methodically gather spatial information about population flows and integrate it into existing transmission models. It is also a challenge to combine different types of data when reconstructing transmission trees. Many in the field advocate for transparency through freely available, open-source software to ease issues of accessibility.
Other limitations have arisen due to government intervention. One data scientist studying outbreak analytics, Rebekah Jones, found herself in hot water when she attempted to spread information about COVID-19’s presence in her home state of Florida. Jones claims that she was unlawfully targeted by the FBI and fired from her job for refusing to manipulate the true number of detected cases in Florida. This, she claims, was done to stop her from speaking out against how Florida Governor Ron DeSantis is handling the state’s COVID-19 outbreak.
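Epidemiological models are built on the data gathered during the initial collection phase. Three basic kinds of model are commonly named: stochastic, deterministic, and SIR (susceptible-infectious-recovered). SIR is the classic “compartmental” model, which divides the population into groups and tracks movement between them, and it can be formulated either deterministically or stochastically. Epidemiologists integrate different metrics to create variations of these basic models.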
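To make the idea of a compartmental model more concrete, here is a minimal sketch of a deterministic SIR model, in which people move from susceptible (S) to infectious (I) to recovered (R). The function name `simulate_sir`, the simple Euler time-stepping, and every parameter value below are assumptions chosen for illustration; they are not estimates for COVID-19 and not the method used by any of the modelling groups mentioned in this article.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate a deterministic SIR model with simple Euler steps.

    beta  : transmission rate per day (assumed value, for illustration)
    gamma : recovery rate per day (1 / average infectious period)
    Returns a list of (day, S, I, R) tuples, one per whole day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # New infections depend on contact between S and I.
            new_infections = beta * s * i / population * dt
            # Recoveries are proportional to the number currently infectious.
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Illustrative parameters only: beta / gamma = 3 new infections per case.
history = simulate_sir(
    population=10_000, initial_infected=10, beta=0.3, gamma=0.1, days=160
)

peak_day, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"Infections peak around day {peak_day} "
      f"with about {peak_infected:.0f} people infectious")
```

A stochastic version would replace the fixed `new_infections` and `new_recoveries` amounts with random draws, so that repeated runs produce a range of possible outbreaks rather than a single curve.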
There are numerous models available, each with its own findings, and different countries and regions have been using different ones. The ICL model, developed at Imperial College London, and the IHME model, developed at the Institute for Health Metrics and Evaluation, are the two prominent prediction models for the U.S., U.K., and Australia. A new stochastic model developed in China aims to “account for transmission dynamics and capture the effects of intervention measures in Mainland China.” Meanwhile, South Africa, the epicenter of the continent’s COVID-19 outbreak, has relied on a SIR model.
The ultimate goal of outbreak analysis is to provide policymakers with a real-time projection of an outbreak’s status. Data is collected and models are built so that policymakers can make informed decisions about how to proceed. This is the third phase of outbreak analytics: intervention. Intervention planning begins shortly after case detection and a risk assessment are completed. Surveillance continues during the planning stage and guides decisions once the intervention is implemented. Models continue to be built throughout the intervention phase as new information is discovered and real-time surveillance is studied.