Which Covid-19 data can you trust?
Harvard Business Review – May 8, 2020
Satchit Balsari, MD (Emergency Medicine, BIDMC) writes about how to sort reliable Covid-19 data from the misleading.
The Covid-19 pandemic has created a tidal wave of data. As countries and cities struggle to grab hold of the scope and scale of the problem, tech corporations and data aggregators have stepped up, filling the gap with dashboards scoring social distancing based on location data from mobile phone apps and cell towers, contact-tracing apps using geolocation services and Bluetooth, and modeling efforts to predict epidemic burden and hospital needs. In the face of uncertainty, these data can provide comfort — tangible facts in the face of many unknowns.
In a crisis situation like the one we are in, data can be an essential tool for crafting responses, allocating resources, measuring the effectiveness of interventions, such as social distancing, and telling us when we might reopen economies. However, incomplete or incorrect data can also muddy the waters, obscuring important nuances within communities, ignoring important factors such as socioeconomic realities, and creating false senses of panic or safety, not to mention other harms such as needlessly exposing private information. Right now, bad data could produce serious missteps with consequences for millions.
Unfortunately, many of these technological solutions — however well intended — do not provide the clear picture they purport to. In many cases, there is insufficient engagement with subject-matter experts, such as epidemiologists who specialize in modeling the spread of infectious diseases or front-line clinicians who can help prioritize needs. But because technology and telecom companies have greater access to mobile device data, enormous financial resources, and larger teams of data scientists than academic researchers do, their data products are being rolled out at a far higher volume than high-quality studies are.
Whether you’re a CEO, a consultant, a policymaker, or just someone who is trying to make sense of what’s going on, it’s essential to be able to sort the good data from the misleading — or even misguided.
Common Pitfalls
While you may not be qualified to evaluate the particulars of every dashboard, chart, and study you see, there are common red flags to let you know data might not be reliable. Here’s what to look out for:
Data products that are too broad, too specific, or lack context. Over-aggregated data — such as the national metrics of physical distancing that some of the world's largest data aggregators are putting out — obscure important local and regional variation, are not actionable, and mean little if used for inter-nation comparisons, given the massive social, demographic, and economic disparities in the world.
Conversely, overly disaggregated data can do outright harm. Public health practitioners and data privacy experts rely on proportionality — use only the data that you absolutely need for the intended purpose and no more. To some extent, all data risk breaching the privacy of individual or group identities, but publishing scorecards for specific neighborhoods risks shaming or punishing communities, while ignoring the socioeconomic realities of people's lives that make it difficult for them to stay home. Even more granular examples, such as footfalls at identifiable business locations, risk identifying religious groups; patients visiting cancer hospitals, HIV clinics, or reproductive health clinics; or those seeking public assistance. The medical and public health communities long ago deemed the unmasking of such information without consent unacceptable, but companies have recently been releasing it on publicly available dashboards.
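The proportionality principle described above is often operationalized as small-cell suppression: counts below a minimum threshold are withheld before release. The sketch below illustrates the idea with invented area names and an assumed threshold — it is not any company's actual disclosure policy.

```python
# Hypothetical sketch of the proportionality principle: suppress
# location counts that are too small to publish safely.
# The threshold K_MIN and all data below are assumptions for illustration.

K_MIN = 20  # minimum count before a cell may be released (assumed value)

def safe_release(counts_by_area: dict) -> dict:
    """Return counts, replacing small cells with a suppression marker."""
    return {area: (n if n >= K_MIN else "suppressed")
            for area, n in counts_by_area.items()}

# Invented example: footfall counts near three locations.
visits = {"downtown": 1450, "suburb_a": 380, "clinic_block": 7}
print(safe_release(visits))
# The 7 visits near a clinic are withheld, because publishing such a
# small count could help identify a small, sensitive group.
```

Real deployments typically layer further protections on top of suppression, such as spatial aggregation or added noise, but the threshold check captures the core idea of releasing no more detail than the purpose requires.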
Even data at an appropriate spatial resolution must be interpreted with caution — context is key. Say you see a map that shows a 20% decrease in mobility in an American suburb and a 40% decrease in a nearby city after social distancing measures are announced. The suburb's decrease may be enough to push mobility below the desired threshold, because its residents started from a relatively low baseline. The city may still be far from the reduction required to meaningfully affect transmission rates, because its residents were highly mobile before. Until we know more about how these changing movement patterns affect the epidemiology of the disease, we should use these data with caution. Simply presenting them, or interpreting them without a proper contextual understanding, could inadvertently lead to imposing or relaxing restrictions on lives and livelihoods based on incomplete information.
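The suburb-versus-city comparison above comes down to simple arithmetic: a percentage decrease says nothing until it is applied to a baseline. The sketch below uses invented trip counts and an assumed mobility target to show how the smaller percentage drop can be the one that actually meets the goal.

```python
# Hypothetical illustration: the same headline "% decrease in mobility"
# means very different things depending on the baseline.
# All numbers here are invented for illustration only.

def remaining_mobility(baseline_trips_per_day: float, pct_decrease: float) -> float:
    """Mobility left after a given percentage reduction."""
    return baseline_trips_per_day * (1 - pct_decrease)

# Assumed epidemiological target: stay below this many trips per day.
TARGET = 5.0

suburb = remaining_mobility(baseline_trips_per_day=6.0, pct_decrease=0.20)
city = remaining_mobility(baseline_trips_per_day=14.0, pct_decrease=0.40)

print(f"Suburb: {suburb:.1f} trips/day -> below target: {suburb < TARGET}")
print(f"City:   {city:.1f} trips/day -> below target: {city < TARGET}")
```

With these assumed baselines, the suburb's 20% drop lands under the target while the city's 40% drop does not — exactly the contextual trap the text warns against.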
The technologies behind the data are unvetted or have limited utility. Tech solutions such as mobile phone-based contact tracing — a solution gaining steam in many countries — have untested potential, but only as part of a broader comprehensive strategy that includes a strong underlying health system. Jason Bay, the product lead of Singapore’s successful tracing app, TraceTogether, cautions that “automated contact tracing is not a coronavirus panacea.” Yet some app-based contact-tracing efforts are being used to risk-stratify people, and these estimates are being used to make decisions on quarantine, isolation, and freedom of movement, without concomitant testing.
Both producers and consumers of outputs from these apps must understand where they can fall short. They may prove very useful if we experience recurrent waves in the coming months, when outbreaks may be more localized and our testing capacity commensurate with our technological aspirations. In the absence of a tightly coupled testing and treatment plan, however, these apps risk either providing false reassurance to communities where infectious but asymptomatic individuals can continue to spread disease, or requiring an unreasonably large number of people to quarantine. Moreover, the behavioral response of the population to these apps is unknown and likely to vary significantly across societies.
In some cases, the data from tracing apps requires another caveat: the methods they use are not transparent, so they cannot be fully evaluated by experts. Some contact-tracing apps follow black-box algorithms, which preclude the global community of scientists from refining them or adopting them elsewhere. These non-transparent, un-validated interventions — which are now being rolled out (or rolled back) in countries such as China, India, Israel and Vietnam — are in direct contravention to the open cross-border collaboration that scientists have adopted to address the Covid-19 pandemic. Only transparent, thoroughly vetted algorithms should be considered to augment public health interventions that affect the lives of millions.
Models are produced and presented without appropriate expertise. Well-meaning technologists and highly influential consulting firms are advising governments, and consequently businesses and general populations around the world, on strategies to combat the epidemic, including by building projection and prediction models. Epidemiological models that can help predict the burden and pattern of spread of Covid-19 rely on a number of parameters that are, as yet, wildly uncertain. We still lack many of the basic facts about this disease, including how many people have symptoms, whether people who have been infected are immune to reinfection, and — crucially — how many people have been infected so far. For all these reasons, in the absence of reliable virological testing data, we cannot fit models accurately or know with confidence what the future of this epidemic will look like; yet numbers are being presented to governments and the public with the appearance of certainty.
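To make the parameter-sensitivity point concrete, here is a minimal, textbook-style SIR simulation — not any of the models referenced in the text — showing how a modest change in one still-uncertain parameter, the basic reproduction number R0, produces dramatically different projections. All values (population size, infectious period, R0 candidates) are assumptions for illustration.

```python
# Minimal SIR sketch showing how sensitive projections are to
# parameters that were still uncertain for Covid-19, such as R0.
# All parameter values here are illustrative assumptions.

def sir_peak_infected(r0: float, infectious_days: float = 10.0,
                      population: int = 1_000_000, i0: int = 10,
                      days: int = 365) -> float:
    """Run a simple daily-step SIR model and return peak concurrent infections."""
    gamma = 1.0 / infectious_days   # recovery rate per day
    beta = r0 * gamma               # transmission rate per day
    s, i = float(population - i0), float(i0)
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        peak = max(peak, i)
    return peak

# Two plausible-sounding values of R0 yield very different epidemics.
for r0 in (1.5, 2.5):
    print(f"R0={r0}: peak of ~{sir_peak_infected(r0):,.0f} concurrently infected")
```

The gap between the two runs — several-fold in peak burden — is exactly why fitting such models without reliable testing data, then presenting the output as certain, is so dangerous.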
Take a recent example: A leading global consulting firm explained its projections for an east-coast American city by overlaying on it what it referred to as "the Wuhan curve." The two populations and cities could not be more different in their demography and health care infrastructure. Such oversimplifications risk inaccurate projections and the untimely diversion of critical resources from places that need them the most. Corporations have the vast resources required to rapidly translate the knowledge generated from their data and technologies to governments and communities, but are crowdsourcing expertise only from within their own ranks. While it can be tempting to move with speed, a rapid "move fast and break things" approach — the hallmark of our startup culture — is inappropriate here. Coupling this enthusiasm with the right kind of subject matter expertise would go much further.
Read Carefully and Trust Cautiously
Relying on trustworthy sources is always good advice, but now it is an absolute must. Here are some buoys to help you navigate your way to the shore, whether you are a producer or consumer of data.
Transparency: Look at how the data, technology, or recommendations are presented. The more transparent providers are about the representativeness of their data, their analytic methods, or their algorithms, the more confident they are in their process and the more open to public scrutiny. These are the safest knowledge partners.
Example: Singapore’s government was entirely transparent about the code, algorithm, and logic used in its TraceTogether contact-tracing app. When launching the app, it openly published a policy brief and white paper describing the rationale and workings of the app and, most importantly, its protocol (“BlueTrace”) and codebase (“OpenTrace”), allowing open review.
Thoughtfulness: Look for signs of hubris. Wanton disregard of privacy, civil rights, or well-established scientific facts betrays overconfidence at best and recklessness at worst. These kinds of approaches are likely to result in the most harm. Analysts who are conservative in their recommendations, share the uncertainty associated with their interpretations, and situate their findings in the appropriate local context are likely to be more useful.
Example: Telenor, the Norwegian telco giant, has led the way in the responsible use of aggregated mobility data from cell phone tower records. Its data have been used, in close collaboration with scientists and local practitioners, to model, predict, and respond to outbreaks around the world. Telenor has openly published its methods and provided technological guidance on how telco data can be used in public health emergencies in a responsible, anonymized format that does not risk de-identification.
Expertise: Look for the professionals. Examine the credentials of those providing and processing the data. We are facing a deluge of data and interpretation from the wrong kinds of experts, resulting in a high noise-to-signal ratio. On the most bullish of days, we wouldn’t want our bankers to be our surgeons.
Example: Imperial College, among other academic groups, has been involved in guiding decision makers in the U.K. Covid-19 response since the early days of the epidemic, through the work of the MRC Centre for Global Infectious Disease Analysis. In the U.S., longstanding collaborations between state and local health departments and research groups have been augmented by new collaborative partnerships. In both countries, these efforts critically rely on sustained funding of centers that can support methods development and training during inter-epidemic periods and be mobilized to respond when crises hit.
Open Platforms: Look for the collaborators. There are several data aggregators that are committed to supporting an ecosystem of communities, businesses, and research partners, by sharing data or code in safe and responsible ways. Such open ecosystem approaches, while not easy to manage, can yield high dividends.
Example: Where technology companies like Camber Systems, Cuebiq, and Facebook have allowed scientists to examine their data, researchers can compare data across these novel data streams to account for representativeness and correct biases, making the data even more useful. The Covid-19 Mobility Data Network, of which we are part, is a voluntary collaboration of epidemiologists from around the world that analyzes aggregated data from technology companies to provide daily insights to city and state officials from California to Dhaka, Bangladesh. Governments convey what information gaps exist in their planning and policy making, the scientists help identify the best analytic approaches to address those gaps, and the technology companies make available the data they have access to in a meaningful, interpretable format. All data exchange follows strict institutional ethical guidelines and complies with local and international law. Daily outputs speak to the articulated needs of the collaborating government officials.
This pandemic has been studied more intensely in a shorter amount of time than any other human event. Our globalized world has rapidly generated and shared a vast amount of information about it. It is inevitable that there will be bad as well as good data in that mix. These massive, decentralized, and crowd-sourced data can reliably be converted to life-saving knowledge if tempered by expertise, transparency, rigor, and collaboration. When making your own decisions, read closely, trust carefully, and when in doubt, look to the experts.