COVID-19 Counts and Curves – A Developing Case Study in Data Classification and Normalization Issues

It is heartening to hear Ontario’s Premier Doug Ford explain that “we must listen to what the data tells us” about the threat of the novel coronavirus. Commitments from politicians to evidence-based decision-making are refreshing, even though it is well understood that the data (a plural word) do not actually speak to us unless we ask the right questions of them. In the case of COVID-19, numerous analysts – myself included – have been playing with ways to visualize, interpret, and even predict the curves of confirmed infections, tests conducted, deaths, and cases resolved. Unfortunately, it is becoming increasingly clear that the underlying data are fundamentally flawed and should not be used for public information or for executive decisions that drastically interfere with our freedoms to live a healthy life, move around, assemble, or conduct business.

$$\text{case-fatality rate} = \frac{\text{number of fatalities}}{\text{number of cases}}$$

Ontario’s COVID-19 prediction models, released on April 3rd and based on data collected up to the previous day, reported a high case-fatality rate of 2.1%, as 67 deaths were counted against 3,255 confirmed cases of the disease. In Italy, the same metric was pegged at a staggering 10% as of late March, i.e. one in ten confirmed cases had ended in death. This would explain Premier Ford’s characterization of SARS-CoV2 as “this terrible, terrible virus” and the widespread fear, as seen, e.g., in my Facebook feed, of getting infected by the “deadly virus”. Media attention has recently turned to the “German exception” (New York Times, April 4th), where the case-fatality rate had been a low 0.2% in mid-March, although it has now risen to 1.6%. The key factor influencing the rate was identified as the extensive testing regime in Germany, which resulted in the detection of more mild cases of COVID-19 than elsewhere, and thus a lower ratio of fatalities to confirmed cases. In other words, if other countries conducted more tests, they would also find more infected people with moderate, mild, or no symptoms at all, thereby reducing the ratio of fatalities to cases.
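To make the arithmetic of the case-fatality formula concrete, here is a minimal sketch that recomputes Ontario’s rate from the quoted counts (67 deaths, 3,255 cases) and then applies a purely hypothetical doubling of detected cases, standing in for the extra mild cases that wider testing might find. The function name and the doubling are illustrative assumptions, not part of Ontario’s model.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Case-fatality rate as a fraction: deaths / confirmed cases."""
    if cases <= 0:
        raise ValueError("number of cases must be positive")
    return deaths / cases


# Ontario figures quoted in the text (data up to April 2).
ontario_cfr = case_fatality_rate(67, 3255)
print(f"Ontario CFR: {ontario_cfr:.1%}")  # Ontario CFR: 2.1%

# Hypothetical assumption: more testing detects as many additional mild
# cases again, with no additional deaths. Only the denominator grows.
extra_mild_cases = 3255
wider_testing_cfr = case_fatality_rate(67, 3255 + extra_mild_cases)
print(f"CFR with doubled case count: {wider_testing_cfr:.1%}")  # CFR with doubled case count: 1.0%
```

The death count is unchanged in both calls; the rate halves only because the denominator doubles, which is the mechanism behind the “German exception” described above.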

Who would have thought that high school math could be this important?

Our governments’ and epidemiologists’ main concern is the exponential growth of infections and the resulting need for hospitalizations and intensive-care beds. Like everybody else, I have been looking out for daily updates of confirmed infections and death tolls. Both have been growing exponentially in most countries worldwide, and the proportion of people who know what a logarithmic scale is must have multiplied too. But there is a catch: infections are confirmed only among those who are tested, and the scarce testing resources in most countries are focused on health-care workers, hospitalized patients, and those with symptoms. Despite this focus, confirmed cases have reportedly stabilized at around 10% of those tested. In other words, the growth of COVID-19 cases could be due entirely, or in part, to the increasing number of tests conducted. And, more speculatively, it is currently possible that the disease is not actually spreading, and that it is only the confirmation of cases among an already infected population that is growing.
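A minimal sketch of this argument, assuming hypothetical daily test counts that double at each step and a constant 10% share of positive tests: confirmed cases then double at each step as well, even though nothing about the underlying infection rate has changed. The helper name and the test counts are illustrative, not real data.

```python
def confirmed_from_tests(tests: list[int], positivity: float) -> list[int]:
    """Confirmed cases implied by a constant share of positive tests."""
    return [round(t * positivity) for t in tests]


# Hypothetical test counts that double at each step (not real data).
daily_tests = [1000, 2000, 4000, 8000]
confirmed = confirmed_from_tests(daily_tests, positivity=0.10)
print(confirmed)  # [100, 200, 400, 800]

# Positivity = confirmed / tested. A flat positivity alongside rising
# confirmed counts points to testing volume, not disease spread, as a driver.
positivity = [c / t for c, t in zip(confirmed, daily_tests)]
print(positivity)  # [0.1, 0.1, 0.1, 0.1]
```

Reporting positivity next to the raw case count is one simple way to normalize confirmed infections by testing effort.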

[Figure: Extract from the April 6 report by the Italian Istituto Superiore di Sanità]

In addition to the case-fatality rate’s denominator being under-estimated, there are now increasing questions about the accuracy of its numerator, the death count. The April 6 report by the Italian COVID-19 Surveillance Group notes that 96.7% of 1,290 hospitalized “COVID-19 positive deceased patients” had one, two, three or more diagnosed comorbidities, including cardio-vascular diseases, diabetes, kidney failure, chronic lung disease, and/or several other severe illnesses. This raises the question of the causal effect of SARS-CoV2 on the “corona deaths”, that is, how many people actually die from COVID-19 as opposed to dying with COVID-19. The German infectious disease agency, the Robert Koch-Institut, acknowledged that anyone who dies with a confirmed SARS-CoV2 infection is considered a corona death, irrespective of the cause of death. This would obviously result in a vast over-estimation of the COVID-19 mortality count and thus of the virus’s deadliness. On the other hand, the case-fatality rate may also be under-estimated, since we tend to relate the death count to the current case count instead of the lower case count from the earlier date on which the deceased were infected.
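The last point can be illustrated by dividing the latest death count by the case count from several days earlier instead of today’s. The sketch below uses hypothetical cumulative series and an assumed 3-day lag; the real delay between infection or confirmation and death varies and is not given in the text.

```python
def lagged_cfr(deaths: list[int], cases: list[int], lag_days: int) -> float:
    """Latest cumulative deaths divided by cumulative cases lag_days earlier.

    Both lists hold cumulative daily counts aligned by date (index 0 = first day).
    """
    if not 0 <= lag_days < len(cases):
        raise ValueError("lag_days must fall within the case series")
    earlier_cases = cases[-1 - lag_days]
    return deaths[-1] / earlier_cases


# Hypothetical cumulative series, one value per day (not real data).
cases = [100, 150, 220, 330, 500, 750]
deaths = [1, 2, 4, 6, 9, 15]

naive = lagged_cfr(deaths, cases, lag_days=0)   # 15 / 750 = 0.02
lagged = lagged_cfr(deaths, cases, lag_days=3)  # 15 / 220, about 0.068
```

While case counts are still growing quickly, the lagged ratio is noticeably higher than the naive one, which is why the two sources of error in the numerator and denominator can pull in opposite directions.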

[Figure: England and Wales statistics on COVID-19 and annual all-cause mortality]

All these issues suggest benchmarking the alarming COVID-related death counts against expected mortality. The web site “COVID-19 in Proportion?” does this for the UK, stating (as of April 7th) that “COVID-19 will be linked to around 3% of total deaths which number 172,384” for the year 2020. According to the latest cause-of-death data from Statistics Canada that I could find, about 8,500 people died of influenza and pneumonia in 2018, and another 13,000 died of chronic lower respiratory diseases. The total number of all-cause deaths in Canada was 283,706 in 2018, including 106,991 Ontarians. At the time of writing, Canada has 381 “corona deaths”, with 153 of those in Ontario. The fatalities are therefore in the order of 0.1% of the expected annual mortality. A number of public health experts quoted by OffGuardian suggest that the impact of COVID-19 is no different from the annual flu. Reporting COVID-19 counts in context with a country’s overall mortality or the death counts of recent influenza cycles could go a long way in reducing the general sense of panic and distress caused by current news reports.
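The normalization step in this paragraph is a single division. A minimal sketch using only the figures quoted above (COVID-19 deaths at the time of writing, 2018 all-cause deaths from Statistics Canada); the helper name is illustrative.

```python
def share_of_annual_mortality(covid_deaths: int, annual_deaths: int) -> float:
    """COVID-19 death count as a fraction of a year's all-cause deaths."""
    if annual_deaths <= 0:
        raise ValueError("annual_deaths must be positive")
    return covid_deaths / annual_deaths


# Figures quoted in the text.
canada = share_of_annual_mortality(381, 283_706)
ontario = share_of_annual_mortality(153, 106_991)
print(f"Canada: {canada:.2%}, Ontario: {ontario:.2%}")  # Canada: 0.13%, Ontario: 0.14%
```

As in the text, this compares a running count for part of 2020 with a full-year baseline from 2018; the same helper could be applied to the influenza and pneumonia count instead of all-cause deaths.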

[Figure: OffGuardian COVID-19 articles]

I admire lawyers for their ability to think through complex societal problems and succinctly outline a written argument. Numerous constitutional lawyers in Germany have now publicly argued that the extent of the COVID-19 response, and the process by which it was instated, are out of proportion and therefore illegal. Quotes reported by the Swiss Propaganda Research project include the assessments that the German infectious disease law “cannot serve as a basis for such far-reaching restrictions of citizens’ rights of freedom” and that “emergency measures do not justify the suspension of civil liberties in favour of an authoritarian and surveillance state”. The most pointed warning comes from a professor of public and ecclesiastical law in the context of the cancelled Easter masses and suggests that our “democratic constitutional state could turn into a fascist-hysterical hygiene state in no time”. At least one German lawyer, Beate Bahner of Heidelberg, is preparing a constitutional challenge to the federal and provincial corona bylaws passed on March 28. Her 18-page explanation (in German) of why the corona bylaws constitute the greatest legal scandal of post-war Germany is compelling.

[Figure: Petition to improve COVID-19 data for decision-making]

Another lawyer, Viviane Fischer of Berlin, started an open petition, which currently has 69,000 signatures, calling for a baseline study to generate a reliable database for public health decision-making in the coronavirus pandemic. The ongoing COVID-19 data issues noted in the petition include:

  • The inclusion of all corona-positive deceased in the official COVID-19 statistics, irrespective of their cause of death. The vast majority of fatalities had comorbidities and were not tested for other pathogens such as influenza viruses.
  • Tests are mostly limited to patients with COVID-19 symptoms, resulting in an inflated mortality rate. Conversely, untested asymptomatic infections mean that an unknown number of people are now immune to the virus.
  • Duration of infectiousness and mechanics of transmission are yet to be confirmed.
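The first item is a data classification problem: the same set of deaths yields different totals depending on the counting rule. A minimal sketch with hypothetical records; the DeathRecord fields are assumptions for illustration, and real death certification distinguishes underlying from contributing causes in more detail than a single string.

```python
from dataclasses import dataclass


@dataclass
class DeathRecord:
    tested_positive: bool   # SARS-CoV2 test result (hypothetical field)
    underlying_cause: str   # certified underlying cause (hypothetical field)


def count_with_covid(records: list[DeathRecord]) -> int:
    """Rule 1: count every death with a positive test, irrespective of cause."""
    return sum(r.tested_positive for r in records)


def count_from_covid(records: list[DeathRecord]) -> int:
    """Rule 2: count only positive deaths whose underlying cause is COVID-19."""
    return sum(r.tested_positive and r.underlying_cause == "COVID-19" for r in records)


# Hypothetical records for illustration only.
records = [
    DeathRecord(True, "COVID-19"),
    DeathRecord(True, "heart failure"),
    DeathRecord(True, "COVID-19"),
    DeathRecord(False, "influenza"),
    DeathRecord(True, "kidney failure"),
]
print(count_with_covid(records), count_from_covid(records))  # 4 2
```

Neither rule is shown here as the correct one; the point is that the published “corona death” count depends on which rule is applied.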

To summarize this post, the COVID-19 crisis presents a learning opportunity for science and social science students regarding the benefits and pitfalls of statistical data analysis and modelling. Unfortunately, however, hasty data collection and analysis in the context of this pandemic are having serious implications for our livelihoods. The issues at hand concern data classification (what is a “corona death”?), data normalization (how should the death count or confirmed infections be benchmarked?), and data modelling (how can a disease be predicted when the underlying data are inaccurate, possibly by orders of magnitude?). In Canada, the National Post is the only major newspaper in which I have so far found critical articles, two of them: “The mystery behind the true COVID-19 death rate” (March 31, reprinted from the Financial Times) and “COVID-19 modelling numbers are scary. Have we mortgaged our future on an inexact science?” (April 8). In addition, an opinion piece in the Hill Times posits that “It’s time to talk about a COVID-19 exit strategy” (April 2). We need more critical journalism and a broader range of perspectives – from health sciences and statistics to social studies, economics, politics, and philosophy – to scrutinize and guide our governments’ COVID-19 response. In other words, calling STEAM* superheroes to the rescue!

*STEAM = the integration of Arts with Science, Technology, Engineering, and Math (STEM)