… or tell some truths through refined cartography
In his seminal book “How to Lie with Maps”, Professor Mark Monmonier illustrates how map makers can intentionally or inadvertently convey falsehoods using misguided data selection and cartographic design options. In an era of widely accessible, easy-to-use online mapping tools, misleading maps are becoming ubiquitous. Maps of COVID-19 statistics, along with associated graphs and data tables, which have become a focus of public attention this year, are no exception. Therefore, I want to take another look at the pitfalls of the popular choropleth map.
The choropleth map uses geographic areas, e.g. the polygons representing Canada’s provinces and territories, as the map symbol, by shading an entire area with a colour based on an associated data value. We can see this on our public broadcaster’s web site https://newsinteractives.cbc.ca/coronavirustracker/, where the CBC provides an interactive map as part of their “coronavirus tracker”. A red colour scheme is used to shade the provinces in proportion to the total number of COVID-19 cases that were confirmed since the beginning of the pandemic. For example, Quebec’s dark red represents over 100,000 cases while Ontario’s rose colour symbolizes around 75,000 cases.
Embarrassingly, the web site of the Canadian Broadcasting Corporation (CBC), our public-service radio and TV network, is the last major news platform that still has not modified its COVID-19 map to use a suitable map projection. The following figures illustrate the difference in the size and shape of mapped areas between the Web Mercator projection on the left, which is used by the CBC’s mapping tool, and the Lambert Conformal Conic projection on the right. The differences grow the farther north you go. While the southern provinces are reasonably well represented under both projections, the northern territories appear more and more bloated the closer we get to the North Pole. In fact, the CBC conveniently erased Canada’s Arctic Archipelago with Ellesmere Island from its map (see above) to cut off the most misshapen area in the far north!
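To put a rough number on this bloating: in a spherical Mercator projection, which Web Mercator closely approximates, the linear scale at latitude φ is stretched by 1/cos φ, so areas are inflated by 1/cos² φ relative to the equator. The short Python sketch below evaluates that factor. The function name and the sample latitudes are my own choices for illustration, not the coordinates of any particular place on the CBC map.

```python
import math

def mercator_area_inflation(lat_deg: float) -> float:
    """Area exaggeration of a spherical Mercator map at a given latitude,
    relative to the equator (assumption: sphere, not the full ellipsoid)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Illustrative latitudes only (round numbers, not specific places)
for lat in (45, 60, 70, 80):
    factor = mercator_area_inflation(lat)
    print(f"{lat} deg N: areas drawn about {factor:.1f} times too large")
```

At 60 degrees north the factor is already 4, and it grows without bound towards the pole, which is why the far north looks so swollen under Web Mercator.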
A second major gaffe in the CBC’s corona map is the use of choropleth symbology for raw-count data such as total COVID-19 cases. We have already reviewed in detail, at https://gis.blog.ryerson.ca/2020/03/26/the-graduated-colour-map-a-minefield-for-armchair-cartographers/, why the graduated colour map is more sophisticated than it looks. The reason is that its cartographic symbols are identical to the underlying geographic areas, which differ in size. These sizes can have an undue influence on any statistic collected for each area. For example, we do not know how much of Quebec’s high COVID-19 case count on the CBC map is due to the size of the province (in terms of surface area and/or population) and how much is due to the actual spread of the disease. To overcome this issue, we need to normalize raw-count data by a suitable reference value. If we normalize by area, we arrive at a density variable, e.g. population density as the number of people in each spatial unit divided by its surface area. If we normalize by total population, we obtain a rate, e.g. COVID-19 prevalence as the number of cases within a unit divided by the number of people residing in the unit. Prevalence is often expressed as a rate out of a large number of residents, e.g. X cases per million people, or as a chance, e.g. one case in Y people.
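The normalizations described above amount to a few divisions. Here is a minimal Python sketch; the function names are mine, and the numbers describe a made-up spatial unit, not any real province or country.

```python
def density(count: float, area_km2: float) -> float:
    """Normalize a raw count by surface area, e.g. people per square km."""
    return count / area_km2

def rate_per_million(cases: float, population: float) -> float:
    """Normalize a raw case count by population, scaled to one million residents."""
    return cases * 1_000_000 / population

def one_in(cases: float, population: float) -> float:
    """Express prevalence as a chance: one case in Y people."""
    return population / cases

# Hypothetical unit, invented for illustration only
cases, population, area_km2 = 1_500, 500_000, 2_000

print(f"Population density: {density(population, area_km2):.0f} people per km2")
print(f"Prevalence: {rate_per_million(cases, population):.0f} cases per million")
print(f"Chance: one case in {one_in(cases, population):.0f} people")
```

Either normalized value, not the raw count of 1,500, is what a choropleth map should shade.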
I will use two maps from the web site and data repository OurWorldInData.org to illustrate the need to work with relative metrics. Below, on the left, you see a raw-count variable, cumulative COVID-19 cases, mapped by country as of November 2nd. On the right, the case counts were put in relation to total population by creating a normalized variable, cumulative COVID-19 cases per million people. One of the more obvious differences between the two maps concerns India and Russia. Based on raw case counts, India clearly has more cases than Russia. But based on the relative metric, Russia has more cases per million than India. The “lie” in the raw-count map is that it suggests a greater risk of infection in India, while arguably the risk is greater in Russia, where you are more likely to run into an infected person. (Note that this reasoning is for illustration only, as it relies on the assumption that confirmed “cases” actually have a meaning in terms of infectiousness, which is debatable, and that testing regimes capture a sufficient number of infections, which is almost certainly not the case.)
Believe it or not, at this point there are still three important concerns with choropleth maps that I want to discuss: (1) the misuse of alarming red colour schemes, (2) the misleading portrayal of large areas (provinces, countries) as homogeneous, and (3) the arbitrary classification of the data values. All three issues were addressed in a series of articles for the Canadian Geographic magazine written in April by their eminent cartographer Chris Brackley (https://www.canadiangeographic.ca/author/chris-brackley). The issue of “sensationalist colour choices” when mapping the coronavirus was also discussed as early as February 25th by cartography wiz Kenneth Field at https://www.esri.com/arcgis-blog/products/product/mapping/mapping-coronavirus-responsibly/.
The juxtaposition of the world maps above demonstrates the potential impact of colours. Many cultures associate red with threats, risk, vulnerability, and other negative emotions and outcomes. In thematic mapping, we use lightness progression to represent the magnitude of a phenomenon, and typically, the darker the red in which a place is depicted, the worse the situation. One of the above maps from OurWorldInData.org is an example of an ominous, blood-red COVID-19 map. However, their other map of normalized COVID-19 cases per million people uses less alarming blue colours. This graduated colour scheme still has a very dark tone at its high end, and the blue hue is not meaningfully associated with infectious disease (as far as I can tell). Therefore, I am using shades of grey for my map of COVID-19 case rates by province, as grey is certainly the most neutral colour option (and it is printer friendly as an added benefit).
Much of Canada’s population is concentrated in a narrow band near the border with the United States, and the provinces thus have a highly uneven (inhomogeneous) population distribution within their boundaries. Therefore, any population-related phenomenon such as a human infectious disease is improperly mapped if the cartographic symbols suggest that it occurs equally across urban and agricultural areas as well as the vast Canadian wilderness. The same would apply to a city map, where population should not be mapped within major parks or water features. To assist with displaying national-scale data where people actually live, Statistics Canada is offering the “Population Ecumene” dataset documented at https://www150.statcan.gc.ca/n1/pub/92-159-g/92-159-g2016001-eng.htm and shown through the semi-transparent red areas on top of the crowdsourced OpenStreetMap in the overview map below.
Using the inhabited areas as a mask, I can reduce the map symbols of my map to the places where COVID-19 actually occurs with any likelihood. Note that in other instances, where the mapped variable is dependent on the surface area, e.g. when visualizing population density, the values would need to be recalculated to the smaller ecumene areas.
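The distinction between rates and densities under masking can be shown with a small Python sketch. All figures are invented for a hypothetical province, and the variable names are my own.

```python
# Hypothetical province, invented numbers for illustration only
population = 1_000_000
cases = 5_000
total_area_km2 = 500_000      # full administrative polygon
ecumene_area_km2 = 50_000     # inhabited part used as the mask

# A rate is normalized by population, so masking does not change it
cases_per_million = cases * 1_000_000 / population

# A density is normalized by area, so it must be recomputed for the mask
density_full = population / total_area_km2
density_ecumene = population / ecumene_area_km2

print(f"Cases per million (unchanged by masking): {cases_per_million:.0f}")
print(f"Density over full polygon: {density_full:.0f} people per km2")
print(f"Density over ecumene only: {density_ecumene:.0f} people per km2")
```

In this made-up case the density is ten times higher once the uninhabited area is excluded, while the case rate is the same on both maps.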
Classification is the final aspect of how to lie with COVID-19 maps that I want to explore today. You can see in the above maps from OurWorldInData.org that the countries’ values are grouped into ranges, e.g. starting with 0-10 cases per million mapped with the lightest blue, followed by 10-50 cases p.m. with the next-lightest shade, and so on. The map-maker chose “nice” round class breaks, but hidden behind these is a pattern of exponentially increasing intervals. For example, the range of values grouped into the fifth class (500-1000) is ten times the range of values grouped into the third class (50-100). Their map of raw case counts has an even more abrupt increase in the last two classes, as shown in the red line of the following graph (note that the two lines each have their own y-axis).
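The growth in interval width is easy to check by subtracting consecutive class breaks. The Python sketch below uses only the first five classes of the cases-per-million map; the 100-500 class is inferred from the breaks quoted above, and any higher classes are omitted.

```python
# Class breaks quoted in the text for cases per million; the 100-500 class
# is inferred from the neighbouring breaks, and higher classes are left out
breaks = [0, 10, 50, 100, 500, 1000]

# Width of each class = upper break minus lower break
widths = [upper - lower for lower, upper in zip(breaks, breaks[1:])]

for number, (lower, width) in enumerate(zip(breaks, widths), start=1):
    print(f"Class {number}: {lower}-{lower + width} (width {width})")

# Compare the fifth class with the third class
ratio = widths[4] / widths[2]
print(f"Fifth class is {ratio:.0f} times as wide as the third")  # 10 times
```

A country moving from 60 to 90 cases per million stays in the same class, and so does one moving from 600 to 900, even though the second change is ten times larger in absolute terms.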
My previous map above also uses a classification that progresses faster than linear. This is not necessarily “wrong” but we need to be aware that data classification occurs and that it can be used to influence the message of a map. At this point, we should credit CBC for one aspect of its COVID-19 map: they avoid classification issues by using an unclassed choropleth map. In the CBC map reproduced at the beginning of this post, note how the colour for each province is picked from a continuous, linear progression of shades from light to dark (red).
My final map version employs the same unclassed approach using grey shades. Note that the legend symbols now do not represent class breaks but are just sample colours taken from the linear progression from light (white) to dark (black). In addition, I set the maximum value not to the largest value in the dataset but to a meaningful benchmark, the value of 30,000 COVID-19 cases per million that the United States is currently approaching. Of course, even this “large” value represents only 3% of the population. The subdued map appearance hopefully conveys the still limited scope of the SARS-CoV-2 “pandemic”. Now who would have known that shades of grey could be this sexy?
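For readers who want to reproduce the idea outside a GIS, here is a minimal Python sketch of an unclassed, linear grey ramp capped at the 30,000-per-million benchmark. It is not the symbology code of the mapping software I used; the function names, the 8-bit grey levels, and the sample rates are assumptions made for illustration.

```python
def grey_level(rate: float, vmax: float = 30_000.0) -> int:
    """Map a rate linearly to an 8-bit grey level:
    0 -> 255 (white), vmax or more -> 0 (black)."""
    share = min(max(rate / vmax, 0.0), 1.0)  # clamp to the range 0..1
    return round(255 * (1.0 - share))

def grey_hex(rate: float, vmax: float = 30_000.0) -> str:
    """Return the grey shade as a hex colour string such as '#ffffff'."""
    level = grey_level(rate, vmax)
    return f"#{level:02x}{level:02x}{level:02x}"

# Hypothetical sample rates (cases per million), e.g. for legend swatches
for rate in (0, 10_000, 20_000, 30_000):
    print(f"{rate:>6} per million -> {grey_hex(rate)}")
```

Because the ramp is anchored to a fixed benchmark rather than to the largest value in the data, a region only turns black when it actually reaches that benchmark, not merely because it happens to be the worst in the dataset.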