Normalization vs. Standardization – Clarification (?) of Key Geospatial Data Processing Terminology using the Example of Toronto Neighbourhood Wellbeing Indicators
November 30th, 2013
In geospatial data processing, the terms “normalization” and “standardization” are used interchangeably by some researchers, practitioners, and software vendors, while others are adamant about the differences in the underlying concepts.
Krista Heinrich, newly minted Master of Spatial Analysis (MSA) and a GIS Analyst at Esri Canada, wrote her MSA major research paper on the impact of variable normalization and standardization on neighbourhood wellbeing scores in Toronto. More specifically, within an SSHRC-funded research project on multi-criteria decision analysis and place-based policy-making, we examined the use of raw-count vs. normalized variables in the City of Toronto’s “Wellbeing Toronto” online tool. We also explored options to standardize wellbeing indicators across time. Here is what Krista wrote about these issues in a draft of her paper:
“In most analysis situations involving multiple data types, raw data exist in a variety of formats and measures, be it monetary value, percentages, or ordered rankings. This in turn presents a problem of comparability and leads to the requirement of standardization. While Böhringer & Jochem (2007) emphasize that there is no finite set of rules for the standardization of variables in a composite index, Andrienko & Andrienko (2006) state that the standardization of values is a requirement.
Several standardization techniques exist, including linear scale transformations, goal standardization, non-linear scale transformations, interval standardization, distance to reference, above and below the mean, z-scores, percentage of annual differences, and cyclical indicators (Dorini et al., 2011; Giovannini, 2008; Nardo et al., 2005; Malczewski, 1999). It should be noted, however, that there is inconsistency among scholars as to the use of terms such as normalization and standardization.
While Giovannini (2008) and Nardo et al. (2005) categorize standardization solely as the use of z-scores, they employ the term normalization to suggest the transformation of multiple variables to a single comparable scale. Additionally, Ebert & Welsch (2004) refer to Z score standardization as the definition of standardization and place this method, along with the conversion of data to a 0 to 1 scale, referred to as ‘ranging’, as the two most prominent processes of normalization. According to Ebert & Welsch (2004), “Normalization is in most cases a linear transformation of the crude data, involving the two elementary operations of translation and expansion.” In contrast, other scholars classify the transformation of raw values to a single standardized range, often 0.0-1.0, as standardization (Young et al., 2010A; Malczewski, 1999; Voogd, 1983) while Dailey (2006), in an article for ArcUser Online, refers to the normalization of data in ArcMap as the process of standardizing a numerator against a denominator field. […]
In this paper, we employed the term standardization to define the classification of raw values into a single standardized scale and in particular, through the examination of linear scale transformations and their comparison with Z score standardization. The term normalization is used in this paper to describe the division of variables by either area or population, as is referred to by Dailey (2006), therefore regularizing the effect that the number of individuals or the size of an area may have on the raw count values in an area.”
In other words, as we use the two terms, and as we think they should be used in the context of spatial multi-criteria decision analysis and area-based composite indices, standardization refers to making the values of several variables (indicators, criteria) comparable by transforming them to the same range, e.g., 0 to 1. In contrast, normalization refers to the division of a raw-count variable by a reference variable, to account for the different sizes of enumeration areas.
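To make the distinction concrete, here is a minimal Python sketch of the two operations as we define them. The function names and the three-neighbourhood data are hypothetical and chosen only for illustration; they are not taken from Wellbeing Toronto or from Krista's paper, and the 0-to-1 rescaling shown is just one of several possible linear scale transformations.

```python
def normalize(counts, reference):
    """Divide each raw count by its reference value (e.g., population or area)."""
    return [c / r for c, r in zip(counts, reference)]


def standardize(values):
    """Linearly rescale values to the 0-to-1 range (one linear scale transformation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a variable whose values are all equal")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw counts and populations for three neighbourhoods.
incidents = [120, 45, 300]
population = [10000, 3000, 60000]

rates = normalize(incidents, population)  # incidents per resident
scores = standardize(rates)               # rates rescaled to the 0-to-1 range

print(rates)
print(scores)
```

In this made-up example, the third neighbourhood has the highest raw count but the lowest rate per resident, which is the kind of change in pattern that normalization can bring about.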
Unfortunately, I have to admit that in my cartography course, following the excellent textbook by Slocum et al. (2009), I am using the term “standardization” for the important concept of accounting for unit sizes. For example, choropleth maps should only be made for standardized (i.e., normalized!) variables, never for raw-count data (a great rationale for which is provided at http://www.gsd.harvard.edu/gis/manual/normalize/). Furthermore, high-scoring blog posts at http://www.dataminingblog.com/standardization-vs-normalization/ and http://www.benetzkorn.com/2011/11/data-normalization-and-standardization/ define normalization as the rescaling to the 0-to-1 range (our definition of standardization) and standardization as the z-score transformation of a variable. Oops, did I promise clarification of these terms ?-)
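For readers who want to see the two transformations from those blog posts side by side, here is a small Python sketch using only the standard library. The function names and sample values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is my assumption; the sources cited above do not prescribe one or the other.

```python
from statistics import mean, pstdev


def rescale_0_1(values):
    """Rescaling to the 0-to-1 range ('normalization' in the blog posts cited above)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a variable whose values are all equal")
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """z-score transformation ('standardization' in the blog posts cited above)."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("cannot compute z-scores for a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical indicator values for four areas.
values = [2.0, 4.0, 6.0, 8.0]

ranged = rescale_0_1(values)  # runs from 0 to 1
z = z_scores(values)          # mean 0, standard deviation 1

print(ranged)
print(z)
```

Both results are linear transformations of the same input, so the ordering of the areas is unchanged; only the scale on which the values are expressed differs.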
In case you are wondering about Krista’s results regarding the Wellbeing Toronto tool: It depends! She discusses an example of a variable where normalization changes the spatial patterns dramatically, while in another example, spatial patterns remain very similar between raw-count and normalized variables. Standardization was used to make wellbeing indicators from 2008 comparable to those from 2011, as we will report at the Association of American Geographers (AAG) annual meeting in April 2014. Our abstract (URL to be added when available) was co-authored by Dr. Duncan MacLellan (Ryerson, Politics and Public Admin department), my co-investigator on the above-mentioned research grant, and Kathryn Barber, a student in Ryerson’s PhD in Policy Studies program.