Big Data – Déjà Vu in Geographic Information Science

A couple of years ago, one of my first blog posts here was a brief note on “Trends in GIScience: Big Data”. Although not at the core of my research interests, the discussions and developments around big data continue to influence my work. In an analysis of “The Pathologies of Big Data”, Adam Jacobs notes that “What makes most big data big is repeated observations over time and/or space”. Indeed, Geographic Information Systems (GIS) researchers and professionals have been working with large datasets for decades. During my PhD in the late 1990s, the proceedings of the “Very Large Data Bases” (VLDB) conference series were a relevant resource. I am not sure what distinguishes big data from large data, though I have neither the space nor the time to discuss this further here.

Instead, I want to draw a first link between big data and my research on geovisual analytics. In an essay on “The End of Theory”, Chris Anderson famously argued that with sufficiently large data volumes, the “numbers [would] speak for themselves”. As researchers, we know that data are a rather passive species and the most difficult stage in many research projects is to determine the right questions to ask of your data, or to guide the collection of data to begin with. The more elaborate critiques of the big data religion include a recent article by Tim Harford on “Big data: are we making a big mistake?” Harford points to the flawed assumption that n=all in big data collection (not everybody tweets, has a smartphone, or even a credit card!) and argues that we are at risk of repeating statistical mistakes, only at the larger scale of big data. Harford also characterizes some big data as “found data” from the “digital exhaust” of people’s activities, such as Web searches. This makes me worried about the polluted analyses that will be based on such data!

On a more positive note, cartographers have argued for using interactive visualization as a means to analyse complex spatial datasets. For example, Alan MacEachren’s 1994 map use cube defines geovisualization as the expert use of highly interactive maps to discover unknown spatial patterns. On this basis, I understand geovisual analytics as an efficient and effective approach to “making the data speak”. For example, in Rinner & Taranu (2006) we concluded that “an interactive mapping tool is worth a thousand numbers” (p. 647), which may actually underestimate the potential of map-based data exploration. Along similar lines, I noted in Rinner (2007) that data (read: small data) can quickly become complex (read: big data) when they are subject to analytical processing. For example, in a composite index created from a few indicators for the 140 social planning neighbourhoods in the Wellbeing Toronto tool, changes in the indicator set, the weights assigned to indicators, and the normalization or standardization applied will create an exponentially growing set of potential indices. The interactive, geovisual nature of the tool will help analysts to draw reasonable conclusions for decision-makers.
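To make the composite index argument concrete, here is a minimal Python sketch. It is not the Wellbeing Toronto implementation: the indicator names, the two scaling methods, the weighted-sum formula, and the helper `count_variants` are assumptions chosen for illustration only. It computes one weighted index from scaled indicators and then counts how many index configurations an analyst could choose from.

```python
from math import comb


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return [0.0 for _ in values]
    sd = var ** 0.5
    return [(v - mean) / sd for v in values]


SCALERS = {"min_max": min_max, "z_score": z_score}


def composite_index(indicators, weights, scaler="min_max"):
    """Weighted sum of scaled indicators, one value per neighbourhood.

    indicators: dict mapping indicator name -> list of raw values
    weights:    dict mapping indicator name -> non-negative weight
                (only indicators listed here enter the index)
    """
    total = sum(weights.values())
    scaled = {name: SCALERS[scaler](indicators[name]) for name in weights}
    n = len(next(iter(scaled.values())))
    return [
        sum(weights[name] / total * scaled[name][i] for name in weights)
        for i in range(n)
    ]


def count_variants(n_indicators, weight_levels, n_scalers):
    """Count index configurations: every non-empty subset of indicators,
    each chosen indicator given one of `weight_levels` weights, combined
    with one of `n_scalers` scaling methods."""
    subsets_with_weights = sum(
        comb(n_indicators, k) * weight_levels ** k
        for k in range(1, n_indicators + 1)
    )
    return subsets_with_weights * n_scalers


# Hypothetical data for three neighbourhoods (not real Toronto values).
example = {"income": [10, 20, 30], "greenspace": [3, 1, 2]}
index = composite_index(example, {"income": 1, "greenspace": 1})
```

Under these assumptions, five indicators with three weight levels and two scaling methods already allow `count_variants(5, 3, 2)` = 2046 configurations. Some of these produce identical indices (proportional weights, for instance), so the count is an upper bound, but it shows how quickly the option space grows with each added indicator.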

A second link exists between big data and my research on the participatory Geoweb. In this research, we examine how the Geoweb is changing interactions between government and citizens. On the one hand, government data are being released in open data catalogues for all to enjoy, i.e., to use for scrutinizing public services, developing value-added products or services, or just playing with cool map and app designs. On the other hand, governments are starting to rely on crowdsourcing to fill gaps in data where shrinking budgets are limiting authoritative data collection and maintenance. In this context of “volunteered geographic information” (VGI), we argue that we need to consider the entire VGI system, including the hardware and software, the user-generated data, and the applications and people involved, in order to fully understand the emerging phenomenon. We also took up the study of different types of VGI, such as facilitated VGI in contrast to ambient VGI. Of these two types, ambient or “involuntary” VGI is connected with big data and the “digital exhaust” discussed above, as it consists of information collected from large numbers of users without their knowledge.

Again, geographers are in a strong position to examine big data resulting from ambient VGI, as location plays a major role in the VGI system. The 2014 annual meeting of the Association of American Geographers (AAG) included a high-profile panel on big data, their impact on real people, asymmetries in location privacy, and the role of “big money” in big data analytics. In previous discourse, geographers often limited themselves to deploring the disconnect between the social sciences and the developments in computer science and information technology. At AAG 2014, in contrast, a tendency toward more confident commentary and critique of big data and other uncritically adopted IT developments was noticeable. We need to understand the societal risks of global data collection and (geo)surveillance, and explain why, if you let the data speak for themselves, you may earn a Big Silence or make bad decisions.

Both my research on Wellbeing Toronto and place-specific policy-making and the Geothink partnership studying the Geoweb and government-citizen interactions are funded by the Social Sciences and Humanities Research Council of Canada (SSHRC). While supporting research into the opportunities provided by big data, I think that SSHRC is best positioned among the granting councils to also fund critical research on the risks and side effects of big data.

Reflections on OpenStreetMap

The second Canadian OpenStreetMap (OSM) developer event held at Ryerson’s Geography department started today with a series of presentations and workshops introducing students and members of the broader community to OSM. Toronto OSM guru Richard Weait gave another one of his engaging OSM-or-nothing speeches, telling tales of trap streets and mappy hours. He also got attendees to edit the OSM data and submit a few new features based on their local knowledge of their neighbourhoods or the university campus. Geographic Analysis student, GIS consultant, and spatialanalysis.ca blogger Michael Markieta guided us through the querying of the OSM “planet file” from a PostGIS/PostgreSQL database and its mapping in the open-source Quantum GIS package (see photo).

[Photo: Michael Markieta teaching OSM queries, 8 March 2013]

As most of you will know, OSM is a global volunteer project to create a free geographic base dataset. OSM data have been shown to be more detailed and accurate than commercial data, at least in some areas of the world. There was some interesting discussion this afternoon about potential liability issues due to inconsistencies in OSM data used in professional applications. The concern that OSM contributors could be held liable for erroneous contributions was countered by noting that commercial data vendors provide their data “as is” in just the same way, and that their data are out-of-date most of the time. That certainly seems to be true for my car navigation system! Still, the possibility of downloading OSM data for a professional map at a moment when a malicious user has modified or deleted information, and the community has not yet detected and reverted the change, makes me uneasy. Also, the thought that the level of detail in OSM, e.g. in rural areas, may depend on whether or not an avid mapper lives in the area is unsatisfactory.

Further, the challenges resulting from free tagging of new features were brought up at today’s event. There are support sites such as taginfo.osm.org and the map features list on the OSM wiki, but I cannot help but think that the OSM community is repeating mistakes that were addressed (at least to some degree) by research, development, and best practice in GIS over the last couple of decades.

Whatever your position with regard to these issues, OSM is playing an increasingly important role in government and business. Our students need to know about it, and I think today’s workshops went a long way toward achieving this awareness. Thank you to Mike Morrish and the Student Association of Geographic Analysis (SAGA) for their tremendous support in organizing this educational event and for sponsoring food and drinks today.

From a research perspective, OSM is a fabulous subject too. My interest in it was discussed in a section of an earlier post about volunteered geographic information (VGI) systems. The OSM developer weekend is focusing precisely on hardware, software, and provider/user issues that are not well explained by the VGI label, but captured within our concept of VGI systems to be presented at the 2013 AAG conference.

50 Years of Geographic Information Systems

Some 50 years ago, the Canadian government started the development of a computerized land inventory which would become the prototype of geographic information systems (GIS). Its early history is detailed in a blog post by leading GIS vendor ESRI at http://blogs.esri.com/esri/esri-insider/2012/09/07/the-50th-anniversary-of-gis/.

In addition to the interesting links they provide at the end of their post, I really like the three-part documentary “Data for Decision” on the Canada GIS, which you can access via the GIS and Science blog at http://gisandscience.com/2009/01/25/data-for-decision-42-years-later/, or directly at http://www.youtube.com/watch?v=eAFG6aQTwPk (part 1).

Ryerson’s Department of Geography (formerly School of Applied Geography) has a long tradition of using GIS in research and in the classroom/lab, and thereby training a modern type of geographer and contributing to a new perspective on the study of social and earth systems.

Alumni sightings at Environics Analytics

A group of 4th-year Geographic Analysis students and a few faculty members went to the offices of Environics Analytics today to get a better idea of how “geography works”. Environics is a leading marketing and business intelligence firm, and has been a prime employer of outstanding graduates from our BA in Geographic Analysis and Master of Spatial Analysis programs. This afternoon, a number of graduates from the 1990s and 2000s provided the students with an overview of their careers and current jobs as well as an insight into the most useful knowledge and skills learned in school and on the job. Several speakers emphasized the ability of geographers to keep high-level issues and goals in perspective, and see connections between seemingly unconnected phenomena. To paraphrase Mrs. Jan Kestle, Environics founder and president: there is nothing in the world that cannot be examined through the geographical lens, which in turn translates into job opportunities for engaged students. Jobs held by our grads at Environics span the sales, research, and software development groups, and include (senior) client advocate, sales consultant, research analyst, research associate, and senior developer. It was rewarding to see how a number of students I taught in the last 6-8 years have found their vocation in a trendsetting yet friendly work environment.

OpenStreetMap developer event

Ryerson’s Department of Geography, Master of Spatial Analysis (MSA) program, and Student Association of Geographic Analysis (SAGA) are hosting the first-ever Canadian and second-ever North American meeting of OpenStreetMap (OSM) developers, the Toronto Hack Weekend March 2012. We want our students and the community to be aware of this “Wikipedia for geographic data”, as keynote speaker Richard Weait of the Toronto OSM group put it.

The OSM data were contributed by over half a million volunteers world-wide, and are often more detailed, accurate, or up-to-date than those of commercial competitors such as Google Maps or Bing Maps.

Friday afternoon’s presentation and discussion session raised a number of interesting issues regarding the future development of OSM, including the thematic scope of the data being collected and the mechanics of rendering the comprehensive dataset (“planet file”) into maps (map images, or “map tiles”) of different contents and styles for different purposes. I think Ryerson-trained geographers and spatial analysts will make valuable contributions to OSM in the near future ;-)

A report on how the weekend proceeded can be found on Steve Singer’s Scanning Pages blog. Ryerson Geographic Analysis student Michael Markieta has also posted a summary on his fabulous Spatial Analysis blog.

Scholars GeoPortal

Today was the last meeting of the external advisory committee of Scholars GeoPortal. Scholars GeoPortal was developed by the Ontario Council of University Libraries (OCUL) with funding from the Government of Ontario. The project received the 2012 OLITA Award for Technological Innovation.

The portal officially launched on 1 March 2012. It facilitates access to geospatial data from Statistics Canada, the Ontario Ministry of Natural Resources, DMTI Spatial, and other data providers. These data are heavily used by university students and researchers in geography, planning, civil engineering, and many other disciplines.

It was a privilege to work with data, map, and GIS librarians across Ontario and contribute to the development of the GeoPortal.