The advantages of a doppelgänger city
Pieter Fourie explains how smartphone location data can be analysed without violating the users’ privacy.
Marketing companies own vast amounts of data gleaned from our smartphone apps, which reveal exactly where we’ve been and when. A dataset reviewed by the New York Times shows more than 235 million locations captured from 1.2 million devices in the New York area over a three-day period alone. In a noteworthy article [1] and podcast episode [2] published in December, the newspaper does an excellent job of shaking us out of our complacency, revealing the tragedy of vulnerable individuals whose privacy gets sold “en masse” to the highest bidder. It raises many issues concerning a lack of policy and oversight in the field of location tracking, and exposes its personal, societal, institutional and corporate dimensions.
The staggering numbers involved in location services data gathering could well represent an Orwellian nightmare in the making. As a mobility researcher in the age of big data, however, one eventually becomes inured to such numbers. Instead of viewing them as the doomsday of privacy, one can focus on the promise they hold for building better mobility models. For it’s indeed possible to use such data while protecting people’s privacy.
Privacy protection arms race
There are several approaches to doing this. In the initial phase of their investigation, the journalists’ queries to data providers were met with claims that data were being aggregated or anonymised. Generally, this means that either data points are bundled together so that individuals cannot be told apart, or that identifying information about them is “masked”, i.e. deliberately altered.
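As a rough illustration of these two approaches, the sketch below first aggregates location points into grid cells and suppresses cells with too few devices, then separately masks coordinates by adding random noise. The cell size, the minimum device count `k`, the noise scale, the sample points and all function names are assumptions made for illustration; they do not describe the practice of any data provider mentioned here.

```python
import math
import random
from collections import defaultdict

def aggregate(points, cell_deg=0.01, k=3):
    """Count distinct devices per grid cell and drop cells with fewer than k devices."""
    devices = defaultdict(set)
    for device_id, lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        devices[cell].add(device_id)
    return {cell: len(ids) for cell, ids in devices.items() if len(ids) >= k}

def mask(points, sigma_deg=0.005, seed=0):
    """Drop the device ID and perturb each coordinate with Gaussian noise."""
    rng = random.Random(seed)
    return [(lat + rng.gauss(0, sigma_deg), lon + rng.gauss(0, sigma_deg))
            for _device_id, lat, lon in points]

# Hypothetical sample: three devices close together, one on its own.
points = [
    ("a", 40.7121, -74.0061),
    ("b", 40.7125, -74.0065),
    ("c", 40.7128, -74.0068),
    ("d", 40.7525, -73.9775),
]
cell_counts = aggregate(points)   # the lone device "d" is suppressed
masked_points = mask(points)      # same number of points, IDs removed, positions blurred
```

Both functions trade detail for protection: a coarser grid or larger noise hides individuals better but leaves less for the analyst to work with.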
When it comes to data on people’s movement, however, anonymisation is trickier [3]. As the technologies for protecting privacy and anonymising individual trajectories advance, so do the de-anonymising algorithms for reconstructing individuals’ traces. This means that a responsible data collector might invest in an array of certified devices, only to find the protection defeated some time later in an unending privacy protection arms race.
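A toy example shows why movement data are hard to anonymise. Even when device identifiers are replaced by pseudonyms, an observer who knows a few places and times at which a person was present can often single out that person’s trace. The traces, place names and matching rule below are hypothetical, and real de-anonymising algorithms are considerably more sophisticated.

```python
def matching_pseudonyms(traces, known_points):
    """Return the pseudonyms whose trace contains every known (place, hour) observation."""
    known = set(known_points)
    return sorted(p for p, visits in traces.items() if known <= set(visits))

# Hypothetical pseudonymised traces: (place, hour of day) visits per pseudonym.
traces = {
    "u1": [("home_A", 8), ("office_X", 9), ("gym", 18)],
    "u2": [("home_B", 8), ("office_X", 9), ("bar", 21)],
    "u3": [("home_A", 8), ("school", 9), ("gym", 18)],
}

# Knowing only the workplace leaves two candidates ...
one_point = matching_pseudonyms(traces, [("office_X", 9)])
# ... but adding the home location isolates a single trace.
two_points = matching_pseudonyms(traces, [("office_X", 9), ("home_A", 8)])
```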
Synthetic data as an alternative
This is what motivates our team at the Future Cities Laboratory to develop an alternative to typical location masking techniques. What if we could create synthetic location data streams that resemble what is actually sensed through devices, without compromising resolution in time and space and without reproducing any actual trajectory?
In practice, there are very few circumstances in which someone who wants to analyse mobility data needs access to the detailed original data of a specific person. And it’s also possible to work with a deliberately modified dataset. In our research, we focus on building synthetic data streams, using techniques that intentionally restrict the actual raw data to machine-eyes-only.
“It’s possible to create an entire ‘doppelgänger city’ to test, probe and experiment with policy decisions, while leaving people in the real world safe and surveillance-free.” Pieter Fourie
There are several steps to generating synthetic location data. First, raw data from mobile devices are transmitted in a secure and encrypted manner and used to produce audited and certified data aggregates. These can then be deployed to generate synthetic mobility data, which do not differ in their statistical characteristics from the real data. In our lab we’re currently working on two distinct techniques to implement this [4, 5].
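The general idea of going from raw records to aggregates to synthetic records can be sketched in a few lines. The example below is a deliberately simple stand-in: it reduces hypothetical raw trips to counts per origin zone, destination zone and departure hour, then draws synthetic trips from those counts. It is not either of the two techniques referenced above, and the zone labels, field layout and function names are assumptions made for illustration.

```python
import random
from collections import Counter

def build_aggregate(raw_trips):
    """Reduce raw trips to counts per (origin zone, destination zone, departure hour)."""
    return Counter((o, d, hour) for _device_id, o, d, hour in raw_trips)

def synthesise(aggregate, n, seed=0):
    """Draw n synthetic trips with probabilities proportional to the aggregate counts."""
    rng = random.Random(seed)
    keys = list(aggregate)
    weights = [aggregate[key] for key in keys]
    return rng.choices(keys, weights=weights, k=n)

# Hypothetical raw data: (device ID, origin zone, destination zone, departure hour).
raw_trips = [
    ("dev1", "A", "B", 8),
    ("dev2", "A", "B", 8),
    ("dev3", "A", "C", 9),
    ("dev4", "B", "A", 17),
]

aggregate = build_aggregate(raw_trips)        # device IDs are discarded at this step
synthetic_trips = synthesise(aggregate, 1000) # only the aggregate is needed from here on
```

Only `build_aggregate` ever touches the raw records; everything downstream works from the counts, which is the property the machine-eyes-only approach relies on. A realistic method would also need to preserve how trips chain together over a day, which simple independent sampling like this does not.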
These techniques not only represent advances in privacy preservation; more importantly, they stretch the potential of transport modelling. By feeding these synthetic data into state-of-the-art mobility simulation programmes, it’s possible to create an entire “doppelgänger city” to test, probe and experiment with policy decisions, while leaving people in the real world safe and surveillance-free.
References
1 Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret, New York Times, 10 December 2018
2 The Business of Selling Your Location, The Daily podcast, New York Times, 10 December 2018
3 Chow CY, Mokbel MR: Trajectory privacy in location-based services and data publication, ACM SIGKDD Explorations Newsletter 2011, 13: 19, doi: 10.1145/2031331.2031335
4 Fourie PJ: Synthesizing high-dimensional, agent-based transport demand data from two-dimensional aggregates with iterative multiple histogram matching, ETH Zurich Research Collection 2016, doi: 10.3929/ethz-b-000118466
5 Cuauhtémoc A, Ordoñez Medina SA: A time-space model of disaggregated urban mobility from aggregated mobile phone data, ETH Zurich Research Collection 2018, doi: 10.3929/ethz-b-000268852