What information does our geographical position convey about us? What information about myself and others am I giving (voluntarily or not) to those who have access to my location history? This is a question that I often ask myself when I try to understand and explain why it is so important for companies like Google to keep track of people’s position at all times. The obvious reason is money: they want to monetize that vast amount of information. But how? What information can they obtain, and how do they make a business out of it?
The reality nowadays is that most of us are unaware of the details regarding what information is being squeezed out of our personal location data and how it is being monetized. What is clear, though, is that, in terms of information, the aim is to create a profile of each one of us and a model that can be used to predict our individual preferences and behavior, and that location information is only one of the data sources used to create and exploit this model. However, with the advancements in localization and mapping technologies, it will soon be possible to estimate the position of mobile devices, and through them our own position, outdoors and indoors with an accuracy better than one meter. This unprecedented, ubiquitous level of accuracy will make it possible to track directly where and how we spend our time in close detail. This will expose more private information than ever before about our daily habits and activities, and about how we live our lives in general. Furthermore, cross-analyzing this type of location data collected from larger population groups gives access to information about how we relate to others, perhaps the most important factor that defines our personality. Position information is therefore becoming a key explanatory variable in that complex artificial model of our personality, and thus its value is increasing in the eyes of the companies that create and exploit people’s profiles.
But what personal information could one obtain from accurate location data records? The answer is that it depends on the creativity and skills of the experts who design the algorithms and who, in turn, work under profit-driven corporate business strategies. There are no inherent limits on what information can be distilled from raw data, but there is a general rule of thumb: the more detailed the data source is, the more information it carries and the more accurate the models and predictors that can be built from it. As a scientist familiar with simple statistical, machine learning and artificial intelligence methods, it is not difficult for me to come up with probabilistic models and algorithms that one could devise and implement on a computer in order to infer personal information about people from their geographical position history. Here I collect a few thoughts from one simple and informal brainstorming session.
Our spatiotemporal patterns reveal who we are
Let’s assume that we carry a uniquely identifiable device (e.g. a mobile phone) with us all the time, and that its geographic position is estimated periodically (e.g. once per second) with the above-mentioned accuracy of 1 m and stored together with a timestamp in a database. This is what I will refer to as our location history record. Let’s now think about our daily life in terms of our geographic position and spatiotemporal behavior, all embodied in this collected data. It is clear that we all have repetitive patterns that are easy to identify, and these carry rich personal information about us. Let’s see some examples.
One clear repetitive behavioral pattern is that most of us work in the same place during roughly the same time interval, and that every evening/night we come back home, where we stay for a while and eventually spend the night and the early morning. These patterns make our homes and workplaces easy to recognize from location history records: for example, one can simply plot a dot on a map for each of the points of a person’s location history and look for the centers of the two areas with the highest density of points (see Illustration 1). Of course, one can also refine the identification by making two separate plots: one with the points corresponding to positions during the evening/night and another one with those recorded roughly during working hours.
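The "densest area" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: positions are already projected to a local metric frame (x and y in meters), the night and working-hour windows and the 25 m grid cell are arbitrary choices, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

CELL_M = 25.0  # assumed grid cell size in meters


def densest_spot(points, cell=CELL_M):
    """Return the centroid of the grid cell that holds the most points."""
    if not points:
        return None
    key = lambda x, y: (int(x // cell), int(y // cell))
    counts = Counter(key(x, y) for x, y in points)
    best_cell, _ = counts.most_common(1)[0]
    inside = [(x, y) for x, y in points if key(x, y) == best_cell]
    n = len(inside)
    return (sum(x for x, _ in inside) / n, sum(y for _, y in inside) / n)


def home_and_work(records, night=(22, 6), work=(9, 17)):
    """records: iterable of (datetime, x_m, y_m).

    Night-time fixes vote for "home"; weekday working-hour fixes vote for "work".
    """
    night_pts, work_pts = [], []
    for t, x, y in records:
        if t.hour >= night[0] or t.hour < night[1]:
            night_pts.append((x, y))
        elif work[0] <= t.hour < work[1] and t.weekday() < 5:
            work_pts.append((x, y))
    return densest_spot(night_pts), densest_spot(work_pts)
```

Splitting the records by time of day before looking for the densest cell is the refinement mentioned above: it keeps a long commute or a favorite bar from competing with the two places we are really after.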
In fact, speaking about sleeping, we typically do it in the same bed and bedroom, perhaps with the same person who, very likely, is our partner. Some might also have kids sharing the same common space (home) but sleeping in other subspaces (rooms). Broadly speaking, we can say that, given the structure of our society, it is very likely that two or more persons who typically spend their nights in the same space form a couple/family. There are other possibilities, for example flat-mates in a student dorm, but these tend to be localized in known places and have different mid-term spatial behavioral patterns through which they can be easily distinguished.
This type of general “broad” information known beforehand is what statisticians call “prior knowledge”. Using the previously described prior knowledge and individuals’ accurate position history records, it would be relatively easy to infer things like who sleeps with whom and who belongs to which family group. For example, one possible way to do it in practice is to form small spatial clusters (about the size of a room) containing the locations of different persons during the night. One can argue that most of the clusters containing two elements correspond to people sleeping together or in the same room. By increasing the cluster size to e.g. the average size of flats in a certain neighborhood, one could identify with a relatively high success rate the whole set of family groups in that neighborhood and the family to which each individual belongs (see Illustration 2).
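As a rough sketch of this clustering step, assume that each person has already been reduced to one typical night-time position in a local metric frame (for instance, the densest night-time spot). People are then linked whenever they lie within a chosen radius of each other, and the linked groups are collected with a small union-find. The function name, the input layout and the radii are my own assumptions.

```python
from itertools import combinations
from math import hypot


def group_by_proximity(night_pos, radius_m):
    """night_pos: {person_id: (x_m, y_m)}, one typical night position each.

    Returns a list of sets; two people share a set when a chain of
    pairwise distances <= radius_m connects them (single linkage).
    """
    parent = {p: p for p in night_pos}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for a, b in combinations(night_pos, 2):
        (ax, ay), (bx, by) = night_pos[a], night_pos[b]
        if hypot(ax - bx, ay - by) <= radius_m:
            parent[find(a)] = find(b)

    groups = {}
    for p in night_pos:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())
```

Running it once with a room-sized radius (say 3 m) and once with a flat-sized radius (say 10 m) gives the two levels described above: people sharing a room, and people sharing a home. The pairwise loop is quadratic in the number of people, and single linkage can chain neighboring flats together, so a real system would need spatial indexing and knowledge of the walls.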
Let’s keep our attention on our homes for a while and think about how we use their different subspaces (rooms). We wake up and typically have breakfast with the same people (probably family members) and in the same space; let me guess: either the kitchen or the living room. Furthermore, it might be the same place where the family has dinner together (again, prior knowledge). Thus, one can identify a common space within the house and associate it with a certain usage/activity (preparing/eating food). Similarly, the place where people watch TV, play games or spend free time can equally be identified and labeled as “living room”. After all, people usually watch TV on a sofa, not moving much and “in clusters” (with other group members). If in addition you have a smart TV, be sure that its position can be estimated using WiFi signals, and, if you are watching it, you are likely to be close to it.
The main point is that, by analyzing the spatiotemporal patterns of groups of people from position history records, one can create a map of all the spaces in which they interact (e.g. buildings), with abstract subdivisions to which one can attach attributes, such as the activities that the group members carry out there. And once these maps are created, one can do the inverse process: infer information about a person (e.g. activity or interests) from their position in that space. There are two relevant aspects of these maps that are worth considering. First, they do not need to be the typical visually appealing floor maps designed to show spatial information to humans: they only need to be understood by the computer that runs the inference algorithms and to serve the purpose for which they have been designed. Second, it is not necessary to send specialized surveying personnel to build the map on-site: it is built automatically by the group members that share the space (the family), who are probably completely unaware of what is going on.
A close-up in public spaces
The same rationale can be applied to other, less private spaces. If the space is completely public, the prior knowledge about what activity is normally carried out in that particular space can be more accurate and can be gathered manually beforehand. For example, cafeterias, restaurants, cinemas, gyms, libraries, theaters, museums and shopping malls (and the shops inside them) are all spaces in which we carry out different, well-defined activities. Imagine that you are in a museum. A typical floor map of the museum would be divided into rooms, each room containing different paintings. Another possible, more detailed map to be stored in a computer could consist of associations between paintings and the areas where visitors are most likely to stand while looking at them. Once this map is created, it would be easy to know which paintings the visitors are most interested in from their timestamped position records.
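To make the museum example concrete, here is a small sketch in which the "map" is nothing more than a dictionary of rectangular viewing zones, and interest is measured as the time a visitor spends inside each zone. The zone names and coordinates are invented for illustration, and positions are assumed to be in meters in the museum's own frame.

```python
from collections import defaultdict

# Hypothetical viewing zones: name -> (x_min, y_min, x_max, y_max) in meters.
ZONES = {
    "painting_A": (0.0, 0.0, 3.0, 2.0),
    "painting_B": (5.0, 0.0, 8.0, 2.0),
}


def zone_of(x, y, zones=ZONES):
    """Return the name of the first zone containing (x, y), or None."""
    for name, (x0, y0, x1, y1) in zones.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


def dwell_seconds(track, zones=ZONES):
    """track: list of (t_seconds, x_m, y_m) sorted by time.

    Each interval between consecutive fixes is credited to the zone
    that contains the first fix of the interval.
    """
    total = defaultdict(float)
    for (t0, x0, y0), (t1, _, _) in zip(track, track[1:]):
        zone = zone_of(x0, y0, zones)
        if zone is not None:
            total[zone] += t1 - t0
    return dict(total)
```

Note that such a map has nothing visual about it: a handful of labeled rectangles is enough for the computer to turn a position record into a ranking of paintings.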
The same principles can be applied e.g. to a supermarket. In this case the map can consist of associations between stands and the spaces in front of them where people stand while looking at the products on display. So, from our position records, one could infer information about how we move in the supermarket, what catches our attention, what products we are interested in, how long we spend deciding whether to buy something or not, etc., all at an individual level. This is actually the target of retail analytics, a concept similar to web analytics but applied to traditional physical shops. Its aim is to propose and study the effectiveness of different selling strategies, such as the position of ads, stands and so on, including personalized marketing. At the moment, shoppers’ movements are tracked e.g. using devices attached to shopping carts and trolleys as well as smartphones, but there is also research on how to track the eyes of consumers while they shop.
Our mode of transportation can also be inferred from location information. If you go to work by car, you will use roads, and you will start and finish the trip at slightly different positions within the same area, with a certain, but small, degree of randomness. If you take the bus, you board and leave only at bus stops, which are located at well-known fixed spots, and the buses always follow a well-known route. If you ride a bike, you move slower than cars but faster than pedestrians. And if you go walking… well, you are lucky, but still traceable. Again, these are all activities (in this case transportation modes) characterized by different spatiotemporal patterns that can be easily identified. And, again, our means of transportation convey information about our social status and values, at least to some extent, don’t they?
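A crude version of this reasoning fits in a few lines: take the median speed of a trip to separate walking, cycling and motorized travel, and then check whether a motorized trip starts and ends near known bus stops. The speed thresholds (7 and 25 km/h) and the 20 m stop radius are my own guesses, not calibrated values, and the function names are hypothetical.

```python
from math import hypot
from statistics import median


def median_speed_kmh(track):
    """track: list of (t_seconds, x_m, y_m) sorted by time, with >= 2 fixes."""
    speeds = [
        hypot(x1 - x0, y1 - y0) / (t1 - t0) * 3.6
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:])
        if t1 > t0
    ]
    return median(speeds)


def near_any(point, stops, radius_m=20.0):
    """True if point lies within radius_m of at least one stop."""
    return any(hypot(point[0] - sx, point[1] - sy) <= radius_m for sx, sy in stops)


def guess_mode(track, bus_stops=()):
    """Very rough transport-mode guess from speed and trip endpoints."""
    v = median_speed_kmh(track)
    if v < 7:       # assumed upper bound for walking
        return "walk"
    if v < 25:      # assumed upper bound for cycling
        return "bike"
    start, end = track[0][1:], track[-1][1:]
    if near_any(start, bus_stops) and near_any(end, bus_stops):
        return "bus"
    return "car"
```

A slow bus in city traffic would be mistaken for a bike here, so a more serious attempt would also compare the path with known bus routes and look at the stop-and-go pattern, as suggested above.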
Mutual location history reveals relationships
One important idea that has been floating around but has not been explicitly mentioned so far in this blog is that the type of relation that we have with other persons can be inferred from our mutual location history. Without much effort, one could come up with simple criteria for classifying types of human relations based on this history. For example, we can form the following classification groups:
- Colleagues: persons that we share the working time with.
- Acquaintances: persons that we see during leisure time only every now and then and mostly outside our place of residence.
- Friends: persons that we see and spend time with more often, occasionally also in our respective places of residence. We might go to public spaces together (e.g. cinemas, parks, gym, etc.).
- Relatives/flat mates: persons spending time regularly in the same space during the night.
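These four groups translate almost directly into a rule-based classifier. The sketch below assumes that the mutual location history of two people has already been condensed into a monthly summary; the field names and every threshold are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class CoLocation:
    """Hypothetical monthly summary of the time two people spent close together."""
    night_hours: float      # hours together during the night
    work_hours: float       # hours together during working time at a workplace
    leisure_meetings: int   # number of separate leisure-time encounters
    home_visits: int        # how many of those took place at either one's home


def classify_relation(c: CoLocation) -> str:
    """Map a co-location summary to one of the groups listed above."""
    if c.night_hours >= 60:
        return "relative/flat-mate"
    if c.leisure_meetings >= 4 and c.home_visits >= 1:
        return "friend"
    if c.work_hours >= 40:
        return "colleague"
    if c.leisure_meetings >= 1:
        return "acquaintance"
    return "unknown"
```

The order of the rules matters: a colleague whom we also meet often in our free time, including at home, is labeled a friend. A statistical model trained on real data would replace these hand-picked thresholds, but the input would be the same mutual location history.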
The information about how we relate to others is especially revealing because we influence and are influenced by others in different ways and to different degrees depending on the type of relation we maintain. For example, we tend to keep closer relations with persons with whom we share something, e.g. interests/hobbies or values. This, in turn, means that my own profile can be used to add information to the profiles of others who relate to me, even if they have not consented to having data collected about them (see shadow profiles). This creates a fundamental problem in privacy management: my personal attitude and choices affect the privacy of others. It is like smoking: if I smoke, others around me will smell of smoke, whether they choose to smoke or not. This raises an interesting issue regarding our own responsibility towards the privacy of others, especially of those closest to us: privacy is no longer a purely individual choice.
In conclusion: the location history of a population is a very rich source of information that contains very detailed private information about the activities, habits and relations of each of its members, from which more elaborate conclusions can be drawn regarding their status, religion, interests, health, personal attitudes/values and other characteristics of their individual and collective personality. A record of our position history tells who we and those around us are.