Geotagged Tweets: Yesterday, Today and Tomorrow

Paper presented at RESAW19 CONFERENCE: THE WEB THAT WAS: ARCHIVES, TRACES, REFLECTIONS (21 June 2019).

Studies using geotagged data from Twitter have been very popular over the last ten years. Scholars studying disasters, social movements, and urban phenomena have found such data very useful for following mobility and trajectories and for observing how people relate to places and space. Yet the ways in which people can geolocate themselves on Twitter, as well as the format of the data extracted by the researcher, have not always been the same. On the one hand, the interface through which users declare their position has evolved. On the other hand, the geographic information that can be collected varies with the source of the data (API, scraping, etc.) and the date of extraction. Focusing on this case study, this paper aims to show that studies based on natively digital data generated by platforms or apps cannot ignore the histories of such data and of the related devices, interfaces and APIs. In particular, when researchers want to carry out historical studies by analysing long-term data corpora (e.g. tweets geotagged in Paris between January 2015 and December 2017) or by comparing moments that are distant in time (e.g. tweets geotagged in Paris in January 2014 and in January 2017), they also have to carry out a parallel study of the evolution of the digital environment generating such data. This task is by no means trivial. In the case of Twitter, the user interface changes with each platform update, and the new interface replaces the old one without leaving any trace. Concurrently, the APIs for obtaining data evolve by imposing new rules and limitations. As a result, geotagged tweets collected ten years ago do not have the same structure and origin as current ones. In particular, they were more numerous and more varied. What is the reason for such a difference? Are Twitter users today less interested in geolocation and less mobile in space?

The best way to answer these questions is to reconstruct the history of the platform and of the API through web archives. Although it is not possible to retrieve all previous interfaces of the mobile app and of the desktop platform, web archives make it possible to read old “help” sections of Twitter. In this way, the researcher can reconstruct the main functionalities of the interface over time. Similarly, although it is not possible to test old versions of the APIs, the documentation preserved in web archives allows the researcher to identify the main changes in data and metadata. In our case study, this method of investigation based on web archives helped us to identify two main changes related to geotagged tweets.

First, as regards the interface, starting in November 2015 Twitter changed the way a user can declare their location. Before that date, once the GPS sensor was turned on, the interface offered by default the possibility of communicating an exact location, with the precise geographical coordinates (latitude, longitude) of the point from which the tweet was sent. Today the default choice is the place-name, for example “Paris”, and this name is automatically converted into precise coordinates chosen by the platform, in this case those corresponding to the Hôtel de Ville (the seat of the city council of Paris). As a consequence, if we compare a corpus of geotagged tweets from 2015 with one from 2017, it is to be expected that the second shows an over-representation of tweets in the city centre around the Hôtel de Ville.
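One simple way to check a corpus for this effect is to measure how large a share of its geotagged tweets falls on a single coordinate pair. The following is a minimal sketch, not the method used in the study: the function name dominant_point_share, the rounding precision and the sample coordinates are all assumptions made for illustration, and the sample is not real data.

```python
from collections import Counter


def dominant_point_share(points, decimals=5):
    """Return the most frequent (lat, lon) pair and its share of all points.

    A very high share on one pair suggests tweets tagged with a place-name
    that was converted to a single default point, rather than GPS fixes.
    """
    if not points:
        return None, 0.0
    # Round so that tiny floating-point differences do not split one location.
    rounded = [(round(lat, decimals), round(lon, decimals)) for lat, lon in points]
    point, count = Counter(rounded).most_common(1)[0]
    return point, count / len(rounded)


# Hypothetical stand-in for a platform-chosen default point for "Paris".
PLACE_DEFAULT = (48.8566, 2.3522)

# Illustrative sample only: six tweets on the default point, four elsewhere.
sample = [PLACE_DEFAULT] * 6 + [
    (48.8738, 2.2950),
    (48.8462, 2.3371),
    (48.8867, 2.3431),
    (48.8530, 2.3499),
]

point, share = dominant_point_share(sample)
```

Comparing this share between two corpora collected at different dates gives a first indication of whether a concentration in the city centre reflects user behaviour or the interface default.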

Second, as regards the APIs for collecting geotagged Twitter data, their use and potential have been restricted year by year. The first versions of the free APIs (2009) also included a Geotagging API that could extract geolocation data, but it was discontinued in 2012. In addition, between 2010 and 2011 Twitter made available the GeoAPI, a very powerful service that made it possible not only to extract coordinates but also to convert them into place-names. Since April 2011, this tool has been used exclusively by Twitter's internal services. As a result, a researcher who uses the free APIs today can only retrieve geographic coordinates for tweets where the user has decided to declare their position, while a person who uses paid access will also be able to access coordinates for enriched tweets (“Geo Profile”). For example, when a user uses the word “home”, the platform will attribute the location of the user’s profile to the tweet. This technical excursus can explain the difference between analyses dating back seven or eight years, when the researcher could more easily conduct territorial analyses, and more recent studies (after the restriction of the APIs), which have to deal with greater problems of representativeness and quality of geographic data.
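When working with a collected corpus, it can help to separate tweets that carry an exact point from those that carry only a place. The sketch below assumes tweets stored as JSON objects in the shape of the Twitter REST API v1.1 tweet payload, where "coordinates" is a GeoJSON Point ordered as [longitude, latitude] and "place" is an object with a "full_name" field. Payloads obtained from other sources or at other dates may differ, and the function name geo_precision and the category labels are illustrative.

```python
def geo_precision(tweet):
    """Classify the geographic information attached to one tweet.

    Assumes a v1.1-style payload (an assumption, see the text above):
    - "coordinates": GeoJSON Point with [longitude, latitude]
    - "place": object describing a named place, with "full_name"
    """
    coords = tweet.get("coordinates")
    if coords and coords.get("type") == "Point":
        lon, lat = coords["coordinates"]  # GeoJSON order is longitude first
        return "exact_point", (lat, lon)
    place = tweet.get("place")
    if place:
        return "place_only", place.get("full_name")
    return "none", None


# Hypothetical payloads, for illustration only.
with_point = {"coordinates": {"type": "Point", "coordinates": [2.2950, 48.8738]}}
with_place = {"coordinates": None, "place": {"full_name": "Paris, France"}}
without_geo = {"coordinates": None, "place": None}

results = [geo_precision(t) for t in (with_point, with_place, without_geo)]
```

Counting the three categories per month or per year makes changes in the composition of a long-term corpus visible before any territorial analysis is attempted.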

Conference website: https://easychair.org/smart-program/RESAW19/2019-06-20.html#talk:89200