Googling trends in conservation biology

RAPHAËL PROULX, PHILIPPE MASSICOTTE, AND MARC PÉPIN
Centre de recherche sur les interactions bassins versants – écosystèmes aquatiques (RIVE) and Groupe de recherche interuniversitaire en limnologie (GRIL), Université du Québec à Trois-Rivières, C.P. 500, Trois-Rivières, Québec, G9A 5H7, Canada

Abstract: Web-crawling approaches, that is, automated programs data mining the internet to obtain
information about a particular process, have recently been proposed for monitoring early signs of ecosystem
degradation or for establishing crop calendars. However, lack of a clear conceptual and methodological
framework has prevented the development of such approaches within the field of conservation biology.
Our objective was to illustrate how Google Trends, a freely accessible web-crawling engine, can be used to
track changes in timing of biological processes, spatial distribution of invasive species, and level of public
awareness about key conservation issues. Google Trends returns the number of internet searches that were
made for a keyword in a given region of the world over a defined period. Using data retrieved online for
13 countries, we exemplify how Google Trends can be used to study the timing of biological processes, such as
the seasonal recurrence of pollen release or mosquito outbreaks across a latitudinal gradient. We mapped the
spatial extent of results from Google Trends for 5 invasive species in the United States and found geographic
patterns in invasions that are consistent with their coarse-grained distribution at state levels. From 2004
through 2012, Google Trends showed that the level of public interest and awareness about conservation
issues related to ecosystem services, biodiversity, and climate change increased, decreased, and followed both
trends, respectively. Finally, to further the development of research approaches at the interface of conservation
biology, collective knowledge, and environmental management, we developed an algorithm that allows the
rapid retrieval of Google Trends data.

Keywords: biodiversity, ecosystem services, Google Trends, phenology, public awareness, species distribution,
web crawling
Paper submitted January 16, 2013; revised manuscript accepted April 12, 2013. Conservation Biology, Volume 00, No. 00, 1–8. © 2013 Society for Conservation Biology. DOI: 10.1111/cobi.12131

With the recent advent of highly distributed mobile networks and online social platforms, combined with the establishment of online search engines, access to information has never been so extensive and immediate (e.g., Barroso et al. 2003; Butler 2006; Aanensen et al. 2009). Paradoxically, gathering data on the distribution and abundance of species at high spatial and temporal resolution is still a major shortcoming of current ecosystem, or species, monitoring programs (Morisette et al. 2009; Cleland et al. 2012). Online data streams are increasingly being used by economists (Vosen & Schmidt 2011; Choi & Varian 2012), politicians (Relly et al. 2012), and epidemiologists (Carneiro & Mylonakis 2009; Ginsberg et al. 2009; Dugas et al. 2012) alike to provide data on market and public opinion trends or the spread of human infectious diseases. However, this continuous stream of freely available data remains underexploited by conservation biologists, perhaps because the link between biological processes in nature and data driven by human behavior is not as obvious as in other disciplines (e.g., Martin et al. 2012). We used Google Trends, a freely accessible search engine, to track changes in the temporal pattern (phenology) of biological processes and the spatial distribution of invasive species.

Google is currently the most-used search engine on the World Wide Web; nearly 5 billion queries are submitted every day. As a part of the array of Google online products, Google Trends returns the usage volume of a particular search term for a specific region of the world over a defined period. Search-term hits are recorded at the spatial resolution of individual cities within a region (e.g., France > Bretagne > Brest) and at the temporal resolution of a week. A query in Google Trends first returns a world map of the search-term hits per country and a monthly time series of the search-term hits dating back to 2004. By default, the results returned by Google Trends are rescaled by dividing the search-term hits obtained for a given week by the maximum number of hits obtained at any moment over the period of interest. Query results are accessed by logging into a Google account and downloading a csv file of the data. Manually downloading the many files generated by entering separate search-term queries is cumbersome. Hence, we have developed an R package that allows for the rapid retrieval of Google Trends data (Supporting Information).

Timing of Biological Processes

Phenology is the study of the causes and consequences of advancing or delaying the timing of biological processes, such as plant greening and flowering, pest outbreak, animal migration, or breeding time. For example, trends in the phenology of hundreds of plant species showed earlier spring onset in Europe between 1971 and 2000, advancing at a rate of 2.5 d/decade in response to increased air temperature (Menzel et al. 2006). Despite the importance of phenological records for studying the effects of climate change on biological processes, current monitoring programs often only actively follow a limited number of biological processes at a low temporal and spatial resolution. With these limitations in mind, Google Trends may be viewed as a source of up-to-date collective knowledge about biological processes. The precision of Google Trends data for assessing the timing of biological processes was recently demonstrated by Dugas et al. (2012), who reported a high correlation (Pearson's r = 0.87) between postprocessed Google Trends data and clinical cases of confirmed influenza in adults and no apparent time lags between the 2 sources of data.

To illustrate how Google Trends may be used to track the phenology of biological processes, we queried the search terms mosquitoes and pollen for 4 English-speaking countries: Australia, Canada, England, and the United States. Moreover, to provide a geographically comprehensive picture, we entered the same search terms translated into the official languages of 9 additional countries (in parentheses): mosquitoes and pollen (Brazil, Mexico, Spain); […] and […] (China); moustique and pollen (France); […] (Germany); zanzare and polline (Italy); […] and […] (Japan); and […] (Thailand). We obtained the timing of biological processes associated with the keywords pollen and mosquitoes in each country by extracting, for each year from 2008 to 2012, the week associated with the maximum number of search-term hits. Weekly time series of the relative number of search-term hits returned by Google Trends revealed recurring temporal patterns for pollen and mosquitoes (Fig. 1). The seasonal timing of these biological processes at the country level is captured by the broad latitudinal gradient of environmental conditions, at least for temperate countries of Europe and North America (Fig. 2). The temperate countries of Canada, Germany, England, France, United States, and Australia display clear cyclical patterns of search-term hits. In contrast, such seasonality is difficult to detect in subtropical or tropical countries such as Brazil, Thailand, and Mexico, which are characterized by a large interannual variability in the timing of both biological processes (large error bars in Fig. 2). Finally, the seasonal inversion of pollen release or mosquito outbreak events in southern (e.g., Australia) versus northern hemisphere countries (e.g., United States) is also manifest (Figs. 1 & 2).

Figure 1. Weekly time series (2008–2012) of the relative number of search-term hits returned by country after querying in Google Trends the keywords pollen and mosquitoes (terms were translated into the official language of the country). A loess smoothing (span of 0.05) was applied to each time series. Horizontal axis: week (since 1 January 2008).

The phenological trends in a country may be explained by the feedback between people's physiological responses to mosquito bites (cutaneous itching) or pollen exposure (allergic reaction) and their motivation to search for a remedy on the internet. To further investigate this point, we correlated the query results we obtained for mosquitoes to the search-term hits returned by querying deet and citronella, the 2 main active ingredients in most commercial insect-repellent lotions. We also correlated the query results returned for pollen with those for Zyrtec, Claritin, and Reactin, the 3 main allergy-treating drugs sold in Canada and the United States. Using the data from all weeks since 1 January 2008 (n = 253), we obtained high Pearson's coefficients (r) of 0.90 and 0.91 for the United States and 0.88 and 0.87 for Canada for the mosquitoes and pollen correlations, respectively, a result that suggests a causal relation between the seasonal recurrence of pollen release or mosquito outbreak and people's behavioral responses to these 2 processes.

Figure 2. Latitudinal variation in the annual peak of search-term hits returned after querying in Google Trends the keywords pollen and mosquitoes for the period 2008–2012. For each country, the annual peak is expressed as the mean date (SD) of search-hit maximum reported each year starting on 1 November. Latitude coordinates were obtained online from the NationMaster database, with the exception of Canada, which was assigned a value of 45°N to reflect the strong southward asymmetry in the distribution of human population centers. Axes: weeks since 1 November versus latitude (degrees).

Spatial Distribution of Invasive Species

Biodiversity and economic losses caused by newly introduced, rapidly spreading non-native species are an issue of growing concern in conservation biology, most notably because globalization, and in particular increased international trading, accelerates the dispersion of species around the world (Puth & Post 2005; Crowl et al. 2008; Pyšek et al. 2010). Early detection of invasive non-native species is therefore a fundamental component of preventing their establishment and spread. Because at their introduction populations are generally composed of few individuals, the initial stage of dispersion is a critical step toward the establishment of a non-native invasive species (Puth & Post 2005; Blackburn et al. 2011). However, governmental agencies often lack the resources to detect species introductions at their early stage of dispersion. Moreover, in cases of established invasive species, a considerable amount of public and scientific resources is invested annually to monitor the distribution of these populations. Galaz et al. (2010) proposed combining web-crawling and expert-knowledge approaches for the early detection of ecosystem change or degradation. In the context of our study, a first step in that direction is to determine whether web crawlers, such as Google Trends, can be used to map the spatial distribution of invasive species at the country level.

Google Trends spatially disaggregates the volume of returned search-term hits at the level of cities or regions within a country. To illustrate this feature and its potential to address the shortfalls of species detection and tracking, we mapped the distribution of search-term hits for 5 invasive species in the United States. We entered in Google Trends the following search-term queries: ash borer (Agrilus planipennis), Asian carp (Cyprinus carpio, Hypophthalmichthys molitrix, Hypophthalmichthys nobilis, Mylopharyngodon piceus), fire ants (Solenopsis invicta), Africanized bees (Apis mellifera scutellata), and pine beetle (Dendroctonus ponderosae). The resulting maps show the relative volume of search terms returned by state for each of the search terms we queried (Fig. 3). The large number of search-term hits in states bordering the Great Lakes for the emerald ash borer (Haack et al. 2002) and in the upper Mississippi basin for the Asian carp (Koel et al. 2000) reflects their respective areas of origin and current dispersion range in the United States. In the case of Asian carp, the elevated number of search-term hits in the northernmost states also implies apprehension about the invasion of the Great Lakes by these fish. The search-term distribution of fire ants and Africanized bees reflects their introduction and dispersal patterns; they were first imported to southern United States in the mid-1990s and have since dispersed to neighboring states (Woodward & Quinn 2011). Finally, it is well documented that current pine beetle outbreaks have occurred and continue to spread eastward across the Rocky Mountain states such as Montana, Wyoming, and Colorado (Evangelista et al. 2011) (Fig. 3). A more thorough validation of Google Trends maps is beyond the scope of this essay, but will be needed in future applications (see also "Google Trends Limitations").

Figure 3. Spatial distribution of the relative number of search-term hits returned for each state of the conterminous United States after querying 5 invasive species keywords in Google Trends: Africanized bees, ash borer, Asian carp, fire ants, and pine beetle. The numbers have been scaled such that the state with the maximum of search-term hits has a value of 100. The white line was drawn from the reference spatial distribution of each species, available online.

Public Awareness of Conservation Issues
A central objective of conservation biology is to ensure that best management practices, or environmental threats to biodiversity, are efficiently communicated to decision makers and stakeholders (Malcevschi et al. 2012). Public awareness is often a key component of conservation agendas because the public may not only be stakeholders themselves, but they may also have the power to influence decision makers. In this light, 2 international panels have been established by the United Nations to improve communication among the public, conservation biologists, and policy makers: the Intergovernmental Panel on Climate Change (IPCC) and the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES). These panels provide an interface between scientists and policy makers in order to better inform the larger community (e.g., parties involved, stakeholders, and the public) through the publication of periodic reports. Thus, if public interest and awareness is a key ingredient in achieving conservation goals, it leads to the question: are climate change-, biodiversity-, and ecosystem-related issues progressively garnering more public attention as they become more important, or is interest waning over time?

To illustrate how Google Trends can be used to track changes in the level of public interest and awareness about key conservation issues, we entered the keywords climate change, biodiversity, and ecosystem services and graphed their temporal trends between 2004 and 2012 (Fig. 4). Recent conservation issues, such as the ecosystem services concept, are on an overall increasing search trend in English-speaking countries, whereas relatively older conservation issues, such as those related to climate change, have attracted less attention since 2008 (Fig. 4). Moreover, the relative search-term volume of biodiversity stopped declining after 2010, which incidentally was declared the international year of biodiversity. Although interpreting the human-driven motives behind such broad temporal trends is far from trivial, the sole existence of a trend should be seriously considered because Google Trends data are routinely corrected for the total number of web queries made over a given week. Hence, the observed temporal trend in the number of search hits cannot be attributed to baseline changes in the total number of people searching the web. For instance, although the number of persons actively searching the web has substantially increased since 2004, we verified that common search terms such as weather and news do not show temporal trends over the 2004–2012 period. We conducted this verification by entering the 2 keywords (weather and news) in Google Trends and observed no temporal trend.

Figure 4. Weekly time series (2004–2012) of the relative number of search-term hits returned after querying in Google Trends the keywords biodiversity, climate change, and ecosystem services. A loess smoothing (span of 30) was applied to each time series to extract the long-term trends of these conservation issues. Horizontal axis: week (since 1 January 2004).

Google Trends Limitations

First, online keyword queries in Google Trends within a country are sent from highly populated cities, which do not form a representative (spatially extensive, random, unbiased) sample of a region. Second, one cannot know the real motives behind each internet search recorded by Google Trends. For example, we did not know whether search-term queries returned for Asian carp were entered by anglers, researchers studying the topic, or by web surfers looking for a popular Asian carp video. Third, temporal or spatial patterns may be mistakenly interpreted as being driven by biological processes. For example, the Google Trends search-term volume in the United States for pollen correlated identically (Pearson's r = 0.85) to both plant flowers and pine straw search hits. Although there may be a direct causal association between pollen and flowering plants, the link between pollen and pine straw is more tenuous. Although substantial, some of these limitations could be counterbalanced if keyword queries are cross-validated so that they all relate to the same process (e.g., correlating search hits between pollen and plant flowers) or if search trends of irrelevant keywords were removed (e.g., Asian carp youtube). This is what the search engine Google Flu does. In Google Flu, a list of associated search terms is used to estimate seasonal trends in the progression of influenza cases (Dugas et al. 2012).
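
The keyword cross-validation suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weekly series are synthetic rather than retrieved from Google Trends, and the helper names (pearson_r, cross_validate) and the 0.8 threshold are hypothetical choices, not part of Google Trends or of the authors' R package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

def cross_validate(target, candidates, threshold=0.8):
    """Return {keyword: r} for candidates whose r with target meets the threshold.

    The threshold is a hypothetical cut-off chosen for this sketch.
    """
    scores = {kw: pearson_r(target, series) for kw, series in candidates.items()}
    return {kw: r for kw, r in scores.items() if r >= threshold}

# Synthetic weekly series (n = 253 weeks); these are NOT real search data.
weeks = range(253)
pollen = [50 + 50 * math.sin(2 * math.pi * w / 52) for w in weeks]
# Same seasonal phase as pollen, different amplitude.
plant_flowers = [40 + 35 * math.sin(2 * math.pi * w / 52) for w in weeks]
# Shifted by a quarter of a year relative to pollen.
off_season = [50 + 50 * math.sin(2 * math.pi * w / 52 + math.pi / 2) for w in weeks]

kept = cross_validate(
    pollen,
    {"plant flowers": plant_flowers, "off-season term": off_season},
)
print(sorted(kept))
```

A high coefficient does not by itself show that two keywords relate to the same process (the pine straw example above correlates as strongly as plant flowers), so the output of such a filter is better read as a shortlist for expert review than as a validation.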
Table 1. Main advantages of using Google Trends over conventional field-monitoring programs for tracking changes in the timing of biological processes (T), distribution of invasive species (D), and level of public awareness about conservation issues (P).

Added value of Google Trends
[…]                                                  P, D, T
[…]                                                  P, D, T
High temporal resolution                             P, D, T
Standardized protocol                                P, D, T
Multiple spatial scales (local, regional, global)    P, D, T

Google Trends Advantages

A thoughtful selection and thorough consideration of keywords apropos of a research question, cultural heritage, and regional differences are therefore the most fundamental steps in this analytical approach (Al-Eroud et al. 2011; Al-Kabi et al. 2012). Once the keywords associated with a particular biological process are defined, temporal and spatial trends for a region can be validated by the governmental agencies or research laboratories that own the data. For example, pollen release is measured as part of public-health monitoring programs, and scientific protocols are routinely established by firms specialized in mosquito control. However, in an era of open access, validation, testing, and use of web-crawling approaches is not limited to data owners. Because the information returned by Google Trends is disaggregated at the city level (Supporting Information), integrating its results with global or regional data sets is a straightforward operation. If one uses cities' geographic coordinates as an anchor point, Google Trends results can be matched to georeferenced data sets on, for example, cli[…]

[…] communities is impaired if biological processes become less synchronized over time, could also be tested more extensively. Moreover, governmental agencies may have interest in monitoring the invasion front of a recently introduced species. What is more, they may want to publish maps showing where the temporal trend of a species search term is increasing, decreasing, or remaining stable and use this information to take action. Increasing public awareness would also lead to an increased volume of internet searches, thus strengthening the use of web-crawling approaches for studying conservation issues.

Acknowledgments

We thank all the graduate students and scientists at the RIVE who enjoy thinking outside the box. We thank I. Seiferling, Y. Paradis, and students from the Geomatics and Landscape Ecology Laboratory (Carleton University) for providing engaged comments on the manuscript. R.P., P.M., and M.P. participated in the writing of this essay. R.P. and M.P. contributed the figures, and P.M. developed the R package for automatically retrieving Google Trends data. This research was supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Supporting Information

Googling Trends in Conservation Biology Using R (Appendix S1) is available online. The authors are solely responsible for the content and functionality of these materials. Queries (other than absence of the material) should be directed to the corresponding author.
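
As a companion to the retrieval step, the post-processing described under Timing of Biological Processes (rescaling a weekly series by its maximum, then taking, for each year, the week with the most search-term hits) might look like the following Python sketch. It is not the authors' R package: the function names (rescale, annual_peak_weeks), the first week start, and the synthetic hit counts are all assumptions made for illustration.

```python
import math
from datetime import date, timedelta

def rescale(hits):
    """Divide each weekly count by the series maximum, so the top week equals 1.0."""
    top = max(hits)
    if top <= 0:
        raise ValueError("series has no positive hits")
    return [h / top for h in hits]

def annual_peak_weeks(week_starts, hits):
    """Map each calendar year to the start date of its week with the most hits."""
    best = {}  # year -> (hits, week start)
    for start, h in zip(week_starts, hits):
        if start.year not in best or h > best[start.year][0]:
            best[start.year] = (h, start)
    return {year: start for year, (_, start) in best.items()}

# Synthetic example: 260 weekly values (2008-2012) peaking near day 185 of each year.
first_week = date(2008, 1, 6)  # hypothetical start of the first week
week_starts = [first_week + timedelta(weeks=i) for i in range(260)]
raw_hits = [
    1000 + 900 * math.cos(2 * math.pi * (d.timetuple().tm_yday - 185) / 365.25)
    for d in week_starts
]

relative = rescale(raw_hits)
peaks = annual_peak_weeks(week_starts, relative)
for year, start in sorted(peaks.items()):
    print(year, start.isoformat())
```

The sketch groups weeks by calendar year for simplicity. A different year boundary, such as the 1 November start used for Figure 2, would need a different grouping key so that a seasonal peak falling around the turn of the year is not split between two groups.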
