How to bulk download data from Google Trends with R

July 29, 2019 in R Programming

Today I want to share an R script I created a couple of weeks ago. It's very simple: it bulk downloads data from Google Trends into a .csv file. So far I have used it to download 500-1000 keywords per run, but you can raise that limit by using a proxy.

How does it work?

This script is very simple but at the same time very powerful. In a couple of minutes, you'll be able to download data for thousands of keywords from Google Trends (in the interface you can only download 5 keywords at once!).

First of all, I want to thank Philippe Massicotte, who developed the amazing package gtrendsR. His package does most of the dirty work; I just wrote a few lines of code to run gtrends over many keywords in one go. You can split my script into three parts:

  1. load the keywords list via .csv file
  2. create the googleTrendsData function, which wraps gtrends and adjusts the output
  3. use map_dfr to run the function over the keywords list and export the results

Now it's your turn! Below you can find the script with all the essential comments. You can simply copy-paste the script into RStudio and adjust the different parameters (e.g. geographic region, time span, category, Google product, ...).
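A minimal sketch of the three parts listed above, pieced together from the description and from the fragments quoted in the comments below. It assumes the packages gtrendsR and purrr (plus dplyr, which map_dfr relies on) are installed. The file name keywords.csv and the function name bulkDownload are placeholders for illustration, not names taken from the original script.

```r
# Part 2: wrap gtrends() for a single keyword and keep only the time series.
googleTrendsData <- function(keywords) {
  country <- c("IT")                   # geographic region
  time    <- "2018-08-01 2018-08-27"   # time span
  channel <- "web"                     # Google product

  trends <- gtrendsR::gtrends(keywords,
                              gprop = channel,
                              geo   = country,
                              time  = time)
  trends$interest_over_time
}

# Parts 1 and 3 (hypothetical helper): read the keywords (one per line),
# run the wrapper over each of them and export the combined result.
bulkDownload <- function(input = "keywords.csv",
                         output = "download.csv",
                         fetch = googleTrendsData) {
  kwlist <- readLines(input)
  kwlist <- kwlist[nzchar(trimws(kwlist))]   # drop empty lines

  result <- purrr::map_dfr(.x = kwlist, .f = fetch)

  write.csv(result, output, row.names = FALSE)
  invisible(result)
}

# Usage (sends one request per keyword to Google Trends):
# output <- bulkDownload("keywords.csv", "download.csv")
```

Passing the fetch function as an argument is a design choice of this sketch: it makes it easy to swap in a variant that pauses between requests or converts the hits column.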

After the script has run, you should find a file called download.csv in your working directory. Below you can find an export example:

62 Comments

  1. I’m facing the following error (even when trying to repeat exactly your code):

    Error in `[<-.data.frame`(`*tmp*`, , timevar, value = "subject") :
    replacement has 1 row, data has 0

    • Hi Lucas,

      this error occurs when you use the category argument. The package doesn't work for some combinations of keyword and category ID.
      I sent a message to the developer, hopefully he will fix it! 🙂

  2. Hi, I am trying to download daily data from the beginning of Jan 2020 until Aug 25 2020. But I am facing the following error
    “Error: Column hits can’t be converted from character to integer.” Could you please help, Ruben? Thank you so much.

        • Hi Hanh,

          can you please send me an email at info [at] rubenvezzoli.it? I would need the code plus the .csv file you use as input.

          • Hi Ruben. I have just sent you an email. Please check whether you have received it. Thank you so much!

          • Hi Ruben, I think I face the same problem. My assumption is that some hits are "<1" while others are integer values. How can I solve this?
            Thank you 😀

          • Hi Calvin,

            this code should solve your issue. I basically change the data type of the hits column from numeric to character. This works as long as you only use the data outside of R. If you want to do some calculations after the download, you'll have to delete the <1 rows (or replace <1 with 0) and change the data type back to numeric.
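A minimal sketch of the approach described in this reply. The helper names hitsAsCharacter and hitsAsNumeric are made up for illustration; the only assumption about the data is that the column is called hits and may contain the text "<1".

```r
# Inside the wrapper: store hits as text, so that results where some
# keywords return "<1" and others plain integers can be bound together.
hitsAsCharacter <- function(results) {
  if (is.null(results)) return(NULL)   # no data returned for this keyword
  results$hits <- as.character(results$hits)
  results
}

# After the download: replace "<1" with a number (0 by default)
# and convert the column back to numeric for calculations.
hitsAsNumeric <- function(hits, below_one = 0) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- as.character(below_one)
  as.numeric(hits)
}

# Usage, assuming `output` is the data frame produced by the script:
# output$hits <- hitsAsNumeric(output$hits)
```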

      • Hi Ruben,

        I hope you have been fine.

        The revised code worked well back in September, but when I ran it again today I received this error message: Error in map_dfr(.x = kwlist1, .f = googleTrendsData) :
        could not find function "map_dfr". Could you please let me know what I should do, Ruben? Thank you so much.

  3. Hi Ruben,

    thanks for this. Is there any restriction on the number of queries inside the keyword list? And last but not least, how do I handle special characters inside the queries so that R is able to read them correctly?

    Thank you!
    Thomas

    • Hi Thomas,

      my advice is to have a maximum of 1000 keywords in the list, simply because Google will block you if they realize that you are using Google Trends via an unofficial API provider like the gtrendsR package. It depends on the character, can you show me some examples?

  4. Hello Ruben,

    Thank you for the code! Is it possible to request SVIs specific to all 51 US states without having to manually specify each state in "geo"? The max it will let me do is 5 states. Plus, it blocks me after every couple of states for 24 hours or more. You provide the keywords as a list in the code. Is it possible to do the same for the states?

    Thank you so much!
    -Anna

    • Hi Anna,

      the issue is that you're sending too many requests at the same time and Google blocks you. I would add Sys.sleep(5) inside the function (the command pauses the script for 5 seconds after each loop) and run the script multiple times.

      Best!
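One way to add the pause without editing the function body is to wrap it. This is only a sketch: withPause is a hypothetical helper, and the 5 seconds come from the reply above rather than from any documented limit.

```r
# Returns a version of `fetch` that sleeps for `seconds` after every call.
withPause <- function(fetch, seconds = 5) {
  function(...) {
    result <- fetch(...)
    Sys.sleep(seconds)   # give Google Trends a break before the next request
    result
  }
}

# Usage, assuming googleTrendsData and kwlist from the script:
# output <- purrr::map_dfr(kwlist, withPause(googleTrendsData, seconds = 5))
```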

    • Hi Anna,

      I tried to tweak Ruben’s code, but could not figure it out. I was successful with these nested for loops:

      library(gtrendsR)

      Country <- c("US-CA", "US-LA", "US-ME", "US-NY", "US-CO", "US-WY")
      Keyword <- c("tupac", "biggy", "jay-z", "nas", "eminem", "andre3000")
      time <- "today 3-m"
      channel <- "web"
      results <- list()

      for (i in Keyword) {
        for (j in Country) {
          # one request per keyword/state pair
          trends <- gtrends(keyword = i, gprop = channel, geo = j, time = time)
          Sys.sleep(5)  # pause between requests to avoid being blocked
          results[[j]][[i]] <- trends$interest_over_time
        }
      }

  5. Hi Ruben,

    thanks for this, I really enjoy your code! I was wondering whether similar code could be applied to multi-period queries of daily data for a single keyword. The interface only allows downloading 5 (90-day) periods at once for daily data.

    By the way, the latest gtrendsR version does not allow me to download 5 daily periods at once, but I suppose this is something that is going to be fixed soon. For example:
    data<-gtrends("shoes", geo = c("US"), time = c("2004-01-01 2004-03-31","2004-04-01 2004-06-30","2004-07-01 2004-09-30","2004-10-01 2004-12-31","2005-01-01 2005-03-31"))

    • Hi George,

      first of all thanks for your comment! 🙂 Regarding your question, yes you could adjust my function and loop the function over a date range instead of a keywords list. I would also add a command like Sys.sleep(3) because you are going to do many API calls and Google could block you. It shouldn’t be that hard if you know a bit of R.

      Ruben
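A sketch of how the date ranges for such a loop could be generated. dateWindows is a hypothetical helper; it only builds the "YYYY-MM-DD YYYY-MM-DD" strings in the format used for the time argument above. Keep in mind that Google Trends scales each request to its own 0-100 range, so values from different windows are not directly comparable.

```r
# Split [start, end] into consecutive windows of at most `days` days.
dateWindows <- function(start, end, days = 90) {
  start <- as.Date(start)
  end   <- as.Date(end)
  from  <- seq(start, end, by = days)     # first day of each window
  to    <- pmin(from + days - 1, end)     # last day, capped at `end`
  paste(format(from), format(to))
}

# Usage, assuming gtrendsR and purrr are installed:
# windows <- dateWindows("2004-01-01", "2005-03-31")
# daily <- purrr::map_dfr(windows, function(w) {
#   Sys.sleep(3)
#   gtrendsR::gtrends("shoes", geo = "US", time = w)$interest_over_time
# })
```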

  6. Hi Ruben,

    Thank you very much for the code it is very helpful.
    Is it also possible to adjust the function so that it loops over both a list of date ranges and a keyword list?
    My problem is that I need the results for my keywords for different weeks, and I don't really know how to accomplish this.
    Many thanks in advance!

    Best regards,
    Moritz
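One possible way to loop over both lists is to build every keyword/date-range combination first and then call the download function once per row. This is a sketch in base R: trendsGrid and fetchTrend are hypothetical names, the country "IT" is a placeholder, and the 3-second pause is an arbitrary choice.

```r
# Calls `fetch(keyword, time)` for every combination and stacks the results.
trendsGrid <- function(keywords, times, fetch, pause = 3) {
  grid <- expand.grid(keyword = keywords, time = times,
                      stringsAsFactors = FALSE)
  pieces <- Map(function(k, w) {
    out <- fetch(k, w)
    Sys.sleep(pause)   # pause between requests
    out
  }, grid$keyword, grid$time)
  do.call(rbind, unname(pieces))
}

# A fetch function built on gtrendsR (assumed to be installed).
fetchTrend <- function(keyword, time) {
  gtrendsR::gtrends(keyword, geo = "IT", time = time)$interest_over_time
}

# Usage:
# weeks  <- c("2020-01-06 2020-01-12", "2020-01-13 2020-01-19")
# output <- trendsGrid(c("pizza", "pasta"), weeks, fetchTrend)
```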

  7. Hi Ruben
    Congrats on the code!
    I noticed that each keyword (in the keyword list or CSV file) is treated independently, i.e. not relative to the other keywords' popularity. As such, every keyword gets a score of 100.

    In the Google online tool, one can compare up to 5 items, and an index of 100 is assigned to the keyword with the most volume over the time period of interest. The remaining 4 are indexed off that. If one wants to compare more than 5 keywords, then the keyword with the largest query volume must be included every time so the normalization is consistent. I was wondering if your script could do this with some modifications? Thanks, great job!

    • Hi Fsarrazit,

      I wouldn't use Google Trends to compare search volumes. Instead, I would use the Google Keyword Planner or any other SEO tool like Ahrefs or Semrush. Use Google Trends only to detect seasonality or anomalies, also because it's not possible to compare more than 5 keywords at the same time.

      • True, but many do it using a 'trick', i.e. taking the one keyword with the highest index (100) over the time period entered and keeping it in every comparison. See below:

        KW1 -> index 100
        KW2
        KW3
        KW4
        KW5

        Then you hit the limit, so to compare more words:

        KW1 (again)
        KW6
        KW7
        KW8
        KW9

        and so on in a loop, which is why I thought it would be possible to tweak your script.
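The batching part of this trick can be sketched as follows. anchorBatches is a hypothetical helper; it only builds the groups of at most 5 keywords, each starting with the anchor. The results are comparable across batches only if the anchor really is the highest-volume keyword in every batch; otherwise some rescaling would be needed, which this sketch does not attempt.

```r
# Build batches of at most `batch_size` keywords that all share `anchor`.
anchorBatches <- function(keywords, anchor, batch_size = 5) {
  others <- setdiff(keywords, anchor)
  groups <- split(others, ceiling(seq_along(others) / (batch_size - 1)))
  lapply(unname(groups), function(g) c(anchor, g))
}

# Usage, assuming gtrendsR is installed (one request per batch):
# batches <- anchorBatches(paste0("KW", 1:9), anchor = "KW1")
# results <- lapply(batches, function(b) {
#   Sys.sleep(5)
#   gtrendsR::gtrends(b, geo = "US", time = "today 3-m")$interest_over_time
# })
```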

  8. Hi Ruben,

    Thanks, this is very helpful! Silly question, but can you describe or screenshot how you arranged your keyword csv file? I’m having trouble loading mine.

    • Hi Lee,

      can you please send me an example of the error message you get? Usually I open Excel and then import the csv file (of course you have to define the delimiter and the encoding type; here in Europe it's UTF-8).
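Judging from the readLines call quoted in a later comment, the keyword file is read line by line, so a layout with one keyword per line and no header row should work. A small sketch that writes and re-reads such a file (the keywords and the temporary path are placeholders):

```r
keywords <- c("pizza margherita", "gelato", "espresso")

# One keyword per line, no header, no quotes or delimiters.
path <- file.path(tempdir(), "keywords.csv")
writeLines(keywords, path)

# Read it back the way the script does; the encoding argument matters
# once the keywords contain non-ASCII characters.
kwlist <- readLines(path, encoding = "UTF-8")
print(kwlist)
```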

    • Hi Antonio,

      hits is the column where you can see the interest for a keyword in a specific period.
      If you check the Google Trends website and you enter a keyword, you’ll see the values of “hits” in the main chart (that value goes from 0 to 100)

  9. Hi Ruben! Thanks for this script! Just a question: what is the code for a global search? I'm interested in the trends of a pool of keywords worldwide and not by single countries. Thanks 🙂

    • You can't do that. Don't forget that people use different words to describe the same thing (think about the differences between American and British English). My advice is to select and download the best keyword for each country.

  10. Dear Ruben,

    Thank you for sharing this code! This will be super useful once I figure out how it works. I was hoping you could assist me with this. My apologies for the silly mistakes I must have made, but I have no experience with coding at all. Please find below the code that I entered into RStudio along with the errors that it returns.

    > installed.packages(“readr”,”gtrendsR”, “purrr”)
    Error in if (noCache) { : argument is not interpretable as logical
    > library(readr)
    > library(gtrendsR)
    > library(purrr)
    >
    > kwlist <- readLines(“~/Desktop/Dissertation readings /Data/500words.csv”)
    Error: unexpected input in "kwlist
    >
    > googleTrendsData <- function (keywords) {
    +
    +
    + country <- c('IT')
    + time <- ("2018-08-01 2018-08-27")
    + channel <- 'web'
    +
    + trends <- gtrends(keywords,
    + gprop = channel,
    + geo = country,
    + time = time )
    +
    + results
    > output
    > write.csv(output, “download.csv”)
    Error in is.data.frame(x) : object ‘output’ not found

    I would greatly appreciate it if you could help me understand the problem!

    • Hi Vitalii, unfortunately, I wasn’t able to replicate the errors you get. Did you adjust the kwlist <- readLines(“~/Desktop/Dissertation readings /Data/500words.csv”) command with the path of your keywords list?
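Two things in the pasted session would explain the first two errors. installed.packages() lists what is already installed; the function that installs packages is install.packages(), and it takes the names as a single character vector. The curly quotes around the file path are not valid R syntax and trigger "unexpected input"; R only accepts straight quotes. A short sketch of both points:

```r
# Check which of the required packages are still missing.
needed  <- c("readr", "gtrendsR", "purrr")
missing <- needed[!vapply(needed, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing) > 0) {
  message("Run once: install.packages(c(",
          paste0('"', missing, '"', collapse = ", "), "))")
}

# Same path as in the comment, typed with straight quotes.
path <- "~/Desktop/Dissertation readings /Data/500words.csv"
# kwlist <- readLines(path)
```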

  11. Hi Ruben,
    I was trying to run the program and some errors happened. The error message is
    “Error in curl::curl_fetch_memory(cookie_url, handle = cookie_handler) :
    Timeout was reached: [trends.google.com] Connection timed out after 10001 milliseconds”
    I don’t know what happened. Could you help me with it?

  12. Dear Ruben,

    Thank you so much for this blog post, it is a potential life saver for my research.
    I could successfully follow your steps and generate an output.
    I am trying to repeat the process with a keyword list in Japanese. I have run the manual, single word command before with Japanese words, so I know it should work. In addition, when I load the csv file and specify the UTF-8 encoding, I can see the correct characters in the ‘Values’ section in Rstudio. Yet when running the ‘output’ command, I receive this error:

    Error in interest_over_time(widget, comparison_item, tz) :
    Status code was not 200. Returned status code:400

    When I load a csv file with English words and keep all other parameters the same, it runs smoothly.
    If you have some spare time, could you perhaps help with this?

    Thank you in advance,
    Kind regards,

    • Hi Bert,

      unfortunately I couldn’t replicate the error message you get. I tried to run the script with the word ポケモン and it worked.
      Did you update the country argument with “JP”?

      country <- c('JP')
      time <- ("2018-08-01 2018-08-27")
      channel <- 'web'
      trends <- gtrends("ポケモン", gprop = channel, geo = country, time = time)

  13. Error in FUN(X[[i]], …) :
    Status code was not 200. Returned status code:500
    A list of 500 keywords worked, but for the next 500 it gives this error.

    • It seems that you sent too many requests to Google Trends and they blocked you. Just wait a couple of hours and the code should work again.

  14. Thank you for sharing this. Do you know how I can adjust the command to download more than 5 geographic locations at the same time? I tried to read the list of geos from a CSV file but I keep getting an error.
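Since a single request accepts at most 5 locations (as noted in an earlier comment), one option is to split the list of locations into chunks of 5 and send one request per chunk. A sketch, where chunkGeos is a hypothetical helper and the geo codes are built from R's built-in state.abb vector plus "DC":

```r
# Split a vector of geo codes into chunks of at most `size` elements.
chunkGeos <- function(geos, size = 5) {
  unname(split(geos, ceiling(seq_along(geos) / size)))
}

geos      <- paste0("US-", c(state.abb, "DC"))
geoChunks <- chunkGeos(geos)

# Usage, assuming gtrendsR is installed (one request per chunk, with a pause):
# results <- lapply(geoChunks, function(g) {
#   Sys.sleep(5)
#   gtrendsR::gtrends("pizza", geo = g, time = "today 3-m")$interest_over_time
# })
```

To read the locations from a file instead, readLines on a file with one geo code per line would produce the same kind of character vector.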

  15. Hi Ruben, thanks for the code! I keep getting this error and was wondering if you could help me with it.

    Error in gtrends(keywords, gprop = channel, geo = country, time = time) :
    length(keyword) <= 5 is not TRUE

  16. Hi Ruben,
    Thanks for the code.
    Is there a way to get daily trends and not hits?
    I would like to find out how many times a keyword was searched on a daily basis.

    Great job!

    • Then I think you are using the wrong tool. If you need search volumes you shouldn't use Google Trends but something like Google Keyword Planner, Semrush or Ahrefs (although none of them will give you daily searches, only monthly searches).

    • Yes, it is for sure possible. Unfortunately, I don’t use Python so I don’t know the name of a good package/library 🙁

  17. Hey! thank you so much for making this available. I have used it for a while but now it seems to not be working anymore. Do you know why? The error I get is this:
    “Error: No data returned by the query. Consider changing search parameters.”
    Thank you

    • Hi Janaina, maybe your query is very specific. Did you try to get the data for a popular search term?

  18. Hello!

    I am trying to download interest by country data and get the following error “widget$status_code == 200 is not TRUE”. Do you know what might be the issue?

    Thank you!

    • Yes, if you send too many requests at the same time Google Trends will block you. You should try to send fewer requests or use a VPN or proxy.