Today I want to share with you an R script I created a couple of weeks ago. It’s very simple: it helps you bulk download data from Google Trends into a .csv file. So far I’ve used it to download 500-1,000 keywords per run, but you can raise that limit by using a proxy.
How does it work?
This script is very simple but at the same time very powerful. In a couple of minutes, you’ll be able to download data for thousands of keywords from Google Trends (in the web interface you can only download 5 keywords at a time!).
First of all, I want to thank Philippe Massicotte, who developed the amazing package gtrendsR. His package does most of the dirty work; I just wrote some lines of code to run gtrends on multiple keywords at the same time. You can split my script into three parts:
- load the keywords list via .csv file
- create the googleTrendsData function, which wraps gtrends and adjusts the output
- use map_dfr to run the function over the keywords list and export the results
Now it’s your turn! Below you can find the script with all the essential comments. You can simply copy-paste it into RStudio and adjust the different parameters (e.g. geographic region, time span, category, Google product, …).
# Last Update: 2020-08-22

# Install and load the readr, gtrendsR and purrr packages
install.packages(c("readr", "gtrendsR", "purrr"))
library(readr)
library(gtrendsR)
library(purrr)

# Load your keywords list (.csv file)
kwlist <- readLines("Your-keywords-list-path.csv")

# The function wraps the arguments of gtrendsR::gtrends and returns
# only the interest_over_time element (you can change that)
googleTrendsData <- function (keywords) {

  # Set the geographic region, time span, Google product,...
  # For more information read the official documentation:
  # https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf
  country <- c('IT')
  time    <- "2018-08-01 2018-08-27"
  channel <- 'web'

  trends <- gtrends(keywords,
                    gprop = channel,
                    geo   = country,
                    time  = time)

  results <- trends$interest_over_time
}

# Run the googleTrendsData function over the kwlist
output <- map_dfr(.x = kwlist, .f = googleTrendsData)

# Export the dataframe "output" as a .csv file
write.csv(output, "download.csv")
You should find in your working directory a file called download.csv. Below you can find an export example:
Can you suggest how to use a proxy? I have a requirement to scan over 5,000 keywords twice a day.
Hi,
why would you scan the keywords twice a day? Anyway, I think there are many proxy services online, and probably some R packages too. My advice is to check their documentation to see how to connect your R IDE to the proxy.
Hi Ruben, thank you so much for sharing this code; this is really powerful data if used properly. I’ll just get to the point: do we have to add more code ourselves to use a proxy? And is there a resource to learn how to add the proxy feature?
Hi Dananjaya, yes you have to add an extra piece of code if you want to use a proxy service. You can find more information about it here: https://support.rstudio.com/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-Proxy
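For reference, a minimal sketch of what that configuration can look like from inside R. The host, port, user and password below are placeholders, not real values; substitute whatever your proxy provider gives you:

```r
# Hypothetical proxy settings -- host, port, user and password are
# placeholders; use the values from your own proxy provider.
Sys.setenv(
  http_proxy  = "http://user:password@proxy.example.com:8080",
  https_proxy = "http://user:password@proxy.example.com:8080"
)
```

Setting these environment variables before loading gtrendsR makes R route its HTTP traffic through the proxy.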
I’m facing the following error (even when trying to repeat exactly your code):
Error in `[<-.data.frame`(`*tmp*`, , timevar, value = "subject") : replacement has 1 row, data has 0
Hi Lucas,
this error occurs when you use the category argument. The package doesn’t work for some combinations of keyword and category ID.
I sent a message to the developer, hopefully he will fix it! 🙂
Ruben,
It seems that, when the number of observations from Google Trends is higher than 1000, there will be issues.
Not sure why.
Hi, I am trying to download daily data from the beginning of Jan 2020 until Aug 25 2020. But I am facing the following error
“Error: Column `hits` can’t be converted from character to integer.” Could you please help, Ruben? Thank you so much.
Yes, of course! Could you please send me your code? I want to replicate the error.
Hi, I have sent through tab “CONTACT”. Have you received it? Thank you.
Hi Hanh,
can you please send me the email to info [at] rubenvezzoli.it? I would need the code plus the .csv file you use as input.
Hi Ruben. I have just shot you an email. Please help check whether you have got it. Thank you so much!
Hi Ruben, I think I’m facing the same problem. My assumption is that some hits have the value “<1” while others have an integer value. How can I solve this?
Thank you 😀
Hi Calvin,
this code should solve your issue. I basically change the data type of the output from numeric to character. Of course, this solution works if you’ll also use the data outside of R. If you want to do some calculations after the download, you’ll have to delete the <1 rows (or replace <1 with 0) and change the data type back to numeric.
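For readers hitting the same issue, here is a minimal sketch of the two options described above, assuming the `output` data frame produced by the script in the post:

```r
# Option 1: keep hits as character -- fine if you only export the data
output$hits <- as.character(output$hits)

# Option 2: replace "<1" with 0 and convert to numeric,
# so you can do calculations on the hits column afterwards
output$hits <- as.numeric(gsub("<1", "0", as.character(output$hits)))
```

Option 2 loses the distinction between "less than 1" and "exactly 0", which is usually acceptable for trend analysis.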
Thanks a lot, Ruben. It worked perfectly now! My kudos to you for your excellent support!
You’re welcome!
Hi Ruben,
I hope you have been fine.
The revised code worked well back in September, but today I ran it again and received this error message: Error in map_dfr(.x = kwlist1, .f = googleTrendsData) :
could not find function “map_dfr”. Could you please let me know what I should do, Ruben? Thank you so much.
Hi Ruben, it worked now. Thank you!
Hi Ruben, do you know how I can download the data from the ‘Year’ section?
I leave you an example, https://trends.google.com.ar/trends/yis/2019/AR/
I tried:
url <- 'https://trends.google.com.ar/trends/yis/2019/AR/'
top10 <- html_nodes(read_html(url), 'table')[1]
But I cannot make it. Thanks!
Hi Juan,
it’s not possible to do that via gtrendsR. You should try to scrape the page, maybe using the package rvest.
Hi Ruben,
thanks for this. Is there any restriction on the number of queries in the keyword list? And last but not least, how do I handle special characters inside the queries so that R is able to read them correctly?
Thank you!
Thomas
Hi Thomas,
my advice is to have a maximum of 1000 keywords in the list, simply because Google will block you if they realize that you are using Google Trends via an unofficial API provider like the gtrendsR package. It depends on the character, can you show me some examples?
Hello Ruben,
Thank you for the code! Is it possible to request SVIs for all 51 US states without having to manually specify each state in “geo”? The max it will let me do is 5 states. Plus, it blocks me after every couple of states for 24 hours or more. You provide the keywords as a list in the code. Is it possible to do the same for the states?
Thank you so much!
-Anna
Hi Anna,
the issue is that you’re sending too many requests at the same time and Google blocks you. I would add Sys.sleep(5) (the command pauses the script for 5 seconds after each loop) inside the function and run the script multiple times.
Best!
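A sketch of where that pause would go, based on the function from the post (the 5-second value is arbitrary; tune it to your request volume):

```r
library(gtrendsR)

googleTrendsData <- function (keywords) {
  country <- c('US')
  time    <- "2018-08-01 2018-08-27"
  channel <- 'web'

  trends <- gtrends(keywords,
                    gprop = channel,
                    geo   = country,
                    time  = time)

  # pause 5 seconds after each request so Google is less likely
  # to flag the script as automated traffic
  Sys.sleep(5)
  trends$interest_over_time
}
```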
Hi Anna,
I tried to tweak Ruben’s code, but could not figure it out. I was successful with these nested for loops:
Country <- c("US-CA", "US-LA", "US-ME", "US-NY", "US-CO", "US-WY")
Keyword <- c("tupac", "biggy", "jay-z", "nas", "eminem", "andre3000")
results <- list()
for (i in Keyword) {
  for (j in Country) {
    time    <- "today 3-m"
    channel <- "web"
    trends  <- gtrends(keyword = i, gprop = channel, geo = j, time = time)
    Sys.sleep(5)
    results[[j]][[i]] <- trends$interest_over_time
  }
}
Hi Ruben,
thanks for this, I really enjoy your code! I was wondering whether similar code could be applied to multi-period queries for daily data for a single keyword. The interface only allows downloading 5 (90-day) periods at once for daily data.
btw the last gtrendsR version does not allow me to download 5 daily periods at once but I suppose this is something that is going to be fixed soon. For example:
data<-gtrends("shoes", geo = c("US"), time = c("2004-01-01 2004-03-31","2004-04-01 2004-06-30","2004-07-01 2004-09-30","2004-10-01 2004-12-31","2005-01-01 2005-03-31"))
Hi George,
first of all, thanks for your comment! 🙂 Regarding your question: yes, you could adjust my function and loop it over a date range instead of a keywords list. I would also add a command like Sys.sleep(3) because you are going to make many API calls and Google could block you. It shouldn’t be that hard if you know a bit of R.
Ruben
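A sketch of that idea, looping the download over date ranges instead of keywords. The keyword, geo and the two 90-day periods below are illustrative placeholders:

```r
library(gtrendsR)
library(purrr)

# hypothetical 90-day periods -- adjust to the ranges you need
periods <- c("2020-01-01 2020-03-30",
             "2020-03-31 2020-06-28")

googleTrendsByPeriod <- function (period) {
  trends <- gtrends("shoes", gprop = "web", geo = "US", time = period)
  Sys.sleep(3)  # pause between API calls so Google doesn't block you
  trends$interest_over_time
}

output <- map_dfr(.x = periods, .f = googleTrendsByPeriod)
```

Note that each period is indexed 0-100 independently, so the periods still need to be rescaled against each other before they can be stitched into one continuous daily series.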
Hi Ruben,
Thank you very much for the code it is very helpful.
Is it also possible to adjust the function so that it loops over both a list of date ranges and a keyword list?
My problem is that I need the results for my keywords for different weeks, and I don’t really have an idea of how to accomplish this.
Many thanks in advance!
Best regards,
Moritz
Hi Moritz,
I sent you an email 😉
Hi Ruben. I had a similar query. Kindly help me out.
Hi Ruben
Congrats on the code!
I noticed that each keyword (in the keyword list or CSV file) is treated independently, i.e. not relative to the other keywords’ popularity. As such, every keyword gets a score of 100.
In the Google online tool, you can compare up to 5 items, and an index of 100 is assigned to the keyword with the most volume over the time period of interest. The remaining 4 are indexed off that. If you want to compare more than 5 keywords, the keyword with the largest query volume must be included every time so the normalization stays consistent. I was wondering if your script could do this with some modifications? Thanks!!! Great job
Hi Fsarrazit,
I wouldn’t use Google Trends to compare search volumes. Instead, I would use the Google Keyword Planner or any other SEO tool like AHrefs or SEMRush. Use Google Trends only to detect seasonalities or anomalies, also because it’s not possible to compare more than 5 keywords at the same time.
True, but many do it using a ‘trick’, i.e. using the one keyword with the highest index (100) over the time period entered, and maintaining it as a constant. see below
KW 1 –> index 100
KW2
KW3
KW4
KW5
then you hit the limit… so to compare more words:
KW1 (again)
KW6
KW7
KW8
KW9
in a loop….hence why i thought it would be possible to tweak your script.
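A sketch of that anchor-keyword loop: kw1…kw9 are placeholders, and the final rescaling step (dividing each batch by the anchor’s values) is left out:

```r
library(gtrendsR)
library(purrr)

anchor  <- "kw1"              # the keyword that scores 100 in every batch
others  <- paste0("kw", 2:9)  # the keywords to compare against it
# 4 keywords per batch, plus the anchor, makes 5 per request
batches <- split(others, ceiling(seq_along(others) / 4))

googleTrendsBatch <- function (batch) {
  trends <- gtrends(c(anchor, batch), gprop = "web", time = "today 12-m")
  Sys.sleep(5)  # avoid being blocked by Google
  trends$interest_over_time
}

output <- map_dfr(.x = batches, .f = googleTrendsBatch)
```

Because the anchor appears in every batch, its hits give you the conversion factor needed to put all batches on the same 0-100 scale.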
Hi Ruben,
Thanks, this is very helpful! Silly question, but can you describe or screenshot how you arranged your keyword csv file? I’m having trouble loading mine.
Hi Lee,
can you please send me an example of the error message you get? Usually I open Excel and then import the csv file (of course you have to define the delimiter and the encoding type; here in Europe it’s usually UTF-8).
Hello Ruben congratulations for the job. Please what do you mean by hits?
Hi Antonio,
hits is the column where you can see the interest in a keyword over a specific period.
If you check the Google Trends website and enter a keyword, you’ll see the values of “hits” in the main chart (the value goes from 0 to 100).
Hi Ruben! Thanks for this script! Just a question: what is the code for a worldwide search? I’m interested in the trends of a pool of keywords worldwide and not by single countries. Thanks 🙂
You can’t do that. Don’t forget that people use different words to describe the same thing (think about the differences between american-english and british-english). My advice is to select and download the best keyword for each country.
Dear Ruben,
Thank you for sharing this code! This will be super useful once I figure out how it works. I was hoping you could assist me with this. My apologies, for the silly mistakes that I must have made but I have no experience with coding at all. Please find below the code that I input into the RStudio along with the errors that it returns.
> installed.packages(“readr”,”gtrendsR”, “purrr”)
Error in if (noCache) { : argument is not interpretable as logical
> library(readr)
> library(gtrendsR)
> library(purrr)
>
> kwlist <- readLines(“~/Desktop/Dissertation readings /Data/500words.csv”)
Error: unexpected input in "kwlist
>
> googleTrendsData <- function (keywords) {
+
+
+ country <- c('IT')
+ time <- ("2018-08-01 2018-08-27")
+ channel <- 'web'
+
+ trends <- gtrends(keywords,
+ gprop = channel,
+ geo = country,
+ time = time )
+
+ results
> output
> write.csv(output, “download.csv”)
Error in is.data.frame(x) : object ‘output’ not found
I would greatly appreciate it if you could help me understand the problem!
Hi Vitalii, unfortunately I wasn’t able to replicate the errors you get. Did you adjust the kwlist <- readLines(“~/Desktop/Dissertation readings /Data/500words.csv”) command with the path of your keywords list? Also note that the curly quotes (“ ”) in your pasted commands are not valid R string delimiters; if you copy-pasted the script from the page, retype them as straight quotes (").
Hi Ruben,
I was trying to run the program and some errors happened. The error message is
“Error in curl::curl_fetch_memory(cookie_url, handle = cookie_handler) :
Timeout was reached: [trends.google.com] Connection timed out after 10001 milliseconds”
I don’t know what happened. Could you help me with it?
Hi Carol,
I suggest you delete and reinstall the gtrendsR package.
Dear Ruben,
Thank you so much for this blog post – it is a potential life saver for my research.
I could successfully follow your steps and generate an output.
I am trying to repeat the process with a keyword list in Japanese. I have run the manual, single-keyword command with Japanese words before, so I know it should work. In addition, when I load the csv file and specify the UTF-8 encoding, I can see the correct characters in the ‘Values’ section in RStudio. Yet when running the ‘output’ command, I receive this error:
Error in interest_over_time(widget, comparison_item, tz) :
Status code was not 200. Returned status code:400
When I load a csv file with English words and keep all other parameters the same, it runs smoothly.
If you would have some spare time, could you perhaps support on this?
Thanks you in advance,
Kind regards,
Hi Bert,
unfortunately I couldn’t replicate the error message you get. I tried to run the script with the word ポケモン and it worked.
Did you update the country argument with “JP”?
country <- c('JP')
time    <- "2018-08-01 2018-08-27"
channel <- 'web'
trends  <- gtrends("ポケモン", gprop = channel, geo = country, time = time)
Hi Ruben,
is your code still working? It worked for the first list of 500 keywords, but for the next five hundred it gives this error:
Error in FUN(X[[i]], …) :
Status code was not 200. Returned status code:500
It seems that you sent too many requests to Google Trends and they blocked you. Just wait a couple of hours and the code should work again.
Thank you for sharing this. Do you know how I can adjust the command to download more than 5 geographic locations at the same time? I tried to read the list of geos from a CSV file but I keep getting errors.
I think you should add an extra loop using map_dfr in the main function.
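A sketch of that extra loop, assuming a hypothetical `geos` vector and the `kwlist` from the post:

```r
library(gtrendsR)
library(purrr)

geos <- c("US-CA", "US-NY", "US-TX")  # placeholder regions

googleTrendsGeo <- function (keyword, geo) {
  trends <- gtrends(keyword, gprop = "web", geo = geo,
                    time = "2018-08-01 2018-08-27")
  Sys.sleep(5)  # pause between requests to avoid being blocked
  trends$interest_over_time
}

# outer map over keywords, inner map over regions;
# map_dfr row-binds everything into one data frame
output <- map_dfr(kwlist, function (kw) {
  map_dfr(geos, function (g) googleTrendsGeo(kw, g))
})
```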
Hi Ruben, thanks for the code! I keep getting this error and was wondering if you could help me with it.
Error in gtrends(keywords, gprop = channel, geo = country, time = time) :
length(keyword) <= 5 is not TRUE
Hi Melissa,
can you please share with me the whole code?
Hi Ruben,
Thanks for the code.
Is there a way to get daily trends and not hits?
I would like to find out how many times a keyword was searched on a daily basis.
Great job!
Then I think you are using the wrong tool. If you need search volumes you shouldn’t use Google Trends but something like Google Keyword Planner, Semrush or Ahrefs (though none of them will give you daily searches, only monthly searches).
Hi!
Is it possible to use Python instead of R? Is there an equivalent package/library?
Thanks!!
Yes, it is for sure possible. Unfortunately, I don’t use Python so I don’t know the name of a good package/library 🙁
How can I run this in parallel? I want to fetch the trends data in a parallel manner to make it faster.
I wouldn’t do that. Google will ban you if you make many requests in a short amount of time.
Hey! thank you so much for making this available. I have used it for a while but now it seems to not be working anymore. Do you know why? The error I get is this:
“Error: No data returned by the query. Consider changing search parameters.”
Thank you
Hi Janaina, maybe your query is very specific. Did you try to get the data for a popular search term?
Hello!
I am trying to download interest by country data and get the following error “widget$status_code == 200 is not TRUE”. Do you know what might be the issue?
Thank you!
Yes, if you send too many requests at the same time Google Trends will block you. You should try to send fewer requests or use a VPN or proxy.