I know that many marketers and digital analysts are using Google Analytics almost every day and that’s why I decided to write this guide about an R package called googleAnalyticsR.
The package has been developed by Mark Edmondson (he also developed the searchConsoleR package) and it basically allows you to use the Google Analytics API in R.
I’ve been using the package for a couple of months and I decided to put together all my notes in this blog post. I’m pretty sure that you’ll love it!
Install the googleAnalyticsR package
In this guide, I won’t talk about how to install R because there are thousands of tutorials available online. As a first step, you have to install and load the googleAnalyticsR package.
    # Install and load the googleAnalyticsR package
    install.packages("googleAnalyticsR")
    library("googleAnalyticsR")
Authorize googleAnalyticsR to access your data (OAuth2)
After you have successfully loaded the package, you have to connect R to the Google Analytics API. Use the function ga_auth() to launch the authentication process.
    # Connect R and Google Analytics
    ga_auth()
The command will open a browser tab where you’ll have to log in with your Google account and allow googleAnalyticsR to have access to your Google Analytics data using the API.
If you followed all the instructions, you should see the message “Authentication complete. Please close this page and return to R.”
Your Google Analytics accounts & View IDs
Before you can write your first API query, you have to choose the View which holds the data you want to access and get its View ID.
There are two ways to do that. You can open the Google Analytics interface and go to Admin > View > View Settings > View ID or you can use a function of the package called ga_account_list().
    # Generate a list of all the Google Analytics accounts you have access to
    ga_accounts <- ga_account_list()
    View(ga_accounts)
Below you can see the data frame which contains all the information you need.
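If you have access to many accounts, scanning the data frame by eye gets tedious. As a rough sketch, you can filter it by name instead. The data frame here is an invented stand-in for the output of ga_account_list(), so the snippet runs without calling the API; the accountName, viewName and viewId columns are an assumption about what the real data frame contains, so check the column names of your own output first.

```r
# Mock stand-in for the result of ga_account_list(); all values are invented
ga_accounts <- data.frame(
  accountName = c("My Company", "My Company", "Side Project"),
  webPropertyId = c("UA-1111111-1", "UA-1111111-1", "UA-2222222-1"),
  viewName = c("All Web Site Data", "Filtered View", "All Web Site Data"),
  viewId = c("123456789", "987654321", "555555555"),
  stringsAsFactors = FALSE
)

# Keep only the row of the account and view you are interested in
selected <- subset(ga_accounts,
                   accountName == "My Company" & viewName == "Filtered View")

# Store the View ID for later queries
ga_id <- selected$viewId
print(ga_id)
```

With the real data frame you would only need the subset() and the assignment.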
Your first query
Now you are finally ready to write your very first query! Let’s analyze the different arguments of the google_analytics function:
    # ga_id contains the View ID that you want to query
    ga_id <- 123456789

    # Download the data and store them in a data frame
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date")
ga_id – contains the View ID you picked in the previous step
ga_results1 – the data will be stored in this data frame (but it’s not mandatory!)
google_analytics – is the function where you’ll define all the different conditions (arguments) of your query (View ID, date range, metrics, dimensions, filters, segments,…)
This is the output of the query:
Here you can find the list of all the metrics & dimensions available in the Google Analytics API. You just have to copy the name without the ga: prefix. e.g. ga:deviceCategory -> deviceCategory
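If you copy several names from the API reference at once, you can strip the prefix in R rather than by hand. A minimal sketch using base R only (the three names are just examples):

```r
# Names as they appear in the Google Analytics API reference
api_names <- c("ga:deviceCategory", "ga:sessions", "ga:users")

# Remove the leading "ga:" so the names match the short form used in this guide
short_names <- sub("^ga:", "", api_names)
print(short_names)
```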
Date Ranges
There are multiple ways to set the date range of your query:
- Use the YYYY-MM-DD format
- Use Sys.Date() minus a number of days (Sys.Date() returns today’s date)
- Use the API v4 shortcuts today, yesterday, XXdaysAgo
    # Date format YYYY-MM-DD
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date")

    # Date using the Sys.Date() function
    twelveDaysAgo <- Sys.Date() - 12
    yesterday <- Sys.Date() - 1
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c(twelveDaysAgo, yesterday),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date")

    # Date using the API shortcuts today, yesterday, XXdaysAgo
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("12daysAgo", "yesterday"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date")
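The Sys.Date() approach also works for ranges that are awkward to express with the shortcuts, such as "the whole previous calendar month". The helper below is a hypothetical sketch in base R (last_month_range is my own name, not part of googleAnalyticsR); it only computes the two dates, which you could then pass as date_range.

```r
# Return the first and last day of the calendar month before `today`
last_month_range <- function(today = Sys.Date()) {
  first_of_this_month <- as.Date(format(today, "%Y-%m-01"))
  end <- first_of_this_month - 1              # last day of the previous month
  start <- as.Date(format(end, "%Y-%m-01"))   # first day of the previous month
  c(start, end)
}

# Example with a fixed date so the result is reproducible
print(last_month_range(as.Date("2020-02-15")))
```

Because the end date is derived by stepping back one day from the first of the current month, month lengths and leap years are handled by R's date arithmetic.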
Change the rows limit
googleAnalyticsR by default returns only 1000 rows, but you can change it using the max = argument.
If you want to download all the rows, use max = -1.
    # Download only 1000 rows
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date")

    # Download 4000 rows
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date",
                                    max = 4000)

    # Download all the rows
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date",
                                    max = -1)
Avoid data sampling
As you probably know, if you request a large amount of data you’ll get sampled data. In googleAnalyticsR you can avoid that by using the argument anti_sample = TRUE.
The package will automatically split up the calls in order to return unsampled data (the process might take a bit more time).
    # Query with anti sampling parameter activated
    ga_results1 <- google_analytics(ga_id,
                                    date_range = c("2020-01-01", "2020-01-31"),
                                    metrics = c("users", "sessions"),
                                    dimensions = "date",
                                    anti_sample = TRUE)
Filters
Let’s now talk about filters. You can add filters to your query in order to get only the data you really need.
Create dimension filters with dim_filter
If you want to use a dimension filter, you’ll have to use the dim_filter function. It has 5 arguments that can be defined:
- dimension – the dimension name (e.g. city, country, deviceCategory,…)
- operator – how to match the dimension (REGEXP, BEGINS_WITH, ENDS_WITH, PARTIAL, EXACT,…)
- expressions – what should match the dimension (e.g. Milan, Mobile,…)
- caseSensitive – TRUE/FALSE if it’s TRUE then the expression is case sensitive
- not – TRUE/FALSE e.g. if you want to say DOES NOT CONTAIN then you should combine operator PARTIAL with not = TRUE
After that, you have to wrap the different dimension filters inside the filter_clause_ga4 function.
In this function, you also have an argument called operator. It defines the logical operator (AND, OR) used to combine the conditions you defined in the filters. e.g. if you have two dimension filters and decide to use the AND operator, the Google Analytics API will return only the data that match both filtering conditions at the same time.
Finally, you have to pass the result to the dim_filters argument of your google_analytics function.
    # Create dimension filters
    # dimension1 = city CONTAINS "Mila"
    # dimension2 = deviceCategory DOES NOT BEGIN WITH "Mob"
    dimension1 <- dim_filter("city", "PARTIAL", "Mila", not = FALSE)
    dimension2 <- dim_filter("deviceCategory", "BEGINS_WITH", "Mob", not = TRUE)

    # Combine the two dimension filters
    dfilters <- filter_clause_ga4(list(dimension1, dimension2), operator = "AND")

    # Download the data and store them in a data frame
    data_with_filters <- google_analytics(ga_id,
                                          date_range = c("50daysAgo", "yesterday"),
                                          metrics = c("pageviews", "sessions"),
                                          dimensions = c("deviceCategory", "city"),
                                          dim_filters = dfilters,
                                          max = -1)
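To get a feeling for how not and the AND/OR operator interact, it can help to reproduce the same conditions on a small local data frame. This is only an illustration in base R on invented rows, not what the API does internally, and it assumes the matching is case-insensitive (i.e. caseSensitive left at FALSE).

```r
# Invented sample rows, standing in for data returned by the API
visits <- data.frame(
  city = c("Milan", "Milano", "Rome", "Milan", "Rome"),
  deviceCategory = c("desktop", "mobile", "desktop", "tablet", "mobile"),
  stringsAsFactors = FALSE
)

# PARTIAL "Mila", not = FALSE: city contains "mila" (ignoring case)
city_match <- grepl("mila", tolower(visits$city), fixed = TRUE)

# BEGINS_WITH "Mob", not = TRUE: deviceCategory does NOT begin with "mob"
device_match <- !startsWith(tolower(visits$deviceCategory), "mob")

# operator = "AND": both conditions must hold
and_rows <- visits[city_match & device_match, ]

# operator = "OR": at least one condition must hold
or_rows <- visits[city_match | device_match, ]

print(and_rows)
print(or_rows)
```

With AND only the two non-mobile rows from Milan survive, while OR drops only the row that fails both conditions (Rome on mobile).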
Create metric filters with met_filter
For metric filters you have to use met_filter. There are 4 arguments available in this function:
- metric – the metric name (e.g. sessions, bounces, transactions,…)
- operator – how to match the metric (EQUAL, LESS_THAN, GREATER_THAN, IS_MISSING)
- comparisonValue – what should match the metric (e.g. 1, 100, 1000,…)
- not – TRUE/FALSE
After that, you have to wrap all the metric filters you created inside the filter_clause_ga4 function.
Then you have to specify the argument met_filters in google_analytics.
    # Create metric filters
    # metric1 = pageviews NOT GREATER THAN 5 (i.e. less than or equal to 5)
    # metric2 = goal1Completions LESS THAN 1
    metric1 <- met_filter("pageviews", "GREATER_THAN", 5, not = TRUE)
    metric2 <- met_filter("goal1Completions", "LESS_THAN", 1)

    # Combine the two metric filters
    mfilters <- filter_clause_ga4(list(metric1, metric2), operator = "AND")

    # Download the data and store them in a data frame
    data_with_filters <- google_analytics(ga_id,
                                          date_range = c("50daysAgo", "yesterday"),
                                          metrics = c("pageviews", "sessions"),
                                          dimensions = c("deviceCategory", "city"),
                                          dim_filters = dfilters,
                                          met_filters = mfilters,
                                          max = -1)
Obviously, you can use dim_filters and met_filters at the same time inside the google_analytics function.
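One detail that is easy to get wrong with not = TRUE: negating GREATER_THAN gives "less than or equal to", not "less than", so the boundary value is kept. A plain R illustration on invented numbers (this is just the logic, not a call to the API):

```r
# Invented pageview counts
pageviews <- c(3, 5, 8)

# GREATER_THAN 5 with not = TRUE keeps everything that is NOT > 5
not_greater_than_5 <- !(pageviews > 5)

# That is the same as <= 5, so the boundary value 5 is kept
print(pageviews[not_greater_than_5])
```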
Segments
With googleAnalyticsR you can create segments, but in this guide, I’m going to show you only how to apply a segment that you have previously created through the Google Analytics Interface.
As you did previously (for the View ID), you can generate a data frame with all the details of your segments.
Then you have to copy the Segment ID of your segment and pass it to the segment_ga4 function. The object it returns is what you give to the segments argument in google_analytics.
    # As you did for View ID, you can return a list with all your segments
    my_segments <- ga_segment_list()

    # Pick the segment you want to use and copy the Segment ID
    segment_to_use <- "gaid::rHAUgqF_QuGk1Pu9GbqcfA"

    # Add your segment ID to the segment_ga4 function
    segment1 <- segment_ga4("Segment Name", segment_id = segment_to_use)

    # Query your Google Analytics data using the segment
    segmented_ga1 <- google_analytics(ga_id,
                                      c("2020-01-01", "2020-01-31"),
                                      dimensions = c("date", "segment"),
                                      segments = segment1,
                                      metrics = c("sessions", "users", "bounces"))
From R to Excel
R is a powerful language, but I also know that it takes quite some time to learn. That’s why I decided to add this section, especially for beginners. You can simply use R to download the data from Google Analytics and then run the analysis in Excel.
With the write.csv function, you can export the output of your query as a CSV file and open it in Excel.
    # Export the output of the query as a csv file
    write.csv(ga_results1, "download.csv")
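By default write.csv also writes R's row names as an extra unnamed first column, which usually just gets in the way in Excel. A small sketch of how to leave it out with row.names = FALSE; the data frame is an invented stand-in for a query result and the file goes to a temporary folder, so nothing here touches the API or your working directory.

```r
# Invented stand-in for a query result
ga_results1 <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02")),
  users = c(10, 12),
  sessions = c(14, 15)
)

# Write to a temporary file; row.names = FALSE drops the extra index column
csv_path <- file.path(tempdir(), "download.csv")
write.csv(ga_results1, csv_path, row.names = FALSE)

# Read the file back to check which columns ended up in it
exported <- read.csv(csv_path)
print(names(exported))
```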
Otherwise, you can continue the analysis in R, using packages like ggplot2 or dplyr.
googleAnalyticsR: practical examples
To wrap up this guide, I want to show you two examples of how you can use googleAnalyticsR.
#1 Visualize Sessions data using ggplot2
In this first example, I used googleAnalyticsR in combination with a data visualization package called ggplot2.
As you can see in the picture above, I created 4 line charts representing the development of the sessions over time, divided by continent and deviceCategory.
    # Install & load googleAnalyticsR & ggplot2 packages
    install.packages("ggplot2")
    install.packages("googleAnalyticsR")
    library("ggplot2")
    library("googleAnalyticsR")

    # OAuth2.0 authentication
    ga_auth()

    # Get the Google Analytics account information
    ga_account <- ga_account_list()
    View(ga_account)

    # Store the View Id value (123456789 is a placeholder, replace it with your View Id)
    view_id <- 123456789

    # Create the dimension filters
    dimension1 <- dim_filter("continent", "PARTIAL", "(not set)", not = TRUE)
    dimension2 <- dim_filter("continent", "PARTIAL", "Oceania", not = TRUE)
    dimension3 <- dim_filter("continent", "PARTIAL", "Africa", not = TRUE)

    # Combine the 3 conditions in the filter_clause_ga4 function using the AND operator
    dfilter <- filter_clause_ga4(list(dimension1, dimension2, dimension3), operator = "AND")

    # Send your Google Analytics API query request
    results <- google_analytics(view_id,
                                date_range = c("50daysAgo", "yesterday"),
                                metrics = c("sessions"),
                                dimensions = c("date", "continent", "deviceCategory"),
                                dim_filters = dfilter,
                                anti_sample = TRUE,
                                max = -1)

    # Use ggplot2 to create a line chart for each Continent and Device
    ggplot(results, aes(x = date, y = sessions, color = deviceCategory)) +
      geom_line() +
      facet_wrap(~ continent)
#2 Get the Client Id of specific users
In Google Analytics there is a report called User Explorer, where you can isolate the behavior of every single user (the report uses the Client Id value).
As you can see, the report is huge and it can take a lot of time to find the Client Id you want to examine.
That’s why every time I need that report, I first use the googleAnalyticsR package to get a list with only the Client Ids I really have to check.
    # Install & load the googleAnalyticsR package
    install.packages("googleAnalyticsR")
    library("googleAnalyticsR")

    # OAuth2.0 authentication
    ga_auth()

    # Get the Google Analytics account information
    ga_account <- ga_account_list()
    View(ga_account)

    # Store the View Id value (123456789 is a placeholder, replace it with your View Id)
    view_id <- 123456789

    # Create the dimension filters
    dimension1 <- dim_filter("city", "PARTIAL", "Milan", not = FALSE) # City CONTAINS Milan
    dimension2 <- dim_filter("deviceCategory", "EXACT", "mobile", not = TRUE) # Device IS NOT mobile
    dimension3 <- dim_filter("sourceMedium", "PARTIAL", "direct", not = FALSE) # Source/Medium CONTAINS direct

    # Combine the 3 conditions in the filter_clause_ga4 function using the AND operator
    dfilter <- filter_clause_ga4(list(dimension1, dimension2, dimension3), operator = "AND")

    # Send your Google Analytics API query request
    results <- google_analytics(view_id,
                                date_range = c("50daysAgo", "yesterday"),
                                metrics = c("sessions", "bounces"),
                                dimensions = c("clientId", "city", "deviceCategory", "sourceMedium"),
                                dim_filters = dfilter,
                                anti_sample = TRUE,
                                max = -1)

    # Export the results as a .csv file
    write.csv(results, "results.csv")
This is a great post. I found it tremendously useful for an assignment! Thank you!!!
Thanks Gene! Happy to hear that 🙂
One of the best-explained examples.
Clear and concise Ruben
Thank you
Thanks! 🙂
Amazing! many thanks 😀
Do you know if there is a way to get all possible metrics?
I tried writing all metrics, but it said that there are only 10 allowed.
You can find all the available metrics/dimensions here: https://ga-dev-tools.appspot.com/dimensions-metrics-explorer/
There is a limit on the number of metrics you can download at the same time; the only way to get more than 10 metrics is to download multiple datasets and merge them.
Super!!
Thank you so much Ruben
You’re welcome!
I ran your code, based on a blog hosted by Netlify, and got an error message. I had looked on Netlify to find my GA id and assigned it to ga_id, as you did:
ga_results1 <-google_analytics(ga_id,
date_range = c("2020-05-01", "2020-05-12"),
metrics = c("users","sessions"),
dimensions = "date")
Error: API returned: User does not have sufficient permissions for this profile.
Hi Rees,
yes, you have to replace ga_id with your view ID (be careful, the view ID is a numeric string and you can easily confuse it with the UA-XXXXXXX id, which is your property ID).
But if you look at the error you get: “Error: API returned: User does not have sufficient permissions for this profile.” it seems that the email address you’re using doesn’t have the rights to access your Google Analytics API data. Are you using the owner email address?
Thanks a lot Ruben for this article! However, is it possible to access the google analytics data of another website for analysis using googleAnalyticsR?
Hi Ifeanyi, no it’s not possible. You can only download the data of your GA accounts. Your Google account must be connected to the GA account you want to access.
This post was great! I used it to write my final paper for a course!
Thanks a million!
Happy to hear that! Congrats! 🙂