googleAnalyticsR: how to connect Google Analytics with R

April 06, 2020 in Google Analytics

I know that many marketers and digital analysts are using Google Analytics almost every day and that’s why I decided to write this guide about an R package called googleAnalyticsR.

The package has been developed by Mark Edmondson (he also developed the searchConsoleR package) and it basically allows you to use the Google Analytics API in R.

I’ve been using the package for a couple of months and I decided to put together all my notes in this blog post. I’m pretty sure that you’ll love it!

Install the googleAnalyticsR package

In this guide, I won’t talk about how to install R because there are thousands of tutorials available online. As a first step, you have to install and load the googleAnalyticsR package.
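A minimal sketch of this step, assuming you install the released version from CRAN:

```r
# Install the package from CRAN (only needed once per machine)
install.packages("googleAnalyticsR")

# Load the package at the start of every R session
library(googleAnalyticsR)
```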

Authorize googleAnalyticsR to access your data (OAuth2)

Once the package is loaded, you have to connect R to the Google Analytics API. Use the function ga_auth() to launch the authentication process.

The command will open a browser tab where you’ll have to log in with your Google account and allow googleAnalyticsR to have access to your Google Analytics data using the API.
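In practice the whole step is a single call. A minimal sketch, assuming the package is already installed and you are working in an interactive R session (the browser-based login needs one):

```r
library(googleAnalyticsR)

# Starts the OAuth2 flow: a browser tab opens and asks you to
# log in and grant access to your Google Analytics data
ga_auth()
```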

[Image: googleAnalyticsR OAuth 2.0 authorization]

If you followed all the instructions, you should see the message “Authentication complete. Please close this page and return to R.”

Your Google Analytics accounts & View IDs

Before you can write your first API query, you have to choose the View which holds the data you want to access and get its View ID.

There are two ways to do that. You can open the Google Analytics interface and go to Admin > View > View Settings > View ID or you can use a function of the package called ga_account_list().
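The second option could look like the sketch below. It assumes you are already authenticated and that the returned data frame has the columns accountName, webPropertyName, viewName and viewId. The View ID stored at the end is a placeholder, not a real one:

```r
library(googleAnalyticsR)

# One row per view your Google account can access
account_list <- ga_account_list()

# Show only the columns needed to pick a view (assumed column names)
account_list[, c("accountName", "webPropertyName", "viewName", "viewId")]

# Store the View ID you want to query (placeholder, replace with your own)
ga_id <- 123456789
```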

Below you can see the data frame which contains all the information you need.

[Image: Google Analytics accounts list]

Your first query

Now you are finally ready to write your very first query! Let’s go through the different parts of a google_analytics query:

ga_id – contains the View ID you picked in the previous step

ga_results1 – the data will be stored in this data frame (but it’s not mandatory!)

google_analytics – is the function where you’ll define all the different conditions (arguments) of your query (View ID, date range, metrics, dimensions, filters, segments,…)
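Putting these pieces together, a first query might look like the sketch below. The View ID, the date range and the chosen metrics and dimensions are placeholders picked for illustration, not values from a real account:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Sessions and users per day and device category
ga_results1 <- google_analytics(
  ga_id,
  date_range = c("2020-01-01", "2020-03-31"),
  metrics    = c("sessions", "users"),
  dimensions = c("date", "deviceCategory")
)

# Look at the first rows of the result
head(ga_results1)
```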

This is the output of the query:

[Image: googleAnalyticsR query output]

Google’s Dimensions & Metrics Explorer lists all the metrics and dimensions available in the Google Analytics API. You just have to copy the name without the ga: prefix, e.g. ga:deviceCategory -> deviceCategory

Date Ranges

There are multiple ways to set the date range of your query:

  • Use the YYYY-MM-DD format
  • Use Sys.Date() minus a number of days, e.g. Sys.Date() - 30 (Sys.Date() returns today’s date)
  • Use the API v4 shortcuts today, yesterday, NdaysAgo (e.g. 30daysAgo)
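The three options above can be sketched as follows. The concrete dates and the 30-day window are arbitrary examples. Each vector has the form c(start, end) and is meant to be passed as the date_range argument of google_analytics:

```r
# 1. Fixed dates in YYYY-MM-DD format
range_fixed <- c("2020-01-01", "2020-01-31")

# 2. Dates relative to today: from 30 days ago up to yesterday
range_sys <- c(Sys.Date() - 30, Sys.Date() - 1)

# 3. API v4 shortcuts for the same kind of window
range_shortcut <- c("30daysAgo", "yesterday")
```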

Change the rows limit

By default googleAnalyticsR returns only 1,000 rows, but you can change that with the max argument.

If you want to download all the rows, use max = -1.
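As a rough illustration, the two calls below differ only in the max argument. The View ID, date range, metric and dimensions are placeholders:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Return at most 5,000 rows instead of the default 1,000
ga_5000 <- google_analytics(
  ga_id,
  date_range = c("30daysAgo", "yesterday"),
  metrics    = "sessions",
  dimensions = c("date", "pagePath"),
  max        = 5000
)

# Return every available row
ga_all <- google_analytics(
  ga_id,
  date_range = c("30daysAgo", "yesterday"),
  metrics    = "sessions",
  dimensions = c("date", "pagePath"),
  max        = -1
)
```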

Avoid data sampling

As you probably know, Google Analytics may return sampled data when a query covers a large amount of data. In googleAnalyticsR you can avoid that by using the argument anti_sample = TRUE.

The package will automatically split up the calls in order to return unsampled data (the process might take a bit more time).
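A minimal sketch of such a call, with a placeholder View ID and a deliberately long, made-up date range:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# anti_sample = TRUE asks the package to split the request into
# smaller calls so that the combined result is unsampled
ga_unsampled <- google_analytics(
  ga_id,
  date_range  = c("2019-01-01", "2019-12-31"),
  metrics     = "sessions",
  dimensions  = c("date", "landingPagePath"),
  anti_sample = TRUE
)
```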

Filters

Let’s now talk about filters. You can add filters to your query in order to get only the data you really need.

Create dimension filters with dim_filter

If you want to use a dimension filter, you have to use the dim_filter function. It has 5 arguments:

  • dimension – the dimension name (e.g. city, country, deviceCategory,…)
  • operator – how to match the dimension (REGEXP, BEGINS_WITH, ENDS_WITH, PARTIAL, EXACT,…)
  • expressions – what should match the dimension (e.g. Milan, Mobile,…)
  • caseSensitive – TRUE/FALSE if it’s TRUE then the match is case sensitive
  • not – TRUE/FALSE e.g. if you want to say DOES NOT CONTAIN then you should combine operator PARTIAL with not = TRUE

After that, you have to wrap the different dimension filters inside the filter_clause_ga4 function.

This function also has an argument called operator, which defines the logical operator (AND, OR) used to combine the conditions you defined in the filters. E.g. if you have two dimension filters and use the AND operator, the Google Analytics API will return only the rows that match both conditions at the same time.

Finally, pass the result to the dim_filters argument of your google_analytics function.
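The three steps (create the filters, wrap them, pass them to the query) can be sketched like this. The two filter conditions, the View ID and the date range are invented for illustration:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Condition 1: device category is exactly "mobile"
mobile_only <- dim_filter(
  dimension   = "deviceCategory",
  operator    = "EXACT",
  expressions = "mobile"
)

# Condition 2: city does NOT contain "Milan" (PARTIAL combined with not = TRUE)
not_milan <- dim_filter(
  dimension   = "city",
  operator    = "PARTIAL",
  expressions = "Milan",
  not         = TRUE
)

# Both conditions must hold at the same time
my_dim_filters <- filter_clause_ga4(list(mobile_only, not_milan), operator = "AND")

ga_filtered <- google_analytics(
  ga_id,
  date_range  = c("30daysAgo", "yesterday"),
  metrics     = "sessions",
  dimensions  = c("city", "deviceCategory"),
  dim_filters = my_dim_filters
)
```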

Create metric filters with met_filter

For a metric filter you have to use met_filter. There are 4 arguments available in this function:

  • metric – the metric name (e.g. sessions, bounces, transactions,…)
  • operator – how to compare the metric (EQUAL, LESS_THAN, GREATER_THAN, IS_MISSING)
  • comparisonValue – what should match the metric (e.g. 1, 100, 1000,…)
  • not – TRUE/FALSE if it’s TRUE then the condition is negated

After that, you have to wrap all the metric filters you created inside the filter_clause_ga4 function.

Then you have to specify the argument met_filters in google_analytics.

Obviously, you can use dim_filters and met_filters at the same time inside the google_analytics function.
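Here is a sketch that uses a metric filter together with a dimension filter in the same query. The threshold of 100 sessions, the mobile condition, the View ID and the date range are all made up for the example:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Metric filter: keep only rows with more than 100 sessions
sessions_over_100 <- met_filter(
  metric          = "sessions",
  operator        = "GREATER_THAN",
  comparisonValue = 100
)
my_met_filters <- filter_clause_ga4(list(sessions_over_100))

# Dimension filter: keep only mobile traffic
mobile_only <- dim_filter("deviceCategory", operator = "EXACT", expressions = "mobile")
my_dim_filters <- filter_clause_ga4(list(mobile_only))

# Both filter clauses in the same query
ga_both <- google_analytics(
  ga_id,
  date_range  = c("30daysAgo", "yesterday"),
  metrics     = "sessions",
  dimensions  = c("city", "deviceCategory"),
  dim_filters = my_dim_filters,
  met_filters = my_met_filters
)
```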

Segments

With googleAnalyticsR you can create segments, but in this guide, I’m going to show you only how to apply a segment that you have previously created through the Google Analytics Interface.

As you did previously (for the View ID), you can generate a data frame with all the details of your segments.

Then copy the Segment ID of your segment and pass it to the segment_ga4 function. The object it returns is what you give to the segments argument of google_analytics.
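A rough sketch of both steps, using the package function ga_segment_list() to list the segments. The Segment ID "gaid::-4" is only an example (to my knowledge it is the built-in Paid Traffic segment), and the label "PaidTraffic" and the View ID are placeholders. Swap in the ID of your own segment:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# List the segments available to your account and look up the ID you need
my_segments <- ga_segment_list()

# Example Segment ID (assumption: built-in "Paid Traffic" segment)
segment_id <- "gaid::-4"

# Build the segment object; the first argument is just a label for the output
seg_obj <- segment_ga4("PaidTraffic", segment_id = segment_id)

ga_segmented <- google_analytics(
  ga_id,
  date_range = c("30daysAgo", "yesterday"),
  metrics    = "sessions",
  dimensions = c("date", "segment"),
  segments   = seg_obj
)
```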

From R to Excel

R is a powerful language, but I also know that it takes quite some time to learn it. And that’s why I decided to add this paragraph, especially for the very beginners. What you can do, is just use R to download the data from Google Analytics and then run the analysis using Excel.

With the write.csv function, you can download the output of your query as CSV and load it in Excel.
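A small sketch of the export step. To keep it runnable without an API call, ga_results1 is a tiny made-up data frame standing in for a real query result, and the file name is arbitrary:

```r
# Stand-in for the output of a google_analytics() query (invented numbers)
ga_results1 <- data.frame(
  date           = as.Date(c("2020-01-01", "2020-01-02")),
  deviceCategory = c("mobile", "desktop"),
  sessions       = c(120, 340)
)

# Save the data frame as a CSV file in the current working directory;
# row.names = FALSE avoids an extra column of row numbers
write.csv(ga_results1, file = "ga_results1.csv", row.names = FALSE)
```

The file ends up in your working directory (check it with getwd()) and can be opened directly in Excel.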

Otherwise, you can continue the analysis in R, using packages like ggplot2 or dplyr.

googleAnalyticsR: practical examples

To wrap up this guide, I want to show you two examples of how you can use googleAnalyticsR.

#1 Visualize Sessions data using ggplot2

[Image: ggplot2 line charts built from googleAnalyticsR data]

In this first example, I used googleAnalyticsR in combination with a data visualization package called ggplot2.

As you can see in the picture above, I created 4 line charts representing the development of the sessions over time, divided by continent and deviceCategory.
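A chart along these lines could be produced with something like the sketch below. It is not the exact code behind the picture: the View ID and date range are placeholders, and drawing one panel per continent with one line per device category is my guess at the layout:

```r
library(googleAnalyticsR)
library(ggplot2)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Daily sessions split by continent and device category
sessions_by_day <- google_analytics(
  ga_id,
  date_range = c("90daysAgo", "yesterday"),
  metrics    = "sessions",
  dimensions = c("date", "continent", "deviceCategory"),
  max        = -1
)

# One panel per continent, one line per device category
sessions_plot <- ggplot(
  sessions_by_day,
  aes(x = date, y = sessions, colour = deviceCategory)
) +
  geom_line() +
  facet_wrap(~ continent, scales = "free_y") +
  labs(x = "Date", y = "Sessions", colour = "Device category")

print(sessions_plot)
```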

#2 Get the Client Id of specific users

In Google Analytics there is a report called User Explorer, where you can isolate the behavior of every single user (the report uses the Client Id value).

[Image: Google Analytics User Explorer report]

As you can see, the report is huge and it can take a lot of time to find the Client Id you want to examine.

That’s why every time I need that report, I first use the googleAnalyticsR package to get a list with only the Client Ids I really have to check.
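As an illustration only, such a list could be pulled with a query like the one below. It assumes the clientId dimension is available through the API for your view. The selection rule (users with at least one transaction), the View ID and the date range are invented for the example:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder View ID, replace with your own

# Hypothetical criterion: keep only Client Ids with at least one transaction
has_transaction <- met_filter(
  metric          = "transactions",
  operator        = "GREATER_THAN",
  comparisonValue = 0
)

ga_clients <- google_analytics(
  ga_id,
  date_range  = c("30daysAgo", "yesterday"),
  metrics     = c("sessions", "transactions"),
  dimensions  = "clientId",
  met_filters = filter_clause_ga4(list(has_transaction)),
  max         = -1
)

# The Client Ids to look up in the User Explorer report
head(ga_clients$clientId)
```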
