At the time of writing, GA4 has no Search and Replace filter, which is why I decided to write this guide and show you a workaround I’ve recently used.
A couple of days ago, I was analysing a GA4 property and I noticed that it was receiving a lot of traffic from many different referrals.
After some further analysis, I realised that many of those referrals were from the same website but with different subdomains or a port number at the end.
Just to give you an idea of what I mean you can see two examples below:
I also had an extreme case with hundreds of different values for the same referrer.
This is not good, especially in GA4, where it’s important to keep cardinality (the total number of unique values for a dimension) low in order to avoid aggregated reports, where the remaining values are grouped into the (other) row.
GA3/Universal gives you the top 50k rows and aggregates everything after (row 50,001+). That meant the report was still useful for quickly figuring out the top combinations of anything.
GA4 doesn’t seem to be doing that, so aggregated reports seem to no longer be directional.
- Charles Farina (@CharlesFarina), April 21, 2022
My solution
My goal was to consolidate some referrals for better reporting and lower the risk of data aggregation.
Luckily, page_referrer is one of the fields that you can set in a GA4 tag in order to override the default value.
I used a RegEx Table variable to mimic the Search and Replace filter function. I selected {{Referrer}} as my Input Variable and added some regular expressions and the desired outputs as you can see below.
In my case, I disabled both Full Matches Only and Enable Capture Groups and Replace Functionality because I just wanted to match my regex anywhere in the value. That might not work in your case, so read the two information boxes very carefully.
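To make the matching behaviour concrete, here is a minimal Python sketch that models the lookup logic of such a table. It is not GTM code. The hostnames and outputs are hypothetical placeholders, not the referrers from my property, and the function name `regex_table` and its arguments are made up for illustration. The sketch also assumes that rows are checked from top to bottom and that an unmatched value falls back to the input, which in GTM you would have to configure as the default value. Regex syntax can differ between Python and GTM, so treat the patterns as illustrative only.

```python
import re

# Hypothetical rows: (pattern, output). Placeholder hostnames only.
ROWS = [
    (r"example\.com", "https://example.com/"),
    (r"partner\.test", "https://partner.test/"),
]


def regex_table(value, rows, full_match=False, ignore_case=True):
    """Return the output of the first row whose pattern matches `value`.

    full_match=False lets a pattern match anywhere in the value, which
    models leaving "Full Matches Only" unchecked. When no row matches,
    the input is returned unchanged (an assumed default value).
    """
    flags = re.IGNORECASE if ignore_case else 0
    matcher = re.fullmatch if full_match else re.search
    for pattern, output in rows:
        if matcher(pattern, value, flags):
            return output
    return value


referrers = [
    "https://shop.example.com/cart",
    "https://www.example.com:8443/",
    "https://blog.partner.test/post",
    "https://unrelated.invalid/",
]
consolidated = [regex_table(r, ROWS) for r in referrers]
# Four distinct referrer values collapse into three.
print(len(set(referrers)), "->", len(set(consolidated)))
```

With `full_match=True`, none of the sample referrers would match, because each pattern covers only part of the URL. That is the reason the partial-match setting matters here.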
Afterwards, I updated my GA4 Configuration Tag: I added page_referrer to the Fields to Set list and used my RegEx Table variable as its value.
This solution works with all the fields you can find on this list. For example, you could remove a specific directory from all the page URLs.
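As a rough illustration of the directory example, the following Python sketch shows the string transformation involved. It assumes the directory is the first path segment. The helper name `drop_directory`, the `/staging` directory and the example.com URL are all hypothetical. In GTM the same idea would be expressed as a row in the RegEx Table rather than as code.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_directory(url, directory):
    """Remove one leading path segment (e.g. "staging") from a URL."""
    parts = urlsplit(url)
    prefix = "/" + directory.strip("/")
    path = parts.path
    # Only strip a whole segment, so "/staging-old" is left alone.
    if path == prefix or path.startswith(prefix + "/"):
        path = path[len(prefix):] or "/"
    return urlunsplit(parts._replace(path=path))


cleaned = drop_directory(
    "https://www.example.com/staging/pricing?plan=pro", "staging"
)
print(cleaned)
```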
Hi Ruben, thanks for sharing this valuable information. I tried to do the same with the page_location parameter, removing a certain directory from the URL, and it worked, but I have a problem. Although it converts and replaces the page values, sending the pages in lowercase without the directory specified in the RegEx Table, it affects the attribution of google/cpc. Do you have any solution for converting the pages to lowercase with GTM and removing certain parameters from the directory sent to the page path without affecting the attribution? I’m looking forward to it, thank you very much!!
Hi, sorry for the delay. I just noticed that I missed your comment. My suggestion today is to never change the page_location parameter but to push the “cleaned” url into a custom parameter.
Hi Ruben, thanks for your article. Your last comment is valid for the case you explain. However, if you apply this to “page_location” and create a new custom parameter to preserve the existing data, you will have data cardinality problems in a standard GA4 account as soon as you generate more than 500 unique rows per day, and you will surely get the “other” row. The same happens if you register the new parameter as a new dimension so that you can use it in the standard GA4 reports. If you don’t have BigQuery, the only thing I can think of is to overwrite page_location, but of course the disadvantage is that you will no longer have the raw data, only the modified version.
You’re right, I forgot about the cardinality issue while I was writing the answer. Probably the best solution, as you said, is to not overwrite the page_location and do some manipulation in BigQuery.
In that way, you keep the raw data and don’t hit the cardinality issue.
Correct. If you make the modification via BQ, you keep the original data and can also query the new values. However, if you do not have BQ, another solution would be to create a parameter that contains the new data, but only what you need. Let me explain: if the goal is to make all the URLs the same by removing the location/language from the URL, you can create another parameter to hold the location/language of the content. This way you will have fewer than 500 different values and you will not have cardinality problems.
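The idea in this last comment can be sketched in a few lines of Python. This is only an illustration under assumptions. It supposes that the location/language appears as an optional first path segment such as /en/ or /en-gb/. The helper name `split_locale` and the sample paths are hypothetical. A real site would need a stricter rule, because a two-letter directory like /my/ would also be treated as a locale here.

```python
import re

# Assumed URL shape: optional leading locale segment (/en/ or /en-gb/).
LOCALE_RE = re.compile(r"^/([a-z]{2}(?:-[a-z]{2})?)(?=/|$)", re.IGNORECASE)


def split_locale(path):
    """Return (locale, path_without_locale); locale is "" when absent."""
    match = LOCALE_RE.match(path)
    if not match:
        return "", path
    rest = path[match.end():] or "/"
    return match.group(1).lower(), rest


paths = ["/en-gb/pricing", "/fr/pricing", "/de/pricing", "/pricing"]
pairs = [split_locale(p) for p in paths]

# One shared path value, plus a small set of locale values that would
# be sent in a separate (hypothetical) custom parameter.
unique_paths = {path for _, path in pairs}
unique_locales = {locale for locale, _ in pairs}
print(len(set(paths)), "paths ->", len(unique_paths), "path and",
      len(unique_locales), "locale values")
```

Splitting the value this way keeps both dimensions small: the path dimension no longer multiplies with every language, and the locale dimension only grows with the number of languages the site supports.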