So I recently got dragged into multiple discussions about Google Analytics referrer spam and ways to effectively block it so that report data is clean and that channel attribution makes sense. In this (long) post, I’d like to share a method I’ve put together to exclude Google Analytics referrer spam, which requires enlisting the help of Google Tag Manager.
What is Google Analytics referrer spam?
First, let’s make sure we are all on the same page. Google Analytics referrer spam happens when “hackers” (sic) send data to your Google Analytics account using the Measurement Protocol, aka Google Analytics’ data collection API. By using the Measurement Protocol, one can send fake page views to your GA property by using your UA-XXXXXX-Y tracking ID, supercharging them with referrer information pointing to low-quality sites. Just to be clear, this is NOT actual traffic to your website. These are fake traffic requests sent by a bot of some kind. Fake or not, they’re going to show up in your Referrals report if you look in Acquisition > All Traffic > Referrals.
Also, before you get too emotional, just know that your site is not being targeted on purpose: the spammers are merely generating UA-XXXXXX-Y property IDs and sending fake traffic requests. Nothing personal but still very annoying.
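To make this a bit more concrete, here is a rough Python sketch of what such a fake hit looks like on the wire. It only builds the payload and does not send anything. The parameter names (v, tid, cid, t, dp, dr) come from the Measurement Protocol for Universal Analytics; the helper name `build_pageview_payload`, the client ID and the spammy domain are placeholders I made up for illustration.

```python
from urllib.parse import urlencode


def build_pageview_payload(tracking_id, client_id, page_path, referrer):
    """Return a URL-encoded Measurement Protocol (v1) pageview payload."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXXX-Y
        "cid": client_id,    # anonymous client ID (any string will do)
        "t": "pageview",     # hit type
        "dp": page_path,     # page path being "viewed"
        "dr": referrer,      # document referrer: this is the spammy part
    }
    return urlencode(params)


# Placeholder values only; nothing is sent anywhere.
payload = build_pageview_payload(
    "UA-XXXXXX-Y", "555", "/", "http://spammy-site.example/"
)
print(payload)
```

Notice that nothing in there proves the hit came from your site, and there is no hostname (dh) parameter unless the sender bothers to add one.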
So what can I do to combat Google Analytics referrer spam?
Well, you can start with the essentials: go to your view settings in the admin interface, scroll down to the Bot Filtering section and check the “Exclude all hits from known bots and spiders” box. This is a best-effort initiative by Google to fix the issue of Google Analytics referrer spam, but it still leaves much to be desired: new fake referrers pop up every day, so this feature cannot be a long-term solution.
OK, what about filters?
Granted, you can create filters to exclude referrers. Go to the admin section, find your view and go to Filters.
The first kind of filter you can set up is (surprise) a referrer-based exclusion filter. Add a filter, select Custom type, Exclude then Referrer and start typing referrer domain names in a regular expression format so that you can handle multiple referrers. If you’re really into self-inflicted pain, go ahead and create individual filters for each referrer domain you wish to exclude.
For each filter, the text in the filter field should look a lot like:
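Something like the output of this little Python sketch, for instance. It builds a single pattern from a list of domains by escaping the dots and joining everything with a pipe. The domain names and the helper `build_referrer_pattern` are made up for illustration; swap in whatever shows up in your own Referrals report.

```python
import re


def build_referrer_pattern(domains):
    """Join domains into one regex alternation, escaping the dots."""
    return "|".join(domain.replace(".", r"\.") for domain in domains)


# Hypothetical spam domains, for illustration only.
spam_domains = ["spam-one.example", "cheap-seo.example", "free-buttons.example"]

pattern = build_referrer_pattern(spam_domains)
print(pattern)  # paste the printed pattern into the filter field

# Sanity check before pasting: the pattern should match a listed domain.
print(bool(re.search(pattern, "cheap-seo.example")))
```

Escaping the dots matters: an unescaped dot matches any character, so the pattern could end up excluding referrers you actually care about.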
Even if you’re able to create fully-loaded regular-expression-based filters, you will be faced with the issue of filter maintenance: when new referrer domains show up, you’ll have to create new exclusion filters and ideally delete old ones. Furthermore, if you actually use filters for data collection purposes and not just filtering Google Analytics referrer spam, you might find yourself overwhelmed with the number of filters you have to manage.
The second kind of filter is a hostname inclusion filter, which should take care of most referrer spam because spammy traffic in your GA account is sent using the Measurement Protocol and as such typically does not carry your hostname (it is either not set or set to something fake). Therefore, adding a filter that includes only traffic to hostname mysitedomain.com should in principle suffice. But apparently, it is not 100% effective.
Filters are therefore a fairly short-term solution, if only because of the maintenance.
Wait, can’t I automate filters?
Yes, you can go that route, and there are services such as Referrer Spam Blocker.
Referrer Spam Blocker offers a great service: connect their platform to your Google Analytics account, select views you want to protect against Google Analytics referrer spam and they will automatically add exclusion filters to your view(s). Sweet! Right? Right?
You know what’s coming: I get to be my analytics curmudgeon self. As much as I like Referrer Spam Blocker, I cannot use the solution for many of my enterprise clients who have strict rules on how SaaS services such as Google Analytics can be accessed by third-party services, even if they use standard Google Analytics services such as the Management API.
Right now it’s a free service, but like anyone, they’ll fall victim to their own success and will have to implement a pricing model eventually. Which is fine, I suppose.
There is also the issue of filter management: when I have to manage actual filters, I don’t want to wade through pages of RSB filters to get to the filters I need to manage.
If you have fewer issues than me, by all means, go use Referrer Spam Blocker 🙂
One filter to rule them all
Actually, there is one filter you can use that will take care of your spam problem right away.
Introducing the hostname inclusion filter!
Create a filter that only includes traffic to pages actually sitting on your web site: go to your Google Analytics administration console, then to your view settings. From there, add a new filter with the following settings:
Filter type: predefined / Include only / traffic to hostname / equals to (or contains) yourdomainname.com
Apply the filter (you may need to re-order it in your filter list) and voilà! No more spam 😉
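This will last as long as spammers don’t bother sending your real hostname along with their fake hits. The Google Analytics Measurement Protocol does let the sender set a hostname, but someone spraying randomly generated property IDs has no idea which hostname goes with which ID.
Filters are a great short-/mid-term solution but as you can imagine, I’m about to go for the bigger-picture, longer-term solution 😉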
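Here is a small Python simulation of what that filter does, under the assumption that each hit is a plain dictionary using Measurement Protocol parameter names (dh for hostname, dp for page, dr for referrer). The function `include_hostname` and the sample hits are mine; this is not how Google Analytics is implemented, just the same logic in miniature.

```python
def include_hostname(hits, hostname):
    """Keep only hits whose 'dh' (document hostname) contains `hostname`."""
    return [hit for hit in hits if hostname in hit.get("dh", "")]


# Made-up sample hits, for illustration only.
hits = [
    # A real visit, tagged by the site itself.
    {"dh": "www.yourdomainname.com", "dp": "/pricing"},
    # Ghost hit with no hostname at all.
    {"dp": "/", "dr": "http://spammy-site.example/"},
    # Ghost hit with a fake hostname.
    {"dh": "spammy-site.example", "dp": "/"},
    # Ghost hit that spoofs the real hostname: this one gets through.
    {"dh": "yourdomainname.com", "dp": "/", "dr": "http://spammy-site.example/"},
]

kept = include_hostname(hits, "yourdomainname.com")
for hit in kept:
    print(hit)
```

The last sample hit is the weak spot: as soon as a spammer sends the right hostname, the filter lets the hit in.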
Introducing data collection keys
Ok so bot filtering is not so hot, filtering only gets you so far and automated services may not be reliable in the long run. What’s left? Why, data collection keys of course!
In this method I’m going to essentially tell Google Analytics the “password” to my analytics property every time traffic flows in. A Google Analytics filter will then include only hits that carry said password when triggering a page view.
But first some preparation!
For this method, I am using Google Tag Manager to make things easier to manage (’cause I’m lazy) but you can definitely implement it with “manual” tagging.
The first step is creating a reserved Custom Dimension so go to the admin and edit your Google Analytics property. Scroll down to Custom Definitions and then Custom Dimensions.
Create a new dimension named “Analytics key” with a scope of ‘hit’. Creating the custom dimension will also show you code examples on how to implement your new dimension. Make a note of the custom dimension’s index (a number between 1 and 20 – or 200 if you use Google Analytics 360). In my case, this is Custom Dimension #8.
Next, select a value for your key. In my case, because I like to be unnecessarily fancy, I use a Base64-encoded string. If you are so inclined, go to Base64Encode.org, type a string and you will receive an encoded version of said string.
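If you’d rather not paste your key into a website, the same thing takes a few lines of Python. This is only a sketch: the helper name `make_key` and the passphrase are placeholders of mine.

```python
import base64


def make_key(passphrase):
    """Return the Base64-encoded form of a passphrase, as a plain string."""
    return base64.b64encode(passphrase.encode("utf-8")).decode("ascii")


# Placeholder passphrase; pick your own.
key = make_key("my not so secret passphrase")
print(key)
```

Keep in mind that Base64 is an encoding, not encryption: anyone can decode it. It just makes the key look less like a plain word.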
But let’s not get too carried away. Let’s go with a key with a value of “ilovefluffyatomickittens”.
Make a copy of that key somewhere; we’ll need it in a moment.
Next, go to your Google Tag Manager account and create two variables:
- Analytics key CD index
  - Type: Constant
  - Value: the index for the custom dimension we created earlier (8 in my case)
- Analytics key CD value
  - Type: Constant
  - Value: the value for the key (ilovefluffyatomickittens)
Next, locate your Google Universal Analytics tags in the Tags section of the GTM interface and add those two variables as the key and value for a custom dimension in the pageview call.
Save your tag, publish your container and now each Google Analytics tag sends along a custom dimension that contains your “password”.
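For the curious, this is roughly what the resulting hit looks like once the key is attached. In the Measurement Protocol, a custom dimension travels as a cd<index> parameter, so index 8 becomes cd8. The sketch below is Python and only builds the payload; the constant names mirror the two GTM variables above, while the function name, client ID and hostname are assumptions of mine.

```python
from urllib.parse import parse_qs, urlencode

# Same two values as the GTM constants defined above.
ANALYTICS_KEY_CD_INDEX = 8
ANALYTICS_KEY_CD_VALUE = "ilovefluffyatomickittens"


def build_keyed_pageview(tracking_id, client_id, hostname, page_path):
    """Return a pageview payload that carries the key as a custom dimension."""
    params = {
        "v": "1",
        "tid": tracking_id,
        "cid": client_id,
        "t": "pageview",
        "dh": hostname,
        "dp": page_path,
        # Custom dimension parameter: "cd" followed by the dimension index.
        f"cd{ANALYTICS_KEY_CD_INDEX}": ANALYTICS_KEY_CD_VALUE,
    }
    return urlencode(params)


payload = build_keyed_pageview(
    "UA-XXXXXX-Y", "555", "www.yourdomainname.com", "/pricing"
)
print(payload)
print(parse_qs(payload)["cd8"])
```

A spammer generating random property IDs has no way of guessing either the dimension index or its value, which is the whole point.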
Now let’s tell Google Analytics to listen to said dimension and include only traffic that sends that dimension. Go to Admin > View > Filters and create a new filter that includes only hits where Custom Dimension #8 (Analytics Key in my case) contains “ilovefluffyatomickittens” or whatever you set your key to.
IMPORTANT: if you have other filters in your view, make sure this new filter is high up in the filter order so that it gets executed early. Remember that Include filters are “include only”: a hit must pass every Include filter to be kept, so any combination of Include filters behaves like the most restrictive one.
There! Your traffic is now being filtered and Google Analytics now only retains traffic where your custom dimension is set and matches your secret key. Fluffy. Atomic. Kittens. You just read that last bit out loud, admit it.
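To see why filter stacking matters, here is a toy Python model of a view with two Include filters: the key filter and a hostname filter. Each filter is a (field, text) pair and a hit survives only if every filter matches. The function `apply_include_filters`, the field names and the sample hits are my own simplification, not Google Analytics internals.

```python
def apply_include_filters(hits, filters):
    """Apply include filters in order; a hit survives only if all of them match."""
    for field, needle in filters:
        hits = [hit for hit in hits if needle in hit.get(field, "")]
    return hits


filters = [
    ("cd8", "ilovefluffyatomickittens"),  # the key filter, placed first
    ("dh", "yourdomainname.com"),         # a hostname include filter
]

# Made-up sample hits, for illustration only.
hits = [
    # Real page with the updated tag: kept.
    {"dh": "www.yourdomainname.com", "dp": "/", "cd8": "ilovefluffyatomickittens"},
    # Real page whose tag was never updated: dropped by the key filter.
    {"dh": "www.yourdomainname.com", "dp": "/old-page"},
    # Ghost spam with no key and no hostname: dropped.
    {"dp": "/", "dr": "http://spammy-site.example/"},
    # Right key, wrong hostname: dropped by the hostname filter.
    {"dh": "staging.internal.example", "dp": "/", "cd8": "ilovefluffyatomickittens"},
]

kept = apply_include_filters(hits, filters)
print(len(kept), "of", len(hits), "hits kept")
```

The second sample hit is the one to watch for: any page whose tag does not send the key disappears from the view too, so it is worth updating every tag before switching the filter on.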
About this method:
Pros:
- Unique method, does not get affected by spammers
- Secure and reliable: your key is your own and only pages YOU and trusted third parties implement will get measured.
- Does not require technical knowledge
- GTM means flexibility: easily update variables to change your key.
- GTM provides some degree of obfuscation with the container code
- Only 1 filter to manage (and one filter can be shared with many views in the same property or account)
- No third-party service required
Cons:
- Requires adjusting existing tags in GTM
- Still requires some degree of filtering
As you can see, this method ensures top-notch filtering for longer-term analytics. Of course there are always going to be some cat-and-mouse tactics between spammers and digital marketing solutions, but the method I showed here gives you more than enough of a lead.
In this post, we covered the multiple methods you can use to filter out Google Analytics referrer spam. There is no such thing as 100% IT security (and everyone telling you otherwise is trying to sell you something) and the same holds true of websites and apps.
Looking back at my method, which I like to think is more elegant and definitely more efficient than the alternatives, I wish that Google would provide such a secret key on top of the property tracking ID, so that such a verification mechanism would ensure proper data collection.
What about you? How do you fight Google Analytics referrer spam? Let me know in the comments or on social media!