How to Filter Spam, Bots and Crawlers Efficiently in Google Analytics

Google Analytics is perhaps the single most important tool for business owners with a digital presence. The platform allows the owner to understand traffic sources, user behaviour and conversion characteristics. This data is invaluable and can be used (should be used!) to inform business decisions and overarching marketing strategy.

However, a worryingly high number of businesses currently run Google Analytics in its default configuration, without any filters. Without certain filters, your website analytics data can be skewed by bot and spam traffic, producing inaccurate results that don’t reflect the true performance of your site.

Let’s delve into the different types of spam and the filters available to help keep it out of your reports.

What do we mean by Spam, Ghost, Bot, or Crawler traffic?

Every year, spammers get more and more advanced in their targeting of unsuspecting victims. In 2015, referral spam became more commonplace, with a number of spammers targeting random Google Analytics accounts in the hopes of driving traffic to their own dodgy websites. This was done by artificially increasing the traffic that site owners saw in their Google Analytics account, urging them to discover where this new referral traffic was originating from.

Since the outbreak in 2015, Google has been working hard to eliminate these spammy sites from your Google Analytics data. These efforts, for the most part, have worked, with a noticeable decrease in spammy sites showing in Google Analytics accounts.

However, to this day, many websites still report seeing ghost traffic or results in their Google Analytics data produced by spam sites. In fact, there was a widely reported spam hit on the 31st of January that impacted a number of Google Analytics accounts. If you want to ensure your data is as accurate as possible, then this spam/bot traffic must be filtered out in Google Analytics.

Firstly, let’s take a closer look at the different kinds of spam. Spam traffic in Google Analytics can usually be categorised as one of two types: ghost or crawler.

Ghosts

This is the most common form of spam referral traffic that you’ll find in Google Analytics accounts. It’s known as ‘ghost’ traffic because it requires no real interaction with your website. Whilst we’re used to understanding Google Analytics as a tool to track physical interactions with our websites, ghost traffic is able to bypass this stage.

Instead, spammers utilise the Measurement Protocol, which allows developers to send user-activity data directly to Google Analytics’ servers via HTTP requests, without a browser ever loading your site. By using the Measurement Protocol and randomly generated UA tracking codes, spammers are able to send fake traffic data to unsuspecting Google Analytics accounts.
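To make this concrete, here’s a minimal Python sketch of what a Measurement Protocol pageview hit looks like as data. It only builds the payload and never sends it. The parameter names (v, tid, cid, t, dh, dp, dr) come from the Universal Analytics Measurement Protocol; the function name, the client ID and the .example hostnames are made up for illustration.

```python
from urllib.parse import urlencode, parse_qs


def build_pageview_hit(tracking_id, client_id, hostname, page, referrer=""):
    """Build (but do not send) the payload of a Measurement Protocol pageview."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the UA tracking code the hit is addressed to
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname, supplied by the sender
        "dp": page,          # document path
    }
    if referrer:
        params["dr"] = referrer  # document referrer, also supplied by the sender
    return urlencode(params)


# Hypothetical values: nothing here ever touches a real website.
payload = build_pageview_hit(
    "UA-0000000-1", "555", "not-your-site.example", "/",
    referrer="http://spam-referrer.example/",
)
fields = parse_qs(payload)
print(fields["dh"][0])
```

The hostname and referrer are simply whatever the sender types in, so nothing in the hit proves that a page on your site was ever loaded.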

Crawlers

Unlike ghosts, crawlers are actually somewhat legitimate, though they can still be spammy in nature. These crawlers do physically access your website, often crawling pages for certain data. Because they genuinely load your pages, and often ignore the rules in your robots.txt file, they are reported in your Analytics as legitimate visits.

There are a number of crawlers out there, some more spammy and malicious than others. Best practice is to check your referral traffic regularly, taking the time to research any suspicious sources that may crop up.

Things to consider before implementing filters

Filters can have a major impact on your future analytics data if entered incorrectly or incompletely. For this reason, it's important to consider the points below before committing to the implementation of filters.

Create an unfiltered view

Creating an unfiltered view in Google Analytics is considered best practice when creating a Google Analytics account and certainly before implementing any changes.

An unfiltered view acts as a raw copy of your website data, which is extremely useful as the basis for comparison against other views, as well as acting as a safety net should anything go wrong.

Filters don’t work retrospectively

Any filters you apply now will only affect future website data. Historical data remains unchanged when a new filter is introduced.

For this reason, it’s important to implement these filters as early as possible. That way, your business can start benefiting from accurate data.

Data is permanently changed by filters

Filtered data is not recoverable in Google Analytics. For instance, if you implement a filter and realise at a later date that you made a data-entry mistake, that mistake could cost you months’ worth of valuable analytics data that cannot be recovered.

How to filter Spam, Ghost and Crawler traffic from your Google Analytics data

There are two major filters that can be used to filter out referral spam from your Google Analytics data - a valid hostname filter and a crawler spam filter. We’ll walk you through the step-by-step setup process for both.

However, to save time and avoid repetition, we’ll firstly outline the common first steps of both.

Shared first steps

  1. Navigate to the Admin section of your Google Analytics account.
  2. Look for the far right column, which displays the ‘View’ settings.
  3. Select ‘Filters’, just below the ‘Content Grouping’ option.
  4. The admin page will be minimised before displaying the filter page. Select the red ‘Add Filter’ button.
  5. Follow the configuration below for your desired filter.

Note that in order to make edits to filters, you will need edit permissions at the account level. If you do not currently have these permissions, you will need to ask your admin to grant them to you, which can be done through the ‘User Management’ section.

Valid hostname filter

If there’s one filter to have in your Google Analytics View, it’s this one. This filter is the best way to filter out ghost spam affecting your website analytics. Ghost spam requires no real interaction with your website. Instead, spammers ping randomly selected UA tracking codes with traffic, usually using a fake or unassigned hostname.

It is this flaw that allows us to counteract ghost spam. By validating which hostnames to include and/or exclude in Google Analytics, we’re able to tell Google to ignore traffic and data from certain hostnames.

Identify your hostnames

What is a hostname, and how do you find yours?

Essentially, a hostname is any website domain where your GA tracking code (UA-0000000-1) is present. Most obviously, this would include your website. However, it may also extend to development sites, staging sites and any third-party sites you may utilise for analytics purposes. Importantly, valid hostnames can also include Google’s own translation services, as well as Google’s Web Light, which is designed to speed up webpages on mobile devices.

Your hostname information can be found in your hostname report. Simply navigate to Audience > Network > Hostname as documented below.

Whilst searching for your hostnames, it’s worth setting your date range to at least a year. This way, you’re able to get a thorough understanding of the hostnames associated with your website - even if they haven’t sent any traffic to your website in a while.

Screenshot of valid hostname report
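If your Hostname report is long, a short script can help with the first pass. The Python sketch below assumes you have copied the report into a list of (hostname, sessions) rows; the function name and the sample rows are invented for illustration and are not real report data.

```python
def split_hostnames(rows, valid_domains):
    """Split (hostname, sessions) rows into valid and suspicious totals."""
    valid, suspicious = {}, {}
    for hostname, sessions in rows:
        host = hostname.strip().lower()
        # A hostname counts as valid if it is a known domain or a subdomain of one.
        is_valid = any(host == d or host.endswith("." + d) for d in valid_domains)
        bucket = valid if is_valid else suspicious
        bucket[host] = bucket.get(host, 0) + sessions
    return valid, suspicious


# Invented sample rows standing in for a Hostname report export.
sample_rows = [
    ("www.yourdomain.com", 1200),
    ("yourdomain.com", 300),
    ("staging.yourdomain.com", 25),
    ("(not set)", 40),
    ("free-seo-offers.example", 15),
]

valid, suspicious = split_hostnames(sample_rows, ["yourdomain.com"])
for host, sessions in sorted(suspicious.items()):
    print(f"check before filtering: {host} ({sessions} sessions)")
```

Treat the suspicious list as a prompt for a manual check, not a verdict. Legitimate third-party hostnames, such as Google’s translation services, will land there until you add them to your list of valid domains.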

Create a filter using your hostname data

Head back over to the Add Filter screen (if you’ve forgotten how to get there, just scroll up to the shared first steps). Click Add Filter and name the filter something relevant, such as “Valid Hostname Filter”. Under Filter Type, select Custom instead of Predefined, then select Include, and choose Hostname from the Filter Field dropdown menu.

The filter pattern box is where you’ll enter your valid hostnames. You want to enter your valid hostnames in the following format:

        yourdomain.com|www.yourdomain.com|hostname3|hostname4

Screenshot of a valid hostname filter
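Filter patterns are treated as regular expressions, so an unescaped dot matches any character, not just a literal full stop. Here’s a minimal Python sketch for building and sanity-checking a pattern before you paste it in. It assumes Google Analytics matches the pattern anywhere within the hostname and ignores case by default; the function names and hostnames are hypothetical.

```python
import re


def hostname_pattern(hostnames):
    """Join hostnames into one pattern, escaping dots so they match literally."""
    # Hostnames only contain letters, digits, hyphens and dots,
    # so the dot is the only regex character that needs escaping.
    return "|".join(h.strip().lower().replace(".", r"\.") for h in hostnames)


pattern = hostname_pattern(
    ["yourdomain.com", "www.yourdomain.com", "staging.yourdomain.com"]
)
matcher = re.compile(pattern, re.IGNORECASE)


def is_included(hostname):
    """Mimic an include filter with a partial, case-insensitive match."""
    return matcher.search(hostname) is not None


print(pattern)
print(is_included("WWW.yourdomain.com"))
print(is_included("spam-host.example"))
```

Under the partial-match assumption, the entry for yourdomain.com already covers www.yourdomain.com and any other subdomain, so listing subdomains separately is harmless but optional.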


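You should only create one version of this filter per view. Include filters are applied one after another, and traffic must match every one of them to be kept, so two separate include filters listing different hostnames would exclude all of your data. For this reason, it's important you fit all of your valid hostnames into one filter.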
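The effect of splitting your hostnames across two include filters can be sketched in a few lines of Python. This is a simplified model, assuming each include filter is checked in turn and a hit is dropped as soon as it fails one; the function name and hostnames are hypothetical.

```python
import re


def passes_include_filters(hostname, include_patterns):
    """A hit is kept only if it matches every include filter in the view."""
    return all(re.search(p, hostname, re.IGNORECASE) for p in include_patterns)


# Two separate include filters versus one combined filter.
two_filters = [r"yourdomain\.com", r"yourshop\.example"]
one_filter = [r"yourdomain\.com|yourshop\.example"]

for host in ["www.yourdomain.com", "yourshop.example"]:
    print(
        host,
        "two filters:", passes_include_filters(host, two_filters),
        "one filter:", passes_include_filters(host, one_filter),
    )
```

With two filters, each hostname fails the filter that names the other one, so nothing survives. Combined into a single pattern, both hostnames are kept.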

Verify the filter

Before saving your filter, it’s worth taking the time to verify it first. The Verify Filter option at the bottom of the page can be used to check your filter and ensure it's accurate. It does this by showing you how your chosen filter fields would affect data over the past 7 days.

It’s worth noting that if you don’t have 7 days’ worth of data, this process won’t work.

Crawler spam filter (Campaign Source)

This spam does physically visit your site and in doing so, will leave a valid hostname. For this reason, our previous filter won’t have an effect on this spam traffic. Instead, we’re going to create a separate filter that specifically targets crawler spam.

Search your analytics for crawler spam

To start, identify the crawler spam that shows up in your analytics now. In the Acquisition menu, choose All Traffic, then Referrals. It’s best practice here to set your date range to at least a year, so that you also identify any past sources of crawler spam.

If there are any websites that you don’t recognise or that seem suspicious, try Googling them for more information. Chances are, if they are spam, others have been affected too.

Include common crawler spam lists

Due to the impact of spam on website analytics, there are a number of community-created crawler spam lists already in existence. These pre-created filters list the popular offenders often seen in Google Analytics accounts around the world.

By including these popular offenders, you’ll be protecting your analytics data against future visits from known spam sources, even ones that haven’t reached your site yet.
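Before adding a list wholesale, it can be useful to see which of its entries already appear in your own referral report. The Python sketch below assumes you have both the referral sources and the community list as plain lists of domains; the function name and every domain shown are invented for illustration.

```python
def flag_spam_sources(referral_sources, spam_domains):
    """Return the referral sources that match a known spam domain."""
    spam = {d.strip().lower() for d in spam_domains if d.strip()}
    flagged = []
    for source in referral_sources:
        host = source.strip().lower()
        # Match the domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in spam):
            flagged.append(source)
    return flagged


# Invented examples standing in for a referral report and a community list.
referrals = ["google.com", "buttons.spam-crawler.example", "partner-site.example"]
community_list = ["spam-crawler.example", "free-traffic.example"]

print(flag_spam_sources(referrals, community_list))
```

Any source the script doesn’t flag still deserves the manual check described above, since community lists are never complete.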

Create your crawler spam filter

Head back to the Add Filter screen and select Add Filter. Name your filter something relevant, such as “Crawler Spam Filter”. As before, choose Custom instead of Predefined under filter types, then choose Exclude. Under the Filter Field dropdown, select Campaign Source.

For the pre-created filters you find, you can simply copy-and-paste them into your Google Analytics. For any you manually create, use the same format you did for your hostname filter:

     spamname1|spamname2|spamname3

Screenshot of a crawler spam filter


Unlike the previous filter, you’re able to have multiple versions of this one. Therefore, don’t worry if you have to make 3 or 4 crawler spam filters in order to cover all spam websites.
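If your list of spam sources is long, you can split it into several patterns automatically. The Python sketch below assumes a 255-character limit on the filter pattern field, so check the limit in your own account before relying on it. The function name and the spamname domains are placeholders.

```python
def build_exclude_patterns(domains, max_length=255):
    """Pack spam domains into as few pipe-separated patterns as possible."""
    patterns, current = [], ""
    for domain in sorted({d.strip().lower() for d in domains if d.strip()}):
        escaped = domain.replace(".", r"\.")  # match dots literally
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_length and current:
            # The current pattern is full, so start a new one with this domain.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


spam_domains = ["spamname1.example", "spamname2.example", "spamname3.example"]

# A deliberately tiny limit, just to show the splitting behaviour.
for number, pattern in enumerate(build_exclude_patterns(spam_domains, 30), start=1):
    print(f"Crawler Spam Filter {number}: {pattern}")
```

Each pattern it returns goes into its own exclude filter. Because these are exclude filters, a hit only needs to match one of them to be removed, so splitting the list doesn’t change the result.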

Test your filter

As before, it’s worth taking the time at this stage to verify your filter. This is always best practice, as it ensures you’re configuring your filter correctly and that your future results will be accurate.

Conclusion

Google Analytics is perhaps the most important tool for any business looking to manage and enhance its online presence. Whilst you can’t protect your website from every spammer, you are able to mitigate the impact they can have.

Applying the right filters to your website analytics will rob spammers of their power, and give you back the accuracy you need to inform future business decisions and overarching marketing strategy. 
