21

I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show up in Google Analytics, but I cannot see them in the custom-designed table where I log all visitors to the site, so I think they only manipulate the GA code and never reach the site itself.

If you follow their links, they redirect you to some affiliate link.

I don't know whether they have an impact on my SEO/SERP, but I would like to get rid of them. Can I do that via the .htaccess file?

One peculiar aspect is that I get visitors from different forum-like pages, e.g. forum.topic221122.darodar.com, forum.topic125512.darodar.com, etc., so I would like to block the entire darodar.com domain.

Besides darodar.com, there are also econom.co and iloveitaly.co polluting my stats. Can I block them all from .htaccess?

Catalin Marcu

14 Answers

42

Most of the spam in Google Analytics never accesses your site, so you can't block it with any server-side solution.

Ghost spam hits GA directly and usually shows up only for a few days and then disappears; that's why some people think they blocked it from the .htaccess file, when it is just a coincidence.

This type of spam is easy to spot, since it uses either a fake hostname or one that is not set (see the image below).

The other type, crawlers like semalt, actually access your site and can be blocked from the .htaccess file; however, there are just a few of them.

So in summary, to stop spam in Google Analytics:

  • Crawlers: server-side solutions or filters in GA
  • Ghosts: ONLY filters in GA

The only efficient way to prevent being hit by ghost spam is to create an include filter with all your valid hostnames.

First you need to build a REGEX with all the valid hostnames, something like this (you can find them in the Network report):

yoursite\.com|shoppingcart\.com|translateservice\.net

These are just examples; you might have more or fewer hostnames. Once you have the REGEX, create the filter (a quick way to sanity-check the expression is sketched after these steps):

  • Go to the Admin tab in Google Analytics
  • Select Filters under the View column > New Filter
  • Filter Type: Custom > Include > Filter Field: Hostname
  • Filter Pattern: copy the hostname expression you built
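If you want to sanity-check the hostname expression before saving the filter, a small sketch like the one below can help. It is plain JavaScript you can paste into a browser console; the pattern and sample hostnames are only placeholders, not values from your property.

// Hedged sketch: locally test a GA "valid hostname" include expression.
// Replace the pattern and the samples with hostnames from your own Hostname report.
const validHostnames = /yoursite\.com|shoppingcart\.com|translateservice\.net/;

const samples = [
  "yoursite.com",                  // legitimate hostname, should be kept
  "translateservice.net",          // legitimate third-party service, should be kept
  "forum.topic221122.darodar.com", // ghost spam, should be filtered out
  "(not set)"                      // ghost spam with no hostname, should be filtered out
];

samples.forEach(h =>
  console.log(h, "->", validHostnames.test(h) ? "kept" : "filtered out"));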

For crawlers you will have to create a different filter, building an expression with all the spammers:

spammer1|spammer2|spammer3|spammer4|spammer5
  • Filter Type: Custom > Exclude > Filter Field: Campaign Source
  • Filter Pattern: copy the referral expression

Every time you work with filters, it is important to keep an unfiltered view.

If you need detailed steps for these solutions, you can check this complete guide about spam in Google Analytics:

Guide to stop and remove All the spam in Google Analytics

Hope it helps.

[Image: Hostname report showing example valid hostnames]

Carlos Escalera Alonso
    I really wish Google would address this....been getting a ton of this sh*t lately...I add at least 1 new one daily, sometimes more. – user_78361084 Apr 12 '15 at 01:18
  • The main problem is that I have 20 sites in my Google Analytics and have discovered more than 20 referrer spam hosts. Is there any chance to do the trick for one site and copy it to the others? – Catalin Marcu Apr 13 '15 at 09:55
  • The best way to avoid referrer spam for now is using a **valid hostname filter**; unfortunately, there is no way to copy it from one site to another, since each site has its own. I recommend starting with at least the sites whose tracking IDs end in 1 (and probably 2), UA-XXXXX-1, since these are the most affected. It may take some time, but **you will avoid most of the spam**. You can check a detailed guide in the link I provided above – Carlos Escalera Alonso May 04 '15 at 11:45
  • Great stuff, this really should be the accepted answer! One thing I don't understand about the "valid hostname filter" approach though - if I'm a spammer, what's to stop me from reporting say *translate.googleusercontent.com* as the host name, thus bypassing the filter? – Ohad Schneider May 17 '15 at 21:08
  • Also the "valid hostname filter" won't stop crawler spam right? I mean if the spammer is actually crawling (=visiting) your site, the hostname will be legit will it not? – Ohad Schneider May 17 '15 at 21:34
  • And last but not least, couldn't a spammer who's not blocked in the `.htaccess` file fetch your page, parse your tracking ID, record the hostname and retain that mapping to intelligently ghost spam you (even if you block his crawler later), again bypassing the "valid hostname" filter? – Ohad Schneider May 17 '15 at 22:29
  • @Ohad you are right about those things: first, you will need another filter just for crawler spam, or if you are comfortable with .htaccess, blocking the crawlers there is even better. As for the way the spammer attacks, this has been an ongoing battle for as long as the internet has existed: people take advantage of weaknesses in some program or service, and once that weakness is closed they find another way to attack, and once they find it more people follow. But getting back to the topic, for now the best way to stop spam I've found is this – Carlos Escalera Alonso May 21 '15 at 21:01
  • @carlos thanks for confirming. It's a good thing I don't work for spam companies :) Hopefully hosting services (such as Azure which I'm using) will have a blacklist of their own to help mitigate such issues and others. – Ohad Schneider May 22 '15 at 06:58
  • Sorry if this is a stupid doubt, but what would happen if you took your tracking ID, generated an RSA private/public key pair for your server, sent the encrypted ID to the browser, and then decrypted it right in the browser to get the actual ID and initialize tracking? These spam bots rely on extracting IDs with regexes, don't they? You don't need to encrypt anything on the server, just keep the keys in an env variable and send the encrypted tracking ID to the browser – PirateApp Aug 17 '19 at 09:22
  • @PirateApp that would help in cases like the one you describe, a bot scraping UA- codes; however, since the GA code had a standard format, I think spammers were just generating UA codes randomly, and in those cases encrypting your code wouldn't help. In any case, the spam is now a lot less frequent than two years ago, so unless you are getting a direct attack I recommend spending your efforts on other sources of junk traffic, like internal traffic or bots, which are a much larger problem today ;) – Carlos Escalera Alonso Aug 20 '19 at 08:06
12

This blog post suggests that the spam referrers manipulate Google Analytics and never actually visit your site, so blocking them is pointless. Google Analytics offers filtering if you want to mitigate fake site hits.

ab7
    GA Filtering is total BS. It only gets rid of so-called "known" robots and spiders. The unknown ones are the ones that wreak the most havoc. – 3Dom Aug 30 '15 at 09:39
  • @3Dom You're thinking of the 'known bots and spiders' option, whereas the answerer is referring to the possibility of filtering traffic at the View level that doesn't (for example) match your hostname. – J Brazier May 27 '16 at 12:47
6

Yes, you can block them with .htaccess, and you actually should do it.

Your .htaccess file could look like this:

<IfModule mod_setenvif.c>
# Set spammers referral as spambot
SetEnvIfNoCase Referer darodar.com spambot=yes
SetEnvIfNoCase Referer 7makemoneyonline.com spambot=yes
## add as many as you find

Order allow,deny
Allow from all
Deny from env=spambot
</IfModule>

When traffic comes from these sites, it is blocked by this .htaccess, so the HTML is never loaded and therefore the GA script is never fired (for these sites).

They try to collect traffic from you: once you see the incoming traffic in Google Analytics and try to find out its source, you go to that URL. It is harmless to your site, except that your statistics are full of junk data.

Google Analytics should prevent this, the same way Gmail prevents spam email.

Carlos Morales
  • It's worth mentioning that these sites don't actually visit your site; they just hack your GA script. So blocking them via .htaccess isn't always foolproof – Gary Woodfine Mar 03 '15 at 10:16
  • @GaryWoodfine it is not bullet proof, you are right, but they still need to load the HTML to know which assigned tracking ID you have. This strategy is definitely working for me. – Carlos Morales Mar 05 '15 at 16:16
  • Nothing wrong with your solution. However to have a total belt and braces approach to eradicating these guys from your results you'd need to cover both GA and htaccess file. They don't load the HTML from your site, it's a fairly trivial programmer task to extract the basic GA code, and loop through a series of account ID spoofing visits, which is in effect what they do. I published a blog post with instructions on how to combat these guys http://garywoodfine.com/blocking-referrer-spam/ – Gary Woodfine Mar 06 '15 at 09:07
  • Most of the referrer spam uses a vulnerability in GA, as @Gary Woodfine says. These spammers don't know who they are targeting; they just target random tracking IDs. **.htaccess will only work for crawlers like semalt**, but not for most of them, like darodar. I explained it in more detail at http://stackoverflow.com/a/28354319/3197362 – Carlos Escalera Alonso Mar 29 '15 at 00:53
3

According to this entry, they never visit your site; they fake HTTP requests to GA using your UA code. So it seems pointless to block them using .htaccess or any other method, because they never actually reach your site; they only send fake "visit" data to Google.
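To illustrate why no server-side blocking can help, the sketch below is a rough guess at what such a fake hit looks like: a single request to Google Analytics' public Measurement Protocol endpoint, with a made-up tracking ID, client ID and referrer, that never touches the target's web server.

// Hedged illustration only: roughly what a ghost-spam "pageview" sent straight to
// the Measurement Protocol might look like. All values below are made up, and the
// victim's own server is never contacted.
const params = new URLSearchParams({
  v: "1",                                      // protocol version
  tid: "UA-1234567-1",                         // victim's tracking ID (guessed or scraped)
  cid: "35009a79-1a05-49d7-b876-2b884d0f825b", // arbitrary client ID
  t: "pageview",                               // hit type
  dp: "/",                                     // document path
  dr: "http://forum.topic221122.darodar.com/"  // fake referrer that shows up in reports
});

fetch("https://www.google-analytics.com/collect", { method: "POST", body: params });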

nikoskip
2

We have found that using .htaccess is a good way to stop this spam. I have implemented the solution below on my client's site, and it has been working really well so far. The best way is to stop them with a "contains" check, e.g. for the spammer priceg.com, check for priceg in the referrer URL.

That is because many of these sites create subdomains and hit you again, and when they tweak the URL, hard-coded conditions fail:

# keep [OR] on every condition except the last
RewriteCond %{HTTP_REFERER} (priceg) [NC,OR]
RewriteCond %{HTTP_REFERER} (darodar) [NC]
RewriteRule .* - [F]

It is explained in detail here

Sandy
  • After implementing the .htaccess file above, I noticed that now Google Analytics tracks the hits with no bounce tracking. win win! – iconMatrix May 19 '15 at 13:08
1

Apparently, this is done by a spammer communicating directly with Google Analytics using your website's account ID. They effectively tell Google Analytics they visited your page when in fact they never did. They identify themselves to Analytics by means of a URL which THEY WANT YOU TO VISIT: you see their traffic in Google Analytics and go check them out. They will have an Amazon affiliate account hooked up, for example, so they attempt to get a commission from your Amazon purchases.

So .htaccess did nothing for me when I was fighting this one; you need to create a filter which filters out things like .*\.darodar\.com

The really bad effect I have found from this is that it invalidates my website statistics.

0

You can restrict access using .htaccess, or by filtering ALL robot visits out of Google Analytics tracking. If that doesn't work, set up Google Analytics filtering. More details on how to do that can be found here: http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/

They are Russian-based but route their spiders through China and the Philippines. Maybe it would be best to block the whole IP address at this point; they have multiple subdomains.

0

I used these mod_rewrite methods for semalt:

RewriteCond %{HTTP_REFERER} ^https?://(www\.)?semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.*\.)?semalt\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com [NC]
RewriteRule .* - [F]

or with the mod_setenvif module in .htaccess:

SetEnvIfNoCase Referer semalt.com spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.11\.15" spambot=yes
SetEnvIfNoCase REMOTE_ADDR "217\.23\.7\.144" spambot=yes

Order allow,deny
Allow from all
Deny from env=spambot

I even created an Apache, Nginx & Varnish blacklist plus a Google Analytics segment to prevent referrer spam traffic; you can find it here:

https://github.com/Stevie-Ray/referrer-spam-blocker/

0

Filter future and historical GA spam of all types with the link provided. Hostname filtering is particularly easy.

https://www.ohow.co/ultimate-guide-to-removing-irrelevant-traffic-in-google-analytics/

Jan S
0

Blocking bots at the web server level makes no sense; the spammers send fake requests straight to the Google Analytics web server. All they need to know is the website's domain name and the Google Analytics ID linked to it. So you have to mask your Google Analytics ID in the website code. For example, you can do something like this in the Google Analytics JS code:

ga('create', 'UA-X' + 'XXXXX' + 'XX-X', 'auto');

After this change, a spammer's bot would have to execute the JS code to parse your Google Analytics ID (and not many bots will be able to do that).

https://nobodyonsecurity.com/security/fighting-google-analytics-referrer-spam

Alek
0

.htaccess is not the best way. On my site I use GA's Tracking Info options and then the Referral Exclusion List.

Regards!

davidcm86
0

Lunametrics posted a nice article to solve this issue using Google Tag Manager: http://www.lunametrics.com/blog/2014/03/11/goodbye-to-exclude-filters-google-analytics/

  • I use GTM and found that article right before finishing this thread and it is one of the most useful because it explains how to block the ghost referrers that never visit the site and are unaffected by `.htaccess` directives. – adam-asdf Apr 06 '15 at 17:24
  • Good to hear that! I'm testing using a filter on a view for a few days, then I'll try the GTM method and see what happens. I think that I'll be using the latter for good. – Gabriel Ravarini Apr 30 '15 at 22:02
  • The link is no longer valid – JeeShen Lee Sep 11 '21 at 01:29
0

I think the most effective way to avoid ghost spam is to add a custom dimension that lets you know the site was indeed visited, because, as we know, ghost spam never visits the site.

ga('set', 'dimension1', "Hey I'm really here!!");
ga('send', 'pageview');

You simply add these lines to your pages and then add a filter to include hits only when the dimension has the expected value ("Hey I'm really here!!" in this case).
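If your pages load gtag.js rather than the older analytics.js snippet shown above, a roughly equivalent sketch could look like the following; the tracking ID, the dimension index, and the 'real_visit' parameter name are assumptions for illustration.

// Hedged sketch: gtag.js equivalent of stamping hits with a custom dimension.
// 'dimension1' must match the index configured in GA; the ID and names are placeholders.
gtag('config', 'UA-XXXXXXX-X', {
  'custom_map': { 'dimension1': 'real_visit' }
});
gtag('event', 'page_view', { 'real_visit': "Hey I'm really here!!" });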

Diego
0

2019 update

I may have a solution to this problem as I find none of the other solutions to be effective.

Let me address the problems with the existing solutions first:

  1. Add a filter for each referrer spam domain: how many domains will you add? Most of these referrer spam domains exist for some time and then disappear.
  2. Maintain a blacklist of referrer spam domains: this gets even more complicated, as they are basically endless in number. You would have to keep updating the blacklist, and the bigger the blacklist, the more time you need to scan it.
  3. Anything else, such as maintaining a manual .htaccess, requires manual intervention that will not scale as your site becomes more popular.
  4. Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit or miss.

How do these bots work?

First, it is crucial to understand how these bots work:

  1. At the very least, they use regex patterns such as /UA-\d{6}/ to scrape tracking IDs from pages, which they crawl recursively after starting at a seed website (a sketch of this scraping step follows).
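For illustration only, such a scraper could be as small as the sketch below; the seed URL and the exact pattern are assumptions, and the point is simply that a plain-text tracking ID in your HTML is trivial to harvest.

// Hedged sketch of the scraping step described above: fetch a page and pull out
// anything that looks like a Universal Analytics tracking ID. All values are placeholders.
const TRACKING_ID_PATTERN = /UA-\d{4,10}-\d{1,2}/g;

fetch("https://example.com/")           // seed website (placeholder)
  .then(res => res.text())
  .then(html => {
    const ids = html.match(TRACKING_ID_PATTERN) || [];
    console.log("harvested tracking IDs:", ids);
  });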

I believe I have a solution that offers the following advantages:

  1. No need to maintain whitelists and blacklists
  2. Works against 99% of them easily and can always be modified to take it to 100%
  3. Requires almost NO manual intervention
  4. The idea is to NOT have a plain-text tracking ID in the script at all

Here is an example:

script.
      //- Google Analytics ID
      var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];

      var newScript = document.createElement("script");
      newScript.type = "text/javascript";
      newScript.setAttribute("async", "true");
      newScript.setAttribute("src", "https://www.googletagmanager.com/gtag/js?id=" + a.map(i => String.fromCharCode(i)).join(""));
      document.documentElement.firstChild.appendChild(newScript);

      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', a.map(i => String.fromCharCode(i)).join(""), { 'send_page_view': false });
      // Feature detects Navigation Timing API support.
      if (window.performance) {
        // Gets the number of milliseconds since page load
        // (and rounds the result since the value must be an integer).
        var timeSincePageLoad = Math.round(performance.now());
        console.log(timeSincePageLoad)
        // Sends the timing event to Google Analytics.
        gtag('event', 'timing_complete', {
          'name': 'load',
          'value': timeSincePageLoad,
          'event_category': '#{title}'
        });
      }
  1. We take a very simple approach: break the tracking ID of the form 'UA-1111111-1' into an array of character codes.

  2. We then construct the tracking ID dynamically from the char code array at any point where a reference to the tracking ID is needed.

  3. The approach can be made arbitrarily more complex: turning the ID into an encrypted bunch of numbers, using base 8 or hexadecimal, adding a fixed offset or a random offset on each run, or RSA-encrypting the tracking ID with a private key on the server and decrypting it with a public key. But the basic approach is REALLY fast, since arrays in JS are really fast, and it can easily beat 99% of the bots (a sketch of the fixed-offset variant follows).
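As a concrete example of the "fixed offset" variation mentioned in point 3, here is a small sketch; the offset and the pre-encoded array are arbitrary placeholders you would generate offline from your real tracking ID.

// Hedged sketch of the fixed-offset variant: the char codes are stored shifted by
// OFFSET, so the literal "UA-..." codes never appear verbatim in the page source.
// The encoded array is a placeholder; generate it offline from your real tracking ID.
var OFFSET = 7;
var encoded = [92, 72, 52, 56, 56, 56, 56, 56, 56, 56, 56, 56, 52, 57];

function trackingId() {
  // decode on demand, wherever a reference to the ID is needed
  return encoded.map(function (c) { return String.fromCharCode(c - OFFSET); }).join("");
}

console.log(trackingId()); // decodes to the same ID the answer's char code array represents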

PirateApp