
Referrer spam is a huge problem in my analytics right now and I have been combating it for months.

I'm aware of the botnet discussions surrounding semalt.com (and other referral spammers). I'm also aware that some of the referral spam is likely triggered without ever visiting my site (which is why my .htaccess directives aren't catching all of it), and I have added filters to my analytics/tag manager accordingly.

I've researched extensively, including: "How to Block Spam Referrers like darodar.com from Accessing Website?" and "Domain name in mod_rewrite RewriteRule".

I'm hoping to implement code that, for any sites with actual crawlers, sends their bots back at them. I have over 100 referrers blacklisted in my .htaccess, but they all follow the same pattern. This is what I have now:

<IfModule mod_rewrite.c>
  RewriteEngine on
  Options +FollowSymlinks

  RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*semalt\.com.*? [NC]
  RewriteRule ^(.*)$ http://semalt.com/ [L]

  RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)*simple-share-buttons\.com.*? [NC]
  RewriteRule ^(.*)$ http://simple-share-buttons.com/ [L]
</IfModule>

I'd like to simplify that (new domains sending referral spam pop up frequently) so I'm wondering if this would work:

<IfModule mod_rewrite.c>
  RewriteEngine on
  Options +FollowSymlinks

  RewriteCond %{HTTP_REFERER} (semalt\.com) [NC]
  RewriteRule ^(.*)$ %{HTTP_REFERER} [L]

  RewriteCond %{HTTP_REFERER} (simple-share-buttons\.com) [NC]
  RewriteRule ^(.*)$ %{HTTP_REFERER} [L]
</IfModule>

It seems like it should work, which makes me wonder if I can go a step further to this:

<IfModule mod_rewrite.c>
  RewriteEngine on
  Options +FollowSymlinks

  RewriteCond %{HTTP_REFERER} (semalt\.com|simple-share-buttons\.com) [NC]
  RewriteRule ^(.*)$ %{HTTP_REFERER} [L]
</IfModule>

I want to burden my own server as little as possible and I don't care about protocols, subdomains, or paths included.

Basically, if any part of the referrer matches that string, I want to block it and redirect it to itself.

Will the directives I have written work as I expect, and are the regex patterns reasonably efficient?

Is there a better way to do this that I am unaware of?

Note: Many of these sites are on a VPS where I can edit httpd.conf, but not all of them, so .htaccess-specific answers (which I can adapt) are preferred.

adam-asdf

1 Answer


Just a small note on the first example you gave: you can escape the slashes in //, although mod_rewrite does not require it, like this:

 RewriteCond %{HTTP_REFERER} ^https?:\/\/([^.]+\.)*semalt\.com.*? [NC]

But for the purpose of this rule, you only need this:

RewriteCond %{HTTP_REFERER} ([^.]+\.)*semalt\.com.*? [NC]
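Paired with a rule, the shortened condition might look like the sketch below. Because the pattern is unanchored, the optional ([^.]+\.)* prefix and the .*? suffix can both match nothing, so a bare semalt\.com should match the same referrers with less work. The explicit R=301 flag is my own addition and assumes you want a permanent redirect; the sketch also assumes the Referer header is an absolute URL.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # Match when the referrer contains the listed domain anywhere
  # (case-insensitive because of NC).
  RewriteCond %{HTTP_REFERER} semalt\.com [NC]

  # "^" matches every request without capturing anything.
  # R=301 makes the external redirect explicit rather than relying on
  # the substitution being an absolute URL on another host.
  RewriteRule ^ %{HTTP_REFERER} [R=301,L]
</IfModule>
```

More domains can be added to the same condition with alternation, for example (semalt\.com|another-crawler\.example), where the second name is only a placeholder.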

Any of the rules you propose will work fine, but they will only be effective for semalt. simple-share-buttons is not a crawler, so blocking it in .htaccess won't have any effect.

You can verify this by checking your access log: if you look for these two referrers, you will only see records from semalt and none from simple-share-buttons.
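On the sites where you can edit httpd.conf, one way to make that check easier is to log matching referrers to their own file. This is only a sketch: the variable name spam_referrer and the log path are made up for illustration, and it assumes the usual "combined" LogFormat nickname is already defined in your server config.

```apache
# Server or virtual-host context only: CustomLog is not allowed in .htaccess.

# Flag any request whose Referer header contains one of the domains.
SetEnvIfNoCase Referer "(semalt\.com|simple-share-buttons\.com)" spam_referrer

# Write only the flagged requests to a separate log file.
# The path is relative to ServerRoot and is a hypothetical name.
CustomLog "logs/spam_referrers.log" combined env=spam_referrer
```

If a domain never shows up in that file while it keeps appearing in your analytics, it is most likely a ghost that never touches your server.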

The only way to stop ghost spam is by using filters in GA. You can find more information about this kind of referrer spam here: https://stackoverflow.com/a/29312117/3197362

For more general information about referrer spam, you can check this answer: https://stackoverflow.com/a/28354319/3197362

As for the regex, this is an excellent tool for testing patterns: https://regex101.com/

Carlos Escalera Alonso
  • I don't actually want to send all the 'bots to semalt.com, I want to send the 'bots back to the referrer. – adam-asdf Apr 21 '15 at 15:16
  • This expression `RewriteCond %{HTTP_REFERER} (semalt\.com|simple-share-buttons\.com) [NC] RewriteRule ^(.*)$ %{HTTP_REFERER} [L]` will work fine to send it back to the referrer. But what I wanted to point out is that simple-share-buttons is not a crawler and it won't have any effect since this kind of referrer (ghost) never reaches your site – Carlos Escalera Alonso Apr 24 '15 at 00:28
  • I'm using both approaches: when I find another spammer I add them to the list, and I'm also using hostname filtering. I just don't see much point in trying to figure out which are actual referral spam and which are ghost referral spam. Thanks. – adam-asdf May 05 '15 at 20:22
  • Every week there are 1 or 2 new ghosts, while there are just a few crawlers overall. If you add only the crawlers you will have 3 or 4 lines; if you add all the ghosts you will have a huge, unnecessary list in your .htaccess file that won't have any effect. Although the impact of these extra lines is minimal, it's better to keep the file clean. Since you are applying the filters in GA you will stop the spam, but some people think that using only the .htaccess file will work. Just one question: by hostname filtering, do you mean a filter including only your hostnames? – Carlos Escalera Alonso May 05 '15 at 20:52
  • Yes, by setting up an "include" filter that only shows data from valid hostnames. I think this is the clearest tutorial I found about it: http://www.analyticsedge.com/2015/01/advanced-segment-eliminate-spam-referrals/ – adam-asdf May 06 '15 at 05:37