I'm trying to filter out some bots by blocking them in my .htaccess file like this:

#UniversalRules
SetEnvIfNoCase User-Agent ^$ bad_bot #leave this for blank user-agents
SetEnvIfNoCase User-Agent .*\@.* bad_bot
SetEnvIfNoCase User-Agent .*bot.* bad_bot

But these rules also block good bots, so I added the following:

#Goodbots
SetEnvIfNoCase User-Agent .*google.* good_bot
SetEnvIfNoCase User-Agent .*bingbot.* good_bot #bing

And finally, the blocking rule:

Order Allow,Deny
Allow from all
Deny from env=bad_bot

But when I send a request with the Googlebot user agent, `Googlebot/2.1 (+http://www.googlebot.com/bot.html)`, I get a 403 Forbidden response.

What's wrong?

xav
Alexey Shatrov

1 Answer

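The Googlebot user agent matches both the bot pattern and the google pattern, so both environment variables end up set. Setting one variable (good_bot) does not unset another (bad_bot), and the Deny rule only looks at bad_bot, so the request is still refused. Instead, set a single variable and unset it afterwards for the bots you want to allow: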

#UniversalRules
SetEnvIfNoCase User-Agent ^$           bad_bot
SetEnvIfNoCase User-Agent .*\@.*       bad_bot
SetEnvIfNoCase User-Agent .*bot.*      bad_bot
#Goodbots
SetEnvIfNoCase User-Agent .*google.*  !bad_bot
SetEnvIfNoCase User-Agent .*bingbot.* !bad_bot

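See the mod_setenvif reference for examples. BrowserMatchNoCase provides identical functionality with a shorter syntax, because it always tests the User-Agent header. You can also remove every .* from your patterns: an unanchored regex already matches anywhere in the string, so the leading and trailing .* are redundant.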
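
Put together, the two suggestions above might look like the following sketch. It assumes the same Apache 2.2-style access directives used in the question (on Apache 2.4 these are provided by mod_access_compat), and the variable name bad_bot is carried over from the question.

```apache
# Flag suspicious user agents; unanchored patterns match anywhere in the string
BrowserMatchNoCase ^$      bad_bot
BrowserMatchNoCase @       bad_bot
BrowserMatchNoCase bot     bad_bot

# Clear the flag again for crawlers that should be allowed
BrowserMatchNoCase google  !bad_bot
BrowserMatchNoCase bingbot !bad_bot

# Deny any request that still carries the flag
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

The order matters here: the directives are processed top to bottom, so the lines that unset bad_bot have to come after the lines that set it.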

Salman A
  • Thank you! I'll try this tomorrow. But I'm also trying to set a variable with `SetEnvIfNoCase User-Agent .*bot.* visitor=bad_bot` and check it later with `Deny from visitor=bad_bot`, but this doesn't work either. – Alexey Shatrov Jan 06 '18 at 20:38
  • `Deny from env=bad_bot` should work. In Allow or Deny, only the presence (or absence) of the environment variable is checked, not its value. – Salman A Jan 06 '18 at 20:57
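
To illustrate the point in the last comment, here is a minimal sketch of the value-assigning form from the first comment. It is an illustration rather than a recommended setup: because Deny tests only whether the variable exists, the rule has to name the variable (visitor), not the value assigned to it.

```apache
# Sets the variable "visitor" to the value "bad_bot" for matching user agents
SetEnvIfNoCase User-Agent bot visitor=bad_bot

Order Allow,Deny
Allow from all
# Matches whenever "visitor" is set, regardless of the value it holds
Deny from env=visitor
```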