I recently changed my robots.txt to disallow bots from making expensive search API queries. They are allowed all other pages right now except the /q?...
that is a search API query and expensive.
User-agent: *
Disallow: /q?
Sitemap: /sitemap.xml.gz
Now I'm still getting bots in my logs. Is it google or just "googlebot compatible"? How can I disallow bots completely from /q?
2014-10-18 21:04:23.474 /q?query=category%3D5030%20and%20cityID%3D4698187&o=4 200 261ms 7kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) module=default version=disallow
66.249.79.28 - - [18/Oct/2014:12:04:23 -0700] "GET /q?query=category%3D5030%20and%20cityID%3D4698187&o=4 HTTP/1.1" 200 8005 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ms=261 cpu_ms=108 cpm_usd=0.050895 app_engine_release=1.9.13 instance=00c61b117cdfd20321977d865dd08cef54e2fa
Can I blacklist specific bots according to their http headers in my request handler or in my dos.yaml
if robots.txt can't do it? When I run this to look for matches, there are 50 matches last 2 hours:
path:/q.* useragent:.*Googlebot.*
The log rows look like this, looking like googlebot:
2014-10-19 00:37:34.449 /q?query=category%3D1030%20and%20cityID%3D4752198&o=18 200 138ms 7kb Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) module=default version=disallow
66.249.79.102 - - [18/Oct/2014:15:37:34 -0700] "GET /q?query=category%3D1030%20and%20cityID%3D4752198&o=18 HTTP/1.1" 200 7965 - "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "www.classifiedsmarket.appspot.com" ms=138 cpu_ms=64 cpm_usd=0.050890 app_engine_release=1.9.13 instance=00c61b117c781458f46764c359368330c7d7fdc4