
I have been getting a lot of CPU spikes on my server recently, and I suspect that at least some of the traffic causing them isn't real. So for now I want to allow only the Google, MSN, and Yahoo bots. Please tell me whether the following robots.txt file is correct for my requirement.

User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot 
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image 
User-agent: Yahoo-MMCrawler
Disallow: 

User-agent: *
Disallow: /

Thanks.

Stephen Ostermiller

2 Answers


Your robots.txt seems to be valid.

  • A record may contain several User-agent lines.
  • Disallow: allows crawling everything.
  • The record starting with User-agent: * only applies to bots not matched by the previous record.
  • Disallow: / forbids crawling anything.

But note: only well-behaved bots follow the rules in robots.txt, and well-behaved bots are unlikely to crawl at excessive rates. So either you need to work on your server's performance, or not-so-nice bots are to blame.
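As a quick sanity check, you can feed the file to Python's standard urllib.robotparser and confirm it behaves as described (the URLs below are placeholders, not your real site):

```python
import urllib.robotparser

# The robots.txt from the question, verbatim.
robots_txt = """\
User-agent: Googlebot
User-agent: Slurp
User-agent: msnbot
User-agent: Mediapartners-Google*
User-agent: Googlebot-Image
User-agent: Yahoo-MMCrawler
Disallow:

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# The listed bots may fetch anything (empty Disallow: allows all)...
print(rp.can_fetch("Googlebot", "http://example.com/page.html"))
print(rp.can_fetch("Slurp", "http://example.com/page.html"))
# ...while every other bot falls through to the * record and is blocked.
print(rp.can_fetch("SomeOtherBot", "http://example.com/page.html"))
```

Running it shows True for the named crawlers and False for everything else, matching the interpretation above.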

unor

That first Disallow: should probably be:

Allow: /

if you want to, in fact, allow all those user agents to index your site.
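For what it's worth, parsers that support Allow treat the two forms the same way. A minimal check with Python's urllib.robotparser, using a placeholder URL:

```python
import urllib.robotparser

# Variant of the file using an explicit "Allow: /" instead of a blank "Disallow:".
robots_txt = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "http://example.com/page.html"))   # allowed
print(rp.can_fetch("SomeOtherBot", "http://example.com/page.html"))  # blocked
```

Both this variant and the original blank Disallow: produce the same allow/block decisions here.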

Mark Granoff
    Actually, the standard allows for a blank `Disallow:`. See the examples at http://www.robotstxt.org/robotstxt.html. Also the original specification at http://www.robotstxt.org/orig.html – Jim Mischel Apr 17 '12 at 21:09
  • `Allow` isn’t even part of the original robots.txt specification (but many parsers, including Google’s, support it). – unor Feb 17 '14 at 13:12