
I have had the following robots.txt for over a year, seemingly without issues:

User-Agent: *

User-Agent: iisbot
Disallow: /



Sitemap: http://iprobesolutions.com/sitemap.xml

Now I'm getting the following error from the robots.txt Tester: [screenshot: robots.txt Tester reporting the URLs as blocked]

Why is Googlebot blocking all my URLs if the only Disallow I set was for iisbot?

Julie S.
  • Per https://stackoverflow.com/questions/20294485/is-it-possible-to-list-multiple-user-agents-in-one-line it looks like, because you also have `User-Agent: *`, it's being read as `User-Agent: * iisbot` – WOUNDEDStevenJones Aug 02 '17 at 16:13

2 Answers


Consecutive User-Agent lines are added together. So the Disallow will apply to User-Agent: * as well as User-Agent: iisbot.

Sitemap: http://iprobesolutions.com/sitemap.xml

User-Agent: iisbot
Disallow: /

You actually don't need the User-Agent: *.
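
To see the grouping behaviour concretely, here is a minimal sketch using Python's standard-library robots.txt parser (urllib.robotparser), which also folds consecutive User-Agent lines into one record; the user-agent names and URLs are just examples:

# Sketch: urllib.robotparser groups consecutive User-Agent lines,
# so the single Disallow applies to both agents.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-Agent: *",
    "User-Agent: iisbot",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "http://iprobesolutions.com/"))  # False
print(rp.can_fetch("iisbot", "http://iprobesolutions.com/"))     # False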

Björn Tantau
  • Sorry, the actual code does have empty lines in between; I've updated it in the question. If this is still incorrect, could you please type out the code for me? – Julie S. Aug 02 '17 at 16:35
  • Actually, now after I used your code, Google's crawl reports the following issue: "Sitemap contains URLs which are blocked by robots.txt." See screenshot: https://www.dropbox.com/s/uk5xsbuk7yqo6za/Screenshot%202017-08-02%2016.08.13.png?dl=0 Any idea what the issue is? – Julie S. Aug 02 '17 at 20:07

Your robots.txt is not valid (according to the original robots.txt specification).

  • You can have multiple records.
  • Records are separated by empty lines.
  • Each record must have at least one User-agent line and at least one Disallow line.

The spec doesn’t define how invalid records should be treated. So user-agents might either interpret your robots.txt as having one record (ignoring the empty line), or they might interpret the first record as allowing everything (at least that would be the likely assumption).
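
For what it's worth, here is how one real parser takes the second interpretation. A minimal sketch with Python's standard-library urllib.robotparser (example agent names and URLs only): it drops the first record because it has no Disallow line, leaving every bot except "iisbot" allowed:

# Sketch: Python's stdlib parser resets a record at the blank line,
# so "User-Agent: *" ends up with no rules and is discarded.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-Agent: *",
    "",
    "User-Agent: iisbot",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "http://iprobesolutions.com/"))  # True
print(rp.can_fetch("iisbot", "http://iprobesolutions.com/"))     # False

Google's tester apparently took the first interpretation and merged everything into one record, which would explain why it reported all URLs as blocked.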

If you want to allow all bots (except "iisbot") to crawl everything, you should use:

User-Agent: *
Disallow: 

User-Agent: iisbot
Disallow: /

Alternatively, you could omit the first record, as allowing everything is the default anyway. But I’d prefer to be explicit here.
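
If you want to sanity-check the explicit version before deploying it, the same standard-library parser can be used (again, the agent names and URLs are just examples):

# Sketch: with the empty Disallow, ordinary bots may fetch everything,
# while iisbot is blocked from the whole site.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-Agent: *",
    "Disallow:",
    "",
    "User-Agent: iisbot",
    "Disallow: /",
])

print(rp.can_fetch("Googlebot", "http://iprobesolutions.com/sitemap.xml"))  # True
print(rp.can_fetch("iisbot", "http://iprobesolutions.com/"))                # False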

unor
  • Thanks, but I still get the sitemap errors per https://www.dropbox.com/s/ezdw64korncw2r9/Screenshot%202017-08-03%2010.15.57.png?dl=0 if I use the code as you said, followed by the sitemap per https://www.dropbox.com/s/mfd3ozz9343tnjg/Screenshot%202017-08-03%2010.15.01.png?dl=0 – Julie S. Aug 03 '17 at 14:13
  • Actually, even if I use your exact code I still get the error. – Julie S. Aug 03 '17 at 14:28
  • @JulieS.: I would say the sitemap warnings reported in Google's Search Console are not directly related to the issue with your robots.txt. My guess is that it's a caching issue: Google needs some time until it updates its cache of the robots.txt, and then there shouldn't be any blocked URLs listed in the sitemap anymore (because with your new robots.txt, no URL is blocked for them anymore). – unor Aug 03 '17 at 20:02