
I would like to completely block Bing from crawling my site for now (it's attacking my site at an alarming rate: 500 GB of data a month).

I have 1000 subdomains added to Bing Webmaster Tools, so I can't go and set each one's crawl rate. I have tried blocking it using robots.txt, but it's not working. Here is my robots.txt:

# robots.txt 
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
Cyril
Zoinky
  • I also found bingbot doing this on many, many websites I manage. It completely ignores general "*" rules and any Crawl-delay directives. – WooDzu Mar 10 '17 at 08:53

2 Answers


This WILL definitely affect your SEO/search ranking and will cause pages to drop from the index, so please use it with care.

You can block requests based on the user-agent string if you have the IIS URL Rewrite module installed (if not, go here).

Then add a rule to your web.config like this:

<system.webServer>
  <!-- Requires the IIS URL Rewrite module; its rules live under <rewrite>. -->
  <rewrite>
    <rules>
      <rule name="Request Blocking Rule" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <!-- Regex tested against the User-Agent header (case-insensitive by default) -->
          <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

This returns a 403 Forbidden response to any request whose User-Agent header matches msnbot or BingBot.
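To see which requests the condition above catches, here is a small Python sketch of the same test. It is not how IIS evaluates rules internally; it assumes the condition behaves like an unanchored, case-insensitive regex search (URL Rewrite conditions ignore case by default). The function `response_status` and the sample user-agent strings are illustrative only.

```python
import re

# Same alternation as the rule's pattern attribute. IGNORECASE reflects the
# assumption that the condition matches case-insensitively, so "bingbot"
# in Bing's real UA string still matches "BingBot".
BLOCKED_AGENTS = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def response_status(user_agent: str) -> int:
    """Hypothetical helper: 403 if the UA matches the block pattern, else 200.

    re.search looks anywhere in the string, like an unanchored rewrite pattern.
    """
    if BLOCKED_AGENTS.search(user_agent):
        return 403
    return 200


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
        "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
    ]
    for ua in samples:
        print(response_status(ua), ua)
```

Because the match is a substring search, any user agent containing "bingbot" or "msnbot" anywhere is refused, while ordinary browsers pass through.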

UPDATE

Looking at your robots.txt, I think it should be:

# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /
Carl
  • Thank you, that seems to work, at least according to the Bing Webmaster Tools verification. – Zoinky Nov 28 '14 at 17:14
  • No problem. I've also added to my answer what I think your robots file should be (the first disallow should be "/" rather than blank). Bots do take time to pick up changes to robots.txt files, though, even if you submit them via Webmaster Tools. – Carl Nov 28 '14 at 17:19
  • I think the updated robots.txt will ban all crawlers. Right now I am trying to stop just Bing from crawling until I figure out why it's attacking the site so much. – Zoinky Nov 28 '14 at 17:25
  • Does the web.config solution work with .NET Framework 2.0 deployed on IIS 6 (Windows Server 2003)? – Hernaldo Gonzalez Mar 21 '17 at 14:33

Your robots.txt is not correct:

  • You need blank lines between records (a record starts with one or more User-agent lines).

  • "Disallow: bingbot" disallows crawling of URLs whose paths start with "bingbot" (e.g., http://example.com/bingbot), which is probably not what you want.

  • Not an error, but an empty "Disallow:" line is not needed (allowing everything is the default anyway).
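One way to check this diagnosis is Python's standard-library `urllib.robotparser`, which follows the original robots.txt rules (first matching record, plain path prefixes). This is a sketch with example.com as a placeholder host; real crawlers' parsers may differ in details, but the key point holds: nothing in the original file names bingbot as a user agent.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, unchanged.
ORIGINAL = """\
# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ORIGINAL.splitlines())

# "Disallow: bingbot" is just a path rule inside the "*" record, so bingbot
# falls back to that catch-all record and is not blocked from the site root.
print("bingbot may fetch /:", parser.can_fetch("bingbot", "http://example.com/"))

# ia_archiver has its own record with "Disallow: /", so it is blocked.
print("ia_archiver may fetch /:", parser.can_fetch("ia_archiver", "http://example.com/"))
```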

So you probably want to use:

User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /

This disallows crawling of anything for "bingbot" and "ia_archiver". All other bots are allowed to crawl everything except URLs whose paths start with /member, /cgi-bin/, or *.axd.
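The corrected file can be sanity-checked the same way with Python's `urllib.robotparser`. This is a sketch, not a guarantee of how Bing's own parser behaves; example.com is a placeholder and "Googlebot" simply stands in for any crawler not named in the file.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt suggested above.
FIXED = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(FIXED.splitlines())

# Both agents listed in the second record share its "Disallow: /".
for agent in ("bingbot", "ia_archiver"):
    print(agent, "may fetch /:", parser.can_fetch(agent, "http://example.com/"))

# Any other crawler falls back to the "*" record.
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))
print(parser.can_fetch("Googlebot", "http://example.com/member/profile"))
```

Note that one record can carry several User-agent lines; they all share the rules that follow.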

Note that *.axd will be interpreted literally by bots following the original robots.txt specification (so they will not crawl http://example.com/*.axd, but they will crawl http://example.com/foo.axd). However, many bots extend the spec and interpret the * as some kind of wildcard.
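A rough Python sketch of the two readings of a rule like *.axd. It assumes wildcard-aware crawlers treat `*` as "any run of characters" and a trailing `$` as "end of path" (the behaviour RFC 9309 later standardized); `literal_match` and `wildcard_match` are hypothetical helpers, not any crawler's actual code, and they ignore percent-encoding and precedence between competing rules.

```python
import re


def literal_match(rule: str, path: str) -> bool:
    """Original-spec reading: the rule is a plain prefix of the URL path."""
    return path.startswith(rule)


def wildcard_match(rule: str, path: str) -> bool:
    """Extended reading: '*' matches any run of characters and a trailing
    '$' anchors the end of the path; otherwise it is still a prefix match."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    # Escape everything, then turn each escaped '*' back into '.*'.
    regex = re.escape(body).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # re.match anchors only at the start, which gives prefix semantics.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    for path in ("/foo.axd", "/WebResource.axd", "/index.html"):
        print(path, literal_match("*.axd", path), wildcard_match("*.axd", path))
```

Under the literal reading *.axd never matches a normal .axd path, while the wildcard reading blocks any path containing ".axd".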

unor