Block bingbot from crawling my site

Question

I would like to completely block Bing from crawling my site for now. It's hitting my site at an alarming rate (500 GB of data a month).

I have 1000 subdomains added to Bing Webmaster Tools, so I can't go and set each one's crawl rate. I have tried blocking it using robots.txt, but it's not working. Here is my robots.txt:

# robots.txt 
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member
Disallow: bingbot
User-agent: ia_archiver
Disallow: /

I also found bingbot doing this on many.. many websites I manage. Completely ignores general "*" rules and any Crawl-delays. — WooDzu, Mar 10 '17 at 08:53

Carl · Accepted Answer · 2014-11-28T17:30:16.223

5

This WILL definitely affect your SEO/search ranking and will cause pages to drop from the index, so please use with care.

You can block requests based on the user-agent string if you have the IIS URL Rewrite module installed (if not go here).

Then add a rule to your web.config like this:

<system.webServer>
  <rewrite>
    <rules>
      <rule name="Request Blocking Rule" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="msnbot|BingBot" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

This will return a 403 if the bot hits your site.
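
If you want to check which User-Agent headers the condition would catch before deploying it, the pattern can be tried out offline. This is only a rough sketch in Python, not part of the IIS setup: it assumes the rewrite condition is matched case-insensitively and anywhere in the header (which I believe is the URL Rewrite default), and the sample header strings and the is_blocked helper are made up for illustration.

```python
import re

# Same alternation as the pattern attribute in the rewrite rule.
# re.IGNORECASE mirrors the assumed case-insensitive matching in IIS.
BLOCKED_AGENTS = re.compile(r"msnbot|BingBot", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the rule's condition would match this User-Agent."""
    return BLOCKED_AGENTS.search(user_agent) is not None


# Sample header values, assumed for illustration only.
bing_ua = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/115.0"

print(is_blocked(bing_ua))     # True
print(is_blocked(browser_ua))  # False
```

Keep in mind this only tests the regular expression; whether the 403 is actually served still has to be verified against the live site.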

UPDATE

Looking at your robots.txt, I think it should be:

# robots.txt
User-agent: *
Disallow:
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
Disallow: /

User-agent: ia_archiver
Disallow: /

edited Nov 28 '14 at 17:30

answered Nov 28 '14 at 16:54

Carl


Thank you, that seems to work, at least according to the Bing Webmaster Tools verification. – Zoinky Nov 28 '14 at 17:14
No problem - I've also added to my answer what I think your robots file should be (the first disallow should be "/" rather than blank.) Bots do take time to pick up changes in robots.txt files though, even if you submit them via webmaster tools. – Carl Nov 28 '14 at 17:19
The updated robots.txt will ban all crawlers, I think. Right now I am trying to stop just Bing from crawling until I figure out why it's hitting the site so much. – Zoinky Nov 28 '14 at 17:25
Does the web.config solution work with Framework 2.0 deployed on IIS 6 / Windows Server 2003? – Hernaldo Gonzalez Mar 21 '17 at 14:33

score 4 · Answer 2 · answered Nov 29 '14 at 19:00

Your robots.txt is not correct:

You need line breaks between records (a record starts with one or more User-agent lines).
The line "Disallow: bingbot" disallows crawling of URLs whose paths start with "bingbot" (e.g., http://example.com/bingbot), which is probably not what you want.
Not an error, but the empty "Disallow:" line is not needed (allowing everything is the default anyway).

So you probably want to use:

User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /

This disallows crawling of anything for "bingbot" and "ia_archiver". All other bots are allowed to crawl everything except URLs whose paths start with /member, /cgi-bin/, or *.axd.
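
One way to sanity-check these rules without waiting for a crawler is to feed them to a robots.txt parser. Here is a minimal sketch using Python's standard urllib.robotparser module. The build_parser helper and the "SomeOtherBot" name are just placeholders for illustration, and this parser is only one implementation, so it does not prove that Bing reads the file the same way.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt from above, records separated by a blank line.
ROBOTS_TXT = """\
User-agent: *
Disallow: *.axd
Disallow: /cgi-bin/
Disallow: /member

User-agent: bingbot
User-agent: ia_archiver
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# The two named bots are blocked everywhere.
print(rules.can_fetch("bingbot", "http://example.com/"))          # False
print(rules.can_fetch("ia_archiver", "http://example.com/page"))  # False

# Any other bot falls back to the "*" record.
print(rules.can_fetch("SomeOtherBot", "http://example.com/"))                # True
print(rules.can_fetch("SomeOtherBot", "http://example.com/member/profile"))  # False
```

The checks pass the short bot token ("bingbot") and not the full User-Agent header, because that is what this parser compares against the User-agent lines.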

Note that *.axd will be interpreted literally by bots following the original robots.txt specification (so they will not crawl http://example.com/*.axd, but they will crawl http://example.com/foo.axd). However, many bots extend the spec and interpret the * as some kind of wildcard.
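
To make the difference concrete, here is a rough Python sketch of the two interpretations. Both functions are hypothetical helpers written for this answer, not code from any real crawler. It also assumes the rule is written as /*.axd with a leading slash so that it can be compared against a URL path, and it ignores other extensions such as a trailing $.

```python
import re


def matches_literal(rule: str, path: str) -> bool:
    """Original-spec style: the rule is a plain prefix of the URL path."""
    return path.startswith(rule)


def matches_wildcard(rule: str, path: str) -> bool:
    """Extended style: '*' in the rule matches any run of characters."""
    # Escape everything except '*', which becomes the regex '.*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors at the start only, so this is still a prefix match.
    return re.match(pattern, path) is not None


rule = "/*.axd"  # assumed form of the rule, with a leading slash

print(matches_literal(rule, "/*.axd"))     # True: only the literal path is blocked
print(matches_literal(rule, "/foo.axd"))   # False: still crawled
print(matches_wildcard(rule, "/foo.axd"))  # True: blocked by wildcard-aware bots
```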
