We have recently been seeing a large number of 404 errors generated by the Bing web crawler. I have verified that the IP does belong to Bing, but I don't know why it is requesting the URLs it requests. I don't want to use a robots.txt file to tell Bing not to crawl my site at all, but I also don't want it to keep requesting pages that don't exist. Is there any way to tell where Bing is getting a specific URL from? I searched Google for [link:www.mywebsite.com/pagename/] and found nothing, which leads me to believe the bot is doing something it isn't supposed to, rather than my site containing a bad URL.

Jason
  • What URLs are they requesting? You don't have to say specifically, but please describe whether there is anything "meaningful" in them: e.g. is it possible that it was a valid URL at some point? – Kiril Dec 03 '12 at 17:22
  • The site is driven from database entries, so there are links such as www.mysite.com/item/57, but the URLs they are hitting are just www.mysite.com/57. I've tried to find any place where the bare IDs appear but haven't had any luck. – Jason Dec 03 '12 at 20:03
  • Also, those were never valid URLs. Thanks! – Jason Dec 03 '12 at 20:06
  • Worst case scenario: it could be an obscure Microsoft bug which your website is triggering somehow. If other crawlers are not picking up those links, then it seems that this is not a faulty back-link issue. – Kiril Dec 04 '12 at 18:20
  • I can confirm that this is happening on the site I work on, and only bingbot does this. We have a similar URL pattern, like `baseurl/id/name-1`, `baseurl/id/name-2`, etc., but they keep hitting us with invalid requests like `baseurl/id/1`, `baseurl/id/2` (which were never valid). The solution we have for now is to permanently redirect these requests to `baseurl/id`, which is a valid URL in our case, rather than returning a 404. – arun Jan 21 '14 at 07:14
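
The redirect workaround in the last comment could look roughly like the sketch below. It is only an illustration, not the commenter's actual code: the function name `redirect_target`, the regular expression, and the example paths are all assumptions, and it presumes that the bogus requests always have the form `/<id>/<digits>` while valid pages have the form `/<id>/<name>-<n>`.

```python
import re
from typing import Optional

# Assumed shape of the bogus requests: "/<id>/<digits>", e.g. "/widgets/2".
# Valid pages are assumed to look like "/<id>/<name>-<n>", so they do not match.
BOGUS_PATH = re.compile(r"^/(?P<id>[^/]+)/\d+/?$")


def redirect_target(path: str) -> Optional[str]:
    """Return the path to send a 301 redirect to, or None to handle the request normally."""
    match = BOGUS_PATH.match(path)
    if match is None:
        return None
    # Drop the trailing numeric segment and keep only "/<id>".
    return "/" + match.group("id")
```

A request handler would call `redirect_target(request_path)` first and, if the result is not `None`, respond with a 301 and a `Location` header pointing at that path. Whether a redirect is preferable to a plain 404 depends on the site; here it only works because `baseurl/id` happens to be a valid page.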

0 Answers