
I have links with this structure:

I want Google & co. to spider all links that have ONE tag in the URL, but NOT the URLs that have two or more tags.

Currently I use the HTML meta tag "robots" with the value "noindex, nofollow" to solve the problem.
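
For reference, that per-page workaround is just the standard robots meta tag in the head of each page that should stay out of the index:

    <meta name="robots" content="noindex, nofollow">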

Is there a robots.txt solution (that works at least for some search bots) or do I need to continue with "noindex, nofollow" and live with the additional traffic?

BlaM

1 Answer


I don't think you can do it using robots.txt. The standard is pretty narrow (no wildcards, the file has to sit at the site root, etc.).
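
To illustrate the limitation: the original robots.txt protocol only matches URL prefixes, so the closest you could get is blocking every tag page under some common path. The /tags/ prefix below is just a placeholder, since the question doesn't show the real URL structure:

    User-agent: *
    # Prefix matching only: this blocks ALL tag pages.
    # There is no standard pattern that matches "two or more tags"
    # while still allowing single-tag URLs.
    Disallow: /tags/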

What about disallowing them based on user-agent in your server?

MarkusQ
  • Disallowing access would result in some kind of HTTP error. I'm not sure how Google reacts to pages that return lots of "server error" responses, and I'm not very enthusiastic to try it out :) – BlaM Mar 25 '09 at 18:39
  • It wouldn't have to: you could serve up some cheap static "nothing to see here" page. – MarkusQ Mar 25 '09 at 18:44
  • That's true. It would at least be better than serving the full page. – BlaM Mar 25 '09 at 18:47
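
Putting the answer and the comment thread together, here is a rough sketch of what that could look like as a small WSGI app. The bot signatures and the has_multiple_tags() check are hypothetical stand-ins, because the question doesn't show how tags appear in the URL:

    from wsgiref.simple_server import make_server

    # Hypothetical list of crawler user-agent fragments to match against.
    BOT_SIGNATURES = ("googlebot", "bingbot", "slurp")

    def has_multiple_tags(path):
        # Placeholder: assumes multi-tag URLs join tags with "+";
        # adjust to whatever the real URL structure is.
        return path.count("+") >= 1

    def render_full_page(path):
        # Stand-in for the normal (expensive) page rendering.
        return "<html><body>full page for %s</body></html>" % path

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        if any(bot in ua for bot in BOT_SIGNATURES) and has_multiple_tags(path):
            # Serve a tiny static "nothing to see here" page to crawlers
            # instead of the full page, keeping the noindex hint.
            body = ('<html><head><meta name="robots" content="noindex, nofollow">'
                    '</head><body></body></html>')
        else:
            body = render_full_page(path)
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body.encode("utf-8")]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()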