I'm looking to NOINDEX all my tag pages, e.g.

http://example.com/tags/tabs
http://example.com/tags/people

etc.

If I add the following to my robots.txt file (see: http://jsfiddle.net/psac2uzy/)

Disallow: /tags/
Disallow: /tags/*

will this stop Google from indexing all my tag pages?

Even though those paths aren't the same as the Drupal structure (since Drupal keeps content in the database)?

Sam
2 Answers


Note: You can’t disallow indexing with robots.txt, you can only disallow crawling (related answer).
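If the goal really is to keep tag pages out of the index (not just uncrawled), the usual mechanism is a noindex directive on the pages themselves; a minimal sketch (the surrounding markup is illustrative):

```html
<!-- In the <head> of each tag page: tells crawlers not to index it -->
<meta name="robots" content="noindex">
```

Equivalently, the server can send an X-Robots-Tag: noindex HTTP response header. Either way, the crawler has to be able to fetch the page to see the directive, so a page carrying noindex must not also be blocked in robots.txt.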

What matters are the actual URLs that your users, search engines among them, see. They don't have access to your backend, so they don't even know how your site works internally.

The line Disallow: /tags/ (no need for the other one with *) means that all URLs whose paths start with /tags/ should not be crawled. So, assuming that the robots.txt is at http://example.com/robots.txt, this would block for example:

  • http://example.com/tags/
  • http://example.com/tags/foo
  • http://example.com/tags/foo/bar

If your tags are available under a different URL (for example, Drupal’s default /taxonomy/term/…), and a bot finds these alternative URLs, it may of course crawl them. So it’s generally a good idea to always redirect to the one canonical URL you want to use.

unor

Add before:

User-Agent: *
Crawl-Delay: 10
Disallow: /tags

(You could also try the non-clean-URL form: Disallow: /?q=tags )

Check this page for more information.

Hope that helps

balintpekker
  • Thanks, I'm still a bit confused by the wildcard: would the following stop all tag pages from being indexed: /*tags ? – Sam Nov 28 '14 at 10:43