1

I have been reading through different forums and would like to confirm whether this is correct. I want to stop bots from crawling URLs with query strings under one specific subpage (e.g. www.website.com/subpage/?query=sample), without disallowing /subpage/ itself. Please correct me if I am wrong.

File: robots.txt

User-agent: *
Disallow: /subpage/*?
Jonas
Elmer
  • You can always download the appropriate [add-on](https://addons.mozilla.org/en-US/firefox/addon/user-agent-switcher/) or [extension](https://chrome.google.com/webstore/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg?hl=en-US) to test these things from the comfort of your own browser. – MisterMelancholy Oct 21 '13 at 05:34
  • @MisterMelancholy Thanks for the comment :) However, I was just wondering if the line 'Disallow:...' is valid or not. Basically I simply don't want bots to crawl queries under 'subpage'. – Elmer Oct 21 '13 at 05:36

2 Answers

1

According to what I see here, you are very close:

User-agent: *
Disallow: /subpage/*?*
Allow: /subpage$

You can test this from the comfort of your own browser by using the appropriate add-on or extension.
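
If you would rather check the patterns offline, here is a minimal sketch of how such rules could be evaluated. It assumes Google-style wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and `Allow` wins a tie); other crawlers may interpret the file differently. The helper names `rule_to_regex` and `is_allowed` are made up for this illustration and are not part of any library.

```python
import re


def rule_to_regex(path_pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: Google-style wildcards, where '*' matches any run of
    characters and a trailing '$' anchors the end of the URL path.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(url_path, rules):
    """Return True if url_path may be crawled under the given rules.

    rules is a list of (directive, pattern) pairs. The longest matching
    pattern wins; on a tie, Allow wins. No matching rule means allowed.
    """
    best = None  # (pattern length, is_allow)
    for directive, pattern in rules:
        # re.match anchors at the start, like a robots.txt prefix match.
        if rule_to_regex(pattern).match(url_path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# The rule from the question.
RULES = [("Disallow", "/subpage/*?")]

for path in ["/subpage/", "/subpage/?query=sample", "/subpage/page.html"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Under these assumptions, only the paths under /subpage/ that contain a `?` are blocked, while /subpage/ itself stays crawlable.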

Community
  • Hmm, I do not want to disallow /subpage/ itself. The /subpage/ page has a box of options, and choosing one changes the URL to /subpage/?option=1. I do not want Google to crawl those query-string variants. – Elmer Oct 21 '13 at 05:47
  • Just a note besides syntax: if a page in /subpage/ is linked from anywhere, it can still be indexed even though it is not crawled, and it will appear in the SERP (with a notice rather than a content summary taken from the page). As an example, [Yoast ceased to disallow any subpages but one](https://yoast.com/wordpress-robots-txt-example/). – tuk0z May 26 '15 at 20:46
0

I do not think you can specify a query string in a Disallow rule. The value you set for Disallow is referred to as a directory in the documentation (not as a URI or URL).

You can, however, achieve your objective by using Sitemap.xml: leave the URLs that you do not want indexed out of the sitemap.

Google Webmaster Tools also gives some granular control over how query string parameters should be interpreted. I am not sure whether that serves your purpose.

Tippa Raj