I would like Google to ignore URLs like this:

http://www.mydomain.example/new-printers?dir=asc&order=price&p=3

In other words, all the URLs that have the parameters dir, order, and p should be ignored. How do I do so with robots.txt?

– Luis Valencia

3 Answers

Here's a solution if you want to disallow all query strings:

Disallow: /*?*

Or, if you want to be more precise about the query string:

Disallow: /*?dir=*&order=*&p=*

You can also add an Allow line to robots.txt to specify which URL should stay crawlable:

Allow: /new-printer$

The $ anchors the pattern to the end of the URL, so only /new-printer itself is allowed, and not, say, /new-printers or /new-printer?dir=asc.
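Putting these together, a minimal robots.txt for this case might look like the following sketch (assuming the rules should apply to all crawlers; note, per the comments below, that Allow is not part of the original standard, even though the major engines support it):

User-agent: *
Allow: /new-printer$
Disallow: /*?dir=*&order=*&p=*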

More info:

http://code.google.com/web/controlcrawlindex/docs/robots_txt.html

http://sanzon.wordpress.com/2008/04/29/advanced-usage-of-robotstxt-w-querystrings/

– Book Of Zeus
  • this will disallow new-printers; I only want to disallow the query string part – Luis Valencia Feb 05 '12 at 15:02
  • so you want to allow `/new-printer` but not `/new-printers?dir=*&order=*&p=*`? – Book Of Zeus Feb 05 '12 at 15:05
  • Are those advanced wildcards and the allow directive supported well? – Tony McCreath Jan 15 '13 at 14:34
  • According to http://www.robotstxt.org/robotstxt.html - "there is no "Allow" field" – Jamie Edwards Apr 22 '13 at 09:38
  • Taking the new-printers example a bit further, what if different combinations and orders of parameters are possible on that file. Can you specify in a single query that a specific file should be disallowed if any kind of parameters are added to it without explicitly specifying them? Would... Disallow: /new-printer?* work? – AdamJones Aug 27 '14 at 21:20
  • @AdamJones the last command should work. It will follow the same logic as the first condition. I never tried it so I can't guarantee it will work. – Book Of Zeus Aug 27 '14 at 22:25
  • @JamieEdwards it's true that "Allow" is technically speaking not part of the standard, but most of the popular search engines do support it. Allow lines should be *before* Disallow lines though. – Andy Madge Oct 06 '14 at 16:36
  • @BookOfZeus Will the page be crawled or not, if we add the said condition in `robots.txt`? – Pranav Bilurkar Aug 02 '17 at 08:11
  • There is now (as of 2019) a proposed standard undergoing ratification, and it does include Allow lines - https://datatracker.ietf.org/doc/html/draft-koster-rep - perhaps surprisingly, it appears there was no formal "standard" previous to this, and search engines were left to their own devices to operate "by convention" with a "de facto" standard that led to spotty support for Allow lines except for the big ones (eg Google and Bing). – Matt Wagner Jun 04 '21 at 13:56

You can block those specific query string parameters with the following lines:

Disallow: /*?*dir=
Disallow: /*?*order=
Disallow: /*?*p=

So if any URL contains dir=, order=, or p= anywhere in the query string, it will be blocked.
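If you want to sanity-check which URLs these Google-style wildcard rules would catch, here is a small Python sketch (the rule_to_regex helper is illustrative, not a library function; Python's built-in urllib.robotparser follows the original standard and does not understand * or $ wildcards, so the pattern is translated into a regular expression by hand):

import re

def rule_to_regex(pattern):
    # Google-style robots.txt matching: '*' matches any run of
    # characters, and a trailing '$' anchors the match to the end
    # of the URL; otherwise a rule is a prefix match.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile(".*".join(parts) + ("$" if anchored else ""))

rules = ["/*?*dir=", "/*?*order=", "/*?*p="]

for url in [
    "/new-printers?dir=asc&order=price&p=3",
    "/new-printers?order=price",
    "/new-printers",
]:
    blocked = any(rule_to_regex(r).match(url) for r in rules)
    print(url, "->", "blocked" if blocked else "allowed")

This reports the first two URLs as blocked and /new-printers as allowed, since only URLs whose query string contains dir=, order=, or p= match a rule.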

– Nick Rolando

Register your website with Google WebMaster Tools. There you can tell Google how to deal with your parameters.

Site Configuration -> URL Parameters

You should also have the pages that contain those parameters indicate that they should be excluded from indexing via the robots meta tag.
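For example, a minimal sketch of such a tag in each page's <head> (use noindex, nofollow instead if you also want crawlers to ignore the links on those pages):

<meta name="robots" content="noindex">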

– Tony McCreath
  • While the original question mentions Google specifically, it's important to note that the Google WebMaster Tools would only block Google. Adding the Disallow rules in the robots.txt file would address other search engines as well. – Matt V. Jan 14 '13 at 20:37
  • True. It should also be clarified that robots.txt does not stop Google indexing pages but stops it reading their content. The best solution is using the robots meta tag on the page itself. This is supported by all systems. – Tony McCreath Jan 15 '13 at 14:35
  • Note that this doesn't work anymore since they removed that functionality, see https://developers.google.com/search/blog/2022/03/url-parameters-tool-deprecated – Joël Aug 04 '22 at 08:15