
I would like for Google to ignore URLs like this:

https://www.example.com/blog/category/web-development?page=2

My links are getting indexed by Google, and I need to stop that. What should I use to keep them from being indexed?

This is my current robots.txt file:

Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /privacy
Disallow: /404.html
Disallow: /500.html
Disallow: /tweets
Disallow: /tweet/

Can I use this to disallow them?

Disallow: /blog/category/*?*
  • @Machavity: I don’t think that this question asks for SEO advice. It’s a plain specification-based question (to answer it, only the robots.txt spec + Google’s extension of it are relevant). – unor Jul 11 '18 at 15:50
  • @Machavity it's rare that I disagree with you, but... what unor said. – Paul Roub Jul 11 '18 at 15:55
  • Close vote retracted – Machavity Jul 11 '18 at 15:59

1 Answer


With robots.txt, you can prevent crawling, not necessarily indexing. To keep pages out of Google’s index, you would typically need a noindex robots meta tag or X-Robots-Tag HTTP header instead, and Google can only see those if it is allowed to crawl the page.

If you want to disallow Google from crawling URLs

  • whose paths start with /blog/category/, and
  • that contain a query component (e.g., ?, ?page, ?page=2, or ?foo=bar&page=2)

then you can use this:

Disallow: /blog/category/*?

You don’t need another * at the end because Disallow values represent the start of the URL (beginning from the path).
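
To illustrate how that rule behaves, here is a minimal sketch in Python (my own illustration, not Google’s actual matcher; the function name is made up, and Google’s $ end-anchor is deliberately left out): * is translated to “any sequence of characters” and the value is anchored at the start of the path.

import re

# Hypothetical helper approximating Google's Disallow matching:
# '*' matches any sequence of characters, and the value is anchored
# at the start of the path. (Google's '$' end-anchor is not handled.)
def google_style_match(disallow_value, path_and_query):
    pattern = re.escape(disallow_value).replace(r"\*", ".*")
    # re.match anchors at the start only, so no trailing wildcard
    # is needed -- matching the behavior described above.
    return re.match(pattern, path_and_query) is not None

rule = "/blog/category/*?"

print(google_style_match(rule, "/blog/category/web-development?page=2"))  # True (blocked)
print(google_style_match(rule, "/blog/category/web-development"))         # False (no query)
print(google_style_match(rule, "/blog/other?page=2"))                     # False (different path)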

But note that this is not supported by all bots. According to the original robots.txt spec, the * has no special meaning, so conforming bots would interpret the above line literally (with * as part of the path). If you were to follow only the rules from the original specification, you would have to list each category path explicitly:

Disallow: /blog/category/c1?
Disallow: /blog/category/c2?
Disallow: /blog/category/c3?
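
A quick sketch of that stricter original-spec behavior (again my own illustration, not any bot’s real code): matching is a plain literal prefix test, so the wildcard rule never matches a real URL, while the enumerated rules do.

# Original robots.txt spec: a Disallow value is a literal path prefix;
# '*' has no special meaning.
def original_spec_match(disallow_value, path_and_query):
    return path_and_query.startswith(disallow_value)

# The wildcard rule is taken literally, so real URLs never match it:
print(original_spec_match("/blog/category/*?", "/blog/category/c1?page=2"))   # False

# The enumerated rules above do match:
print(original_spec_match("/blog/category/c1?", "/blog/category/c1?page=2"))  # True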
unor