
Our URL is:

http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html

I want to disallow crawling of URLs after /collection/, but the category segments that appear before /collection/ are generated dynamically.

How can I disallow URLs after /collection/ in robots.txt?

bhargav

1 Answer


This is not possible in the original robots.txt specification.

But some (!) parsers extend the specification and define a wildcard character (typically *).

For those parsers, you could use:

Disallow: /*/collection
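
For example, a complete robots.txt using this rule might look like the following (a minimal sketch, assuming the rule should apply to all crawlers; a Disallow line only takes effect inside a User-agent group):

User-agent: *
Disallow: /*/collection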

Parsers that understand * as wildcard will stop crawling any URL whose path starts with /, followed by anything, followed by /collection, e.g.,

http://example.com/foo/collection/
http://example.com/foo/collection/bar

Note that http://example.com/collection/ (with no category segment in front) does not match this pattern; if such URLs exist, add a separate Disallow: /collection/ line for them.

Parsers that don’t understand * as wildcard (i.e., they follow the original specification) will stop crawling any URL whose path starts with the literal string /*/collection, e.g.,

http://example.com/*/collection/
http://example.com/*/collection/bar
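
To illustrate the difference, here is a small, hypothetical Python sketch (not part of any robots.txt library) that applies the same Disallow value once with wildcard-aware matching and once with literal prefix matching:

import re

RULE = "/*/collection"

def matches_wildcard(rule: str, path: str) -> bool:
    # Wildcard-aware matching: '*' matches any sequence of characters,
    # and the rule is anchored at the start of the URL path.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(pattern, path) is not None

def matches_literal(rule: str, path: str) -> bool:
    # Original-specification matching: the Disallow value is a plain
    # path prefix, so '*' is treated as an ordinary character.
    return path.startswith(rule)

for path in ["/kitchen-knife/collection/some-product.html",
             "/foo/collection/bar",
             "/collection/",
             "/*/collection/bar"]:
    print(path,
          "wildcard:", matches_wildcard(RULE, path),
          "literal:", matches_literal(RULE, path))

Running this shows that only the wildcard-aware matcher blocks the real category URLs, while the literal matcher only blocks paths that actually begin with /*/collection.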
unor