
Our URL is:

http://example.com/kitchen-knife/collection/maitre-universal-cutting-boards-rana-parsley-chopper-cheese-slicer-vegetables-knife-sharpening-stone-ham-stand-ham-stand-riviera-niza-knives-block-benin.html

I want to disallow crawling of URLs after /collection/, but the category segments that appear before /collection/ are generated dynamically.

How can I disallow URLs after /collection/ in robots.txt?

bhargav

1 Answer


This is not possible in the original robots.txt specification.

But some (!) parsers extend the specification and define a wildcard character (typically *).

For those parsers, you could use:

Disallow: /*/collection
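
For example, a complete robots.txt using this rule might look like the following (a minimal sketch, assuming the rule should apply to all crawlers; a Disallow line only takes effect inside a User-agent group):

User-agent: *
Disallow: /*/collection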

Parsers that understand * as wildcard will stop crawling any URL whose path starts with /, followed by anything, followed by /collection, e.g.,

http://example.com/foo/collection/
http://example.com/foo/collection/bar

Note that http://example.com/collection/ (with no category segment in front) does not match this pattern; if such URLs exist, add a separate Disallow: /collection/ line for them.

Parsers that don’t understand * as wildcard (i.e., they follow the original specification) will stop crawling any URL whose path starts with the literal string /*/collection, e.g.,

http://example.com/*/collection/
http://example.com/*/collection/bar
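
To illustrate the difference, here is a small, hypothetical Python sketch (not part of any robots.txt library) that applies the same Disallow value once with wildcard-aware matching and once with literal prefix matching:

import re

RULE = "/*/collection"

def matches_wildcard(rule: str, path: str) -> bool:
    # Wildcard-aware matching: '*' matches any sequence of characters,
    # and the rule is anchored at the start of the URL path.
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    return re.match(pattern, path) is not None

def matches_literal(rule: str, path: str) -> bool:
    # Original-specification matching: the Disallow value is a plain
    # path prefix, so '*' is treated as an ordinary character.
    return path.startswith(rule)

for path in ["/kitchen-knife/collection/some-product.html",
             "/foo/collection/bar",
             "/collection/",
             "/*/collection/bar"]:
    print(path,
          "wildcard:", matches_wildcard(RULE, path),
          "literal:", matches_literal(RULE, path))

Running this shows that only the wildcard-aware matcher blocks the real category URLs, while the literal matcher only blocks paths that actually begin with /*/collection.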
unor