
I was searching around for how to noindex specific URLs, but I haven't found any specific info on the following.

By adding the below

<?php if(is_single(X)): ?>
<meta name="robots" content="noindex,nofollow">
<?php endif; ?>

I would be able to noindex post (X), where X could be the post ID, the post title ("Hello World", for example), or the post slug ("hello-world").

Would it be possible to target all URLs that start with the same post slug or title, as in the example below?

www.test.com/REF-123-mytest.html
www.test.com/REF-123-yourtest.html
www.test.com/REF-123-histest.html

Could I exclude all URLs that start with REF-123, for example?

unor
jiko
  • Are you aware of the difference between crawling and indexing? Robots.txt can help with disallowing crawling of your pages, not indexing. – unor Nov 13 '14 at 09:22

2 Answers


By using robots.txt, you can disallow crawling.

By using meta-robots (or the HTTP header X-Robots-Tag), you can disallow indexing.

If you intend to forbid indexing, you shouldn't also disallow the URLs in robots.txt: a bot that isn't allowed to crawl a page never sees its noindex rule, so it will never learn that you don't want these URLs to be indexed.

If you want to disallow crawling, you could use this robots.txt:

User-agent: *
Disallow: /REF-123

This would apply to all URLs whose path starts with /REF-123 (matching is case-sensitive!).

If you want to disallow indexing, you could add this meta element to all of those pages:

<meta name="robots" content="noindex">

or send the corresponding HTTP header X-Robots-Tag:

X-Robots-Tag: noindex
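The meta element doesn't have to be added page by page: it can be emitted conditionally with server-side logic. A minimal sketch in WordPress terms, assuming the REF-123 prefix is part of the post slug (slugs are usually lowercased, hence the case-insensitive check); wp_head, is_singular, get_post_field, and get_queried_object_id are standard WordPress APIs:

```php
<?php
// Sketch only: emit a noindex meta element for every singular page
// whose slug starts with "REF-123" (case-insensitive, since WordPress
// usually lowercases slugs to e.g. "ref-123-mytest").
add_action('wp_head', function () {
    if (!is_singular()) {
        return;
    }
    $slug = get_post_field('post_name', get_queried_object_id());
    if (stripos($slug, 'REF-123') === 0) {
        echo '<meta name="robots" content="noindex">' . "\n";
    }
});
```

Placed in the active theme's functions.php, this would print the element in the head of matching posts only, leaving all other pages indexable.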
unor
  • Thank you Unor. I am exclusively trying to no index the URLs but I would like to avoid adding the "noindex" tag one by one. Is there any work around? – jiko Nov 14 '14 at 14:15
  • @jiko: No, you have to do this per page/URL, either via HTML (with the `meta` element) or via HTTP (with the HTTP header). Of course you can use a server-side programming language like PHP to include this `meta`/header on certain pages only. – unor Nov 14 '14 at 22:21
  • If I want to disallow www.example.com/test.html, can I write Disallow: /test.html? Also, if I want to disallow www.example.com/category/pen, can I write Disallow: /category/pen? Please explain. – Abilash Erikson Jul 04 '18 at 06:11
  • @abilasher: Yes. – unor Jul 04 '18 at 10:58

You can add this rule in a robots.txt file:

Disallow: www.test.com/REF-123*
Magicprog.fr
  • This won’t work. 1.) It would block `http://example.com/www.test.com/REF-123*`, not `http://www.test.com/REF-123*`. 2.) Furthermore, the `*` has no special meaning in the original robots.txt specification, so it will be interpreted literally, i.e., only URLs whose paths contain a `*` character at that position will be blocked. 3.) With robots.txt you can disallow crawling, not indexing (however, it’s not yet clear what the OP is really after). – unor Nov 13 '14 at 14:54