
Is it possible to tell Google not to crawl these pages

/blog/page/10
/blog/page/20
…
/blog/page/100

These are basically Ajax calls that bring blog posts data.

I created this in robots.txt:

User-agent: *
Disallow: /blog/page/*

But now there is another page that I want to allow, which is

/blog/page/start

Is there a way to tell robots to block only the pages that end with a number, e.g.

User-agent: *
Disallow: /blog/page/(:num)

I also got the error below when I tried to validate the robots.txt file:

[screenshot of the validator error]

unor
Sohail

1 Answer

Following the original robots.txt specification, this would work (for all conforming bots, including Google’s):

User-agent: *
Disallow: /blog/page/0
Disallow: /blog/page/1
Disallow: /blog/page/2
Disallow: /blog/page/3
Disallow: /blog/page/4
Disallow: /blog/page/5
Disallow: /blog/page/6
Disallow: /blog/page/7
Disallow: /blog/page/8
Disallow: /blog/page/9

This blocks all URLs whose path begins with /blog/page/ followed by any digit (/blog/page/9129831823, /blog/page/9.html, /blog/page/5/10/foo, etc.), because Disallow values are plain prefix matches.
So you should not append the * character: it is not a wildcard in the original robots.txt specification, and it is not even needed in your case for bots that do interpret it as a wildcard.

Google supports some features in robots.txt which are not part of the original robots.txt specification, and which are therefore not supported by (all) other bots, e.g., the Allow field. But as the robots.txt above already works, there is no need to use it.
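If you want to sanity-check the prefix-matching behaviour described above, Python's standard-library robots.txt parser can simulate it. This is just a sketch: the host example.com is a placeholder, and the URLs are the ones from the question.

```python
import urllib.robotparser

# The ten Disallow rules from the answer, one per leading digit.
rules = """
User-agent: *
Disallow: /blog/page/0
Disallow: /blog/page/1
Disallow: /blog/page/2
Disallow: /blog/page/3
Disallow: /blog/page/4
Disallow: /blog/page/5
Disallow: /blog/page/6
Disallow: /blog/page/7
Disallow: /blog/page/8
Disallow: /blog/page/9
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paginated Ajax URLs are blocked: their paths start with
# /blog/page/ followed by a digit, so a prefix rule matches.
print(rp.can_fetch("*", "https://example.com/blog/page/10"))     # False
print(rp.can_fetch("*", "https://example.com/blog/page/100"))    # False

# /blog/page/start stays crawlable: it starts with "s", not a digit,
# so none of the ten prefixes match.
print(rp.can_fetch("*", "https://example.com/blog/page/start"))  # True
```

The same check works for any other URL you are unsure about; just pass it to can_fetch.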

unor