
Is it possible to tell Google not to crawl these pages

/blog/page/10
/blog/page/20
…
/blog/page/100

These are basically Ajax calls that bring blog posts data.

I created this in robots.txt:

User-agent: *
Disallow: /blog/page/*

But now there is another page that I want to allow, which is

/blog/page/start

Is there a way to tell robots to block only the pages that end with a number, e.g.

User-agent: *
Disallow: /blog/page/(:num)

I also got the error below when I tried to validate the robots.txt file:

[screenshot of the validator error]

unor
Sohail

1 Answer

Following the original robots.txt specification, this would work (for all conforming bots, including Google’s):

User-agent: *
Disallow: /blog/page/0
Disallow: /blog/page/1
Disallow: /blog/page/2
Disallow: /blog/page/3
Disallow: /blog/page/4
Disallow: /blog/page/5
Disallow: /blog/page/6
Disallow: /blog/page/7
Disallow: /blog/page/8
Disallow: /blog/page/9

This blocks all URLs whose path begins with /blog/page/ followed by any digit (/blog/page/9129831823, /blog/page/9.html, /blog/page/5/10/foo, etc.), because Disallow values are plain prefix matches.
So you should not append the * character: it is not a wildcard in the original robots.txt specification, and it is not even needed in your case for bots that do interpret it as a wildcard.

Google supports some features in robots.txt which are not part of the original robots.txt specification, and which are therefore not supported by (all) other bots, e.g., the Allow field. But as the robots.txt above already works, there is no need to use it.
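If you want to sanity-check the prefix-matching behaviour described above, Python's standard-library robots.txt parser can simulate it. This is just a sketch: the host example.com is a placeholder, and the URLs are the ones from the question.

```python
import urllib.robotparser

# The ten Disallow rules from the answer, one per leading digit.
rules = """
User-agent: *
Disallow: /blog/page/0
Disallow: /blog/page/1
Disallow: /blog/page/2
Disallow: /blog/page/3
Disallow: /blog/page/4
Disallow: /blog/page/5
Disallow: /blog/page/6
Disallow: /blog/page/7
Disallow: /blog/page/8
Disallow: /blog/page/9
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paginated Ajax URLs are blocked: their paths start with
# /blog/page/ followed by a digit, so a prefix rule matches.
print(rp.can_fetch("*", "https://example.com/blog/page/10"))     # False
print(rp.can_fetch("*", "https://example.com/blog/page/100"))    # False

# /blog/page/start stays crawlable: it starts with "s", not a digit,
# so none of the ten prefixes match.
print(rp.can_fetch("*", "https://example.com/blog/page/start"))  # True
```

The same check works for any other URL you are unsure about; just pass it to can_fetch.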

unor