
This is my robots.txt. I want to allow only the base URL domain.com to be indexed and disallow all sub-URLs such as domain.com/foo and domain.com/bar.html.

User-agent: *
Disallow: /*/

Because I was not sure whether this is valid syntax, I tested it using Google Webmaster Tools. It shows me this message:

robots.txt file is probably invalid.

Is my file valid? Is there a better way to allow only the base URL to be indexed?

Update: Google downloaded my robots.txt 4 hours ago. I think that's why it doesn't work. I will wait some time, and if the problem persists I will update my question again.

danijar
  • I read this: http://stackoverflow.com/questions/5206602/robots-txt-how-to-allow-access-only-to-domain-root-and-no-deeper but did not understand the answer. – danijar Apr 26 '12 at 19:55
  • Here's another similar question that might help: http://stackoverflow.com/q/43427/669611 – magzalez Apr 26 '12 at 20:40

1 Answer


Here is a link to a validator. It might help you work through any errors in the file.

Robots.txt Checker

I checked on another validator, robots.txt Checker, and this is what I got for the second line:

Wildcard characters (like "*") are not allowed here. The line below must be an allow, disallow, comment or a blank line statement.
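Even for crawlers that do accept wildcards (Google documents support for `*` and `$`), the question's `Disallow: /*/` probably would not do what is intended. Here is a rough Python sketch of that style of matching, assuming a pattern is matched from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end. `pattern_matches` is a made-up helper for illustration, not part of any library:

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: wildcard-style robots.txt path matching.

    '*' matches any sequence of characters, a trailing '$' anchors the
    end of the path, and everything else is matched as a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path only.
    return re.match(regex, path) is not None

# Which paths would the question's rule cover?
for path in ["/", "/foo", "/bar.html", "/foo/", "/foo/bar.html"]:
    print(path, pattern_matches("/*/", path))
```

Under these assumptions the pattern only matches paths that contain a second slash, so `/foo` and `/bar.html` would stay crawlable while `/foo/` and `/foo/bar.html` would be blocked.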

This might be what you're looking for:

User-Agent: *
Allow: /index.html
Disallow: /

This assumes your homepage is index.html.

If index.php is your homepage, you should be able to swap out index.html for index.php.

User-Agent: *
Allow: /index.php
Disallow: /

On my dynamic websites that run through index.php, going to mydomain.com/index.php still takes me to the homepage, so the above should work.
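One caveat worth checking: under the longest-match precedence that Google documents (and that RFC 9309 describes), `Allow: /index.html` only covers the literal path `/index.html`, so the bare root `/` would still fall under `Disallow: /`. For crawlers that support `$`, a pattern such as `Allow: /$` is one way to cover the root itself. The sketch below is a simplified evaluator, not a full robots.txt parser; `to_regex` and `is_allowed` are hypothetical helpers, and it assumes the longest matching pattern wins with Allow winning ties:

```python
import re

def to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules is a list of ("allow" | "disallow", pattern) pairs for one user agent."""
    best = None  # (pattern length, is_allow); longer wins, allow wins ties
    for kind, pattern in rules:
        if pattern and to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the path is allowed by default.
    return True if best is None else best[1]

# The rules from this answer, and an assumed variant using '$'.
answer_rules = [("allow", "/index.html"), ("disallow", "/")]
root_only_rules = [("allow", "/$"), ("disallow", "/")]

for path in ["/", "/index.html", "/foo", "/bar.html"]:
    print(path, is_allowed(answer_rules, path), is_allowed(root_only_rules, path))
```

With these assumptions, the first rule set permits `/index.html` but not `/`, while the `/$` variant permits only `/`. Crawlers that do not implement `$` may treat that line differently, so it is worth confirming in the search engine's own tester.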

magzalez