219

In robots.txt, can I use a relative URL for the sitemap file, like this?

sitemap: /sitemap.ashx

Or do I have to use the complete (absolute) URL for the sitemap file, like:

sitemap: http://subdomain.domain.com/sitemap.ashx

Why I ask:

  • I own a new blog service, www.domain.com, that allows users to blog on accountname.domain.com.
  • I use wildcards, so all subdomains (accounts) point to: "blog.domain.com".

I put the robots.txt in blog.domain.com so that search engines can find the sitemap. But because of the wildcards, all user accounts share the same robots.txt file. That's why I can't use the second alternative. And for now I can't use URL rewriting for .txt files. (I guess that later versions of IIS can handle this?)

Robert Massaioli
Easyrider

3 Answers

344

According to the official documentation on sitemaps.org it needs to be a full URL:

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line including the full URL to the sitemap:

Sitemap: http://www.example.com/sitemap.xml
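For the wildcard-subdomain setup in the question, one possible workaround is to generate the robots.txt body per request, so that each account gets an absolute Sitemap URL for its own host. The following is only a minimal sketch in Python, not the asker's actual stack: the function name `build_robots_txt`, the `allowed_suffix` check, and the `User-agent`/`Disallow` lines are assumptions made for illustration, and the host would come from the request's Host header.

```python
def build_robots_txt(host: str,
                     scheme: str = "http",
                     sitemap_path: str = "/sitemap.ashx",
                     allowed_suffix: str = ".domain.com") -> str:
    """Return a robots.txt body with an absolute Sitemap URL for `host`.

    `host` is assumed to be the value of the request's Host header
    (without a port). It is checked against `allowed_suffix` so that an
    arbitrary Host header cannot be echoed into the file.
    """
    host = host.strip().lower()
    if not host.endswith(allowed_suffix):
        raise ValueError(f"unexpected host: {host!r}")
    return (
        "User-agent: *\n"
        "Disallow:\n"
        f"Sitemap: {scheme}://{host}{sitemap_path}\n"
    )


if __name__ == "__main__":
    print(build_robots_txt("accountname.domain.com"))
```

Serving this from a dynamic handler instead of a static file means every subdomain still answers at /robots.txt, but the Sitemap line always carries the full URL of the host that was requested.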
unor
  • 38
    Please note @unor's example has: Sitemap with capital S. This is important as Robots.txt is case sensitive. – BodgeIT May 26 '16 at 13:55
  • 23
    And on the topic of case, [robotstxt.org](http://www.robotstxt.org/robotstxt.html) specifies the file to be named `robots.txt` without the capital R. – khargoosh Aug 30 '16 at 04:44
  • If the site loads over https but the Sitemap URL is given with http, is that fine? Or do we have to write the sitemap URL to match the protocol? – Shams Apr 04 '17 at 06:02
  • 4
    @Shams: The URLs listed in your sitemap have to use the same protocol and the same host as the sitemap file. If your site is available under `http` *and* `https`, you [should only provide one sitemap (with the canonical variant)](http://stackoverflow.com/a/34835383/1591669). – unor Apr 04 '17 at 13:39
7

Crawlers expect an absolute URL in the Sitemap line and may not resolve a relative one, which is why it's always recommended to use absolute URLs for better crawlability and indexability.

Therefore, you cannot use this variation:

> sitemap: /sitemap.xml

The recommended syntax is:

Sitemap: https://www.yourdomain.com/sitemap.xml

Note:

  • Don't forget to capitalise the first letter in "Sitemap"
  • Don't forget to put a space after "Sitemap:"
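The absolute-URL requirement can be checked mechanically. Here is a small Python sketch, under stated assumptions: the helper name `sitemap_urls` is hypothetical, only `http` and `https` are treated as valid schemes, and the field name is matched case-insensitively, which is how a lenient parser might behave rather than something this answer asserts.

```python
from urllib.parse import urlsplit


def sitemap_urls(robots_txt: str):
    """Return a list of (url, is_absolute) for each Sitemap line found."""
    results = []
    for raw in robots_txt.splitlines():
        # Split "Field: value" at the first colon only, so the colon in
        # "http://" stays inside the value.
        field, sep, value = raw.partition(":")
        if not sep or field.strip().lower() != "sitemap":
            continue
        url = value.strip()
        parts = urlsplit(url)
        # Absolute here means: an http(s) scheme plus a host name.
        is_absolute = parts.scheme in ("http", "https") and bool(parts.netloc)
        results.append((url, is_absolute))
    return results


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "sitemap: /sitemap.xml\n"
        "Sitemap: https://www.yourdomain.com/sitemap.xml\n"
    )
    for url, ok in sitemap_urls(sample):
        print(url, "absolute" if ok else "relative")
```

Running it on the two variations above flags `/sitemap.xml` as relative and the full `https://` URL as absolute.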
Deepak Mathur
-5

Good technical and logical question. No, you can't use a relative URL for the sitemap in robots.txt; you need to use the complete URL of the sitemap.

It's better to go with "sitemap: https://www.example.com/sitemap_index.xml"

Note the space after the colon in the line above. I also agree with Deepak's answer.

cstpl123