
I have been searching around using Google but I can't find an answer to this question.

A robots.txt file can contain the following line:

Sitemap: http://www.mysite.com/sitemapindex.xml

but is it possible to specify multiple sitemap index files in the robots.txt and have the search engines recognize that and crawl ALL of the sitemaps referenced in each sitemap index file? For example, will this work:

Sitemap: http://www.mysite.com/sitemapindex1.xml

Sitemap: http://www.mysite.com/sitemapindex2.xml

Sitemap: http://www.mysite.com/sitemapindex3.xml
hakre
user306942

5 Answers

108

Yes, it is possible to have more than one sitemap index file:

You can have more than one Sitemap index file.

Emphasis mine.
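To illustrate what "crawl all of the sitemaps referenced in each index" amounts to, here is a minimal Python sketch that merges the child sitemap URLs from several index files. It assumes the index files have already been downloaded (the dict stands in for HTTP fetching), and the mysite.com URLs are the hypothetical ones from the question; real search engines use their own crawlers, so this is not how any particular bot is implemented.

```python
import xml.etree.ElementTree as ET

# Namespace shared by sitemaps.org 0.9 sitemaps and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> list:
    """Return the <loc> value of every <sitemap> entry in one index document."""
    root = ET.fromstring(index_xml)
    locs = (loc.text or "" for loc in root.findall("sm:sitemap/sm:loc", NS))
    return [text.strip() for text in locs if text.strip()]


def expand_indexes(index_docs: dict) -> list:
    """Merge child sitemap URLs from several index files, keeping first-seen order.

    index_docs maps each index URL (one per Sitemap: line in robots.txt) to the
    XML text of that index; downloading the files is out of scope here.
    """
    seen = set()
    merged = []
    for xml_text in index_docs.values():
        for url in child_sitemaps(xml_text):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical index contents; sitemap2.xml appears in both indexes.
INDEX_1 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.mysite.com/sitemap1.xml</loc></sitemap>
  <sitemap><loc>http://www.mysite.com/sitemap2.xml</loc></sitemap>
</sitemapindex>"""

INDEX_2 = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.mysite.com/sitemap2.xml</loc></sitemap>
  <sitemap><loc>http://www.mysite.com/sitemap3.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    docs = {
        "http://www.mysite.com/sitemapindex1.xml": INDEX_1,
        "http://www.mysite.com/sitemapindex2.xml": INDEX_2,
    }
    for url in expand_indexes(docs):
        print(url)
```

The de-duplication is a design choice of the sketch: if two index files list the same sitemap, it is only visited once.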

Yes, it is also possible to list multiple sitemap files in robots.txt; see the sitemaps.org site as well:

You can specify more than one Sitemap file per robots.txt file.

Sitemap: http://www.example.com/sitemap-host1.xml

Sitemap: http://www.example.com/sitemap-host2.xml

Emphasis mine. I'd say this cannot be misread: simply put, it can be done.
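One existing parser that follows this rule is Python's standard-library `urllib.robotparser`: `RobotFileParser.site_maps()` (available since Python 3.8) returns every `Sitemap` URL in the file, in order. This sketch only shows Python's parser, not how any specific search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example quoted from sitemaps.org above.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
"""

parser = RobotFileParser()
# parse() takes the file as an iterable of lines instead of fetching it over HTTP.
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of all Sitemap URLs, or None if there are none.
for url in parser.site_maps() or []:
    print(url)
```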

This is also necessary for cross-submits, for which, by the way, robots.txt was chosen as the mechanism.
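In a cross-submit, the robots.txt of one host lists a sitemap stored on a different host. A small Python sketch that picks out such entries from a list of `Sitemap` URLs; the function name and the host names www.host1.com and www.sitemaphost.com are hypothetical.

```python
from urllib.parse import urlsplit


def cross_host_sitemaps(robots_url: str, sitemap_urls: list) -> list:
    """Return the Sitemap URLs hosted somewhere other than the robots.txt host."""
    robots_host = urlsplit(robots_url).netloc.lower()
    return [u for u in sitemap_urls if urlsplit(u).netloc.lower() != robots_host]


if __name__ == "__main__":
    robots = "http://www.host1.com/robots.txt"
    sitemaps = [
        "http://www.host1.com/sitemap.xml",              # same host
        "http://www.sitemaphost.com/sitemap-host1.xml",  # cross-submitted
    ]
    print(cross_host_sitemaps(robots, sitemaps))
```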

By the way, Google, Yahoo and Bing are all members of sitemaps.org:

Sitemap 0.90 is offered under the terms of the Attribution-ShareAlike Creative Commons License and has wide adoption, including support from Google, Yahoo!, and Microsoft.

So you can rest assured that your sitemap entries will be properly read by the search engine bots.

Submitting them via Webmaster Tools cannot hurt either, as John Mueller commented.

Community
Miltan Chaudhury
  • The Google robots.txt documentation confirms this to be true for Google, and references that it should work for other bots as well: "Multiple sitemap entries may exist. As non-group-member records, these are not tied to any specific user-agents and may be followed by all crawlers, provided it is not disallowed." The Google robots.txt documentation can be found here: https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt – David Marchelya Oct 27 '12 at 12:14
  • The question asks if multiple sitemap *index* entries may exist in `robots.txt` not if multiple sitemap entries may exist. – Nigel Alderton May 22 '13 at 07:37
  • @NigelAlderton: The specs are likewise clear about that: [*"You can have more than one Sitemap index file."*](http://www.sitemaps.org/protocol.html#index). If you compare then with the *Sitemaps & Cross Submits* section, it is not only clear but inherently necessary to allow multiple index files per `robots.txt` for cross-domain index usage. – hakre Aug 27 '13 at 09:50
8

If your sitemap is over 10 MB (uncompressed) or has more than 50,000 entries, Google requires that you use multiple sitemaps bundled with a sitemap index file.
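A minimal Python sketch of that split: chunk the URL list into groups of at most 50,000, render each group as a `<urlset>` sitemap, and check the uncompressed size. The size limit is a parameter because it has changed (10 MB when this answer was written, 50 MB in the current sitemaps.org protocol, per the comments below); the function names are illustrative only.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # current sitemaps.org limit; 10 * 1024 * 1024 for the old one


def urlset_xml(urls: list) -> str:
    """Render one sitemap file (a <urlset>) for the given URLs."""
    # escape() handles &, < and > so URLs with query strings stay well-formed XML.
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )


def split_into_sitemaps(urls: list, max_urls: int = MAX_URLS,
                        max_bytes: int = MAX_BYTES) -> list:
    """Split urls into sitemap documents that each respect both limits."""
    files = []
    for start in range(0, len(urls), max_urls):
        xml = urlset_xml(urls[start:start + max_urls])
        if len(xml.encode("utf-8")) > max_bytes:
            raise ValueError("chunk exceeds the size limit; lower max_urls")
        files.append(xml)
    return files


if __name__ == "__main__":
    urls = [f"http://www.example.com/page/{n}" for n in range(120_000)]
    files = split_into_sitemaps(urls)
    print(len(files))  # 3 files: 50,000 + 50,000 + 20,000 URLs
```

Each resulting file would then be listed in a sitemap index such as the one below.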

In your robots.txt, point to a sitemap index, which should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2012-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2012-01-01</lastmod>
   </sitemap>
</sitemapindex>
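If you would rather generate such an index than write it by hand, Python's `xml.etree.ElementTree` can produce it. A minimal sketch; the function name and the (loc, lastmod) input format are my own choices, and `lastmod` is treated as optional, as the protocol allows.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(entries: List[Tuple[str, Optional[str]]]) -> bytes:
    """Build a sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    # Serialize the sitemaps.org namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:
            # W3C Datetime, e.g. "2012-10-01T18:23:17+00:00" or just "2012-01-01".
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # The xml_declaration argument requires Python 3.8+.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap_index([
        ("http://www.example.com/sitemap1.xml.gz", "2012-10-01T18:23:17+00:00"),
        ("http://www.example.com/sitemap2.xml.gz", "2012-01-01"),
    ])
    print(xml_bytes.decode("utf-8"))
```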
hakre
scott
  • Um, not exactly. From http://www.sitemaps.org/protocol.php: "Each text file can contain a maximum of 50,000 URLs and must be no larger than 10MB (10,485,760 bytes)." – ayke Aug 21 '13 at 02:52
  • Google has since upped the allowed size per sitemap file to 50MB http://stackoverflow.com/questions/2887358/limitation-for-google-sitemap-xml-file-size – Ultroman the Tacoman May 21 '15 at 17:03
  • Would it be better for the `Sitemap:` line in robots.txt to point to sitemapindex.xml, or to have multiple `Sitemap:` lines pointing to each sitemap? – Warren Dec 14 '15 at 01:12
  • @WarrenDodsworth I think this does not matter, but if you have a "sitemapsitemap" file, it's easier to submit only one file to Google, Bing, etc. instead of each sitemap file by itself, if you choose to do so. – Philiiiiiipp Dec 09 '16 at 16:32
  • Sitemaps has standardised the 50MB limit: "once uncompressed must be no larger than 50MB" https://www.sitemaps.org/protocol.html – Luke Nov 08 '17 at 01:40
4

It's recommended to create a sitemap index file rather than putting separate sitemap XML URLs in your robots.txt file.

Then put the sitemap index URL in your robots.txt file as below:

Sitemap: http://www.yoursite.com/sitemap_index.xml

If you want to learn how to create a sitemap index, follow the guide on sitemaps.org.

Best Practice:

  • Create separate image and video sitemaps if your website has a large amount of such content.
  • Check the spelling of the robots file: it must be robots.txt, not robot.txt or any other misspelling. Put the robots.txt file in the root directory only.
  • For more info, visit the official robots.txt website.
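Since crawlers only look for the file at the root of the host, its location can be derived from any page URL. A small Python sketch using only `urllib.parse`; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt location for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.yoursite.com/blog/post.html?id=1"))
```

A file placed at, say, http://www.yoursite.com/blog/robots.txt would therefore never be consulted.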
Deepak Mathur
0

You need to put this code in your sitemap.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.exemple.com/sitemap1.xml.gz</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.exemple.com/sitemap2.xml.gz</loc>
    </sitemap>
</sitemapindex>

source: https://support.google.com/webmasters/answer/75712?hl=fr#
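Before submitting an index like this, a quick structural check can catch obvious mistakes. A minimal Python sketch; it only checks the root element, that every `<sitemap>` has a `<loc>`, and the sitemaps.org limit of 50,000 sitemaps per index file. It is not full schema validation, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50_000


def check_sitemap_index(xml_bytes: bytes) -> list:
    """Return a list of problems found in a sitemap index (empty if none)."""
    problems = []
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{SITEMAP_NS}}}sitemapindex":
        problems.append(f"root element is {root.tag}, expected sitemapindex")
        return problems
    entries = root.findall(f"{{{SITEMAP_NS}}}sitemap")
    if len(entries) > MAX_SITEMAPS_PER_INDEX:
        problems.append(f"{len(entries)} sitemaps listed; limit is {MAX_SITEMAPS_PER_INDEX}")
    for n, entry in enumerate(entries, start=1):
        loc = entry.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"sitemap #{n} has no <loc>")
    return problems


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap><loc>http://www.exemple.com/sitemap1.xml.gz</loc></sitemap>
    <sitemap><loc>http://www.exemple.com/sitemap2.xml.gz</loc></sitemap>
</sitemapindex>"""
    print(check_sitemap_index(sample) or "no problems found")
```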

Max Base
Lamri Djamal
-4

It is possible to write them, but it is up to the search engine to know what to do with it. I suspect many search engines will either "keep digesting" more and more tokens, or alternatively, take the last sitemap they find as the real one.

I propose that the question be "if I want ____ search engine to index my site, would I be able to define multiple sitemaps?"

Etamar Laron
  • Yeah, this seems reasonable. I think I read in the Google FAQ that they do support this. – user306942 Apr 07 '10 at 17:37
  • Google does support that, but if you want to be certain, just manually submit the Sitemap files in Webmaster Tools. – John Mueller Apr 12 '10 at 07:16
  • -1 It is in the protocol specs. This answer is a lame excuse for not reading them and for assuming that nobody else, especially implementors, would read them either. The chance of not supporting sitemaps in robots.txt at all is much higher than the chance of not supporting them according to the specs. – hakre Aug 27 '13 at 09:30
  • @Etamar Laron: Can you please review your answer? To me it reads a bit as if you are saying that most search engines would not support the sitemap standard. Can you please clarify a bit and perhaps differentiate? – hakre Aug 27 '13 at 09:43
  • @hakre - if you read my answer carefully you'd see that it is very precise, the -1 is your call. Why not next time write your second note, and only then decide?... – Etamar Laron Nov 01 '13 at 14:10
  • @EtamarLaron: Do you want to say that the answer isn't correct but does not deserve a DV either, just a comment? I'm not so sure that would be right. Also, you didn't respond to the second comment; I would have been glad if you had, so I could review the DV. Nothing is set in stone. – hakre Nov 02 '13 at 10:42
  • @hakre Interestingly, Baidu, the biggest search engine in China, does not support gzipped sitemaps. You can't put too much faith in the others. – ddou Aug 11 '14 at 07:56