
I have a website with a directory that contains 100+ HTML files, and I want crawlers to crawl all the HTML files in that directory. I have already added the following line to my robots.txt:

Allow: /DirName/*.html$

Is there any way to include all the files in that directory in my sitemap.xml file, so that every HTML file in it gets crawled? Something like this:

<url>
    <loc>MyWebsiteName/DirName/*.html</loc>
</url>
Anders
userlite

2 Answers


The sitemap protocol does not define any wildcard syntax: every <loc> entry must be one complete URL, including the protocol (for example http://). To be honest, this is the first time I have heard of the idea, and I'm fairly sure search engines can't make use of wildcards in sitemaps.
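For illustration, a valid sitemap for the directory in the question has to list every file explicitly. The file names page1.html and page2.html below are hypothetical, and the http://MyWebsiteName prefix is an assumption standing in for your real domain:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per file; hypothetical file names -->
  <url>
    <loc>http://MyWebsiteName/DirName/page1.html</loc>
  </url>
  <url>
    <loc>http://MyWebsiteName/DirName/page2.html</loc>
  </url>
</urlset>
```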

Take a look at Google's list of recommended sitemap generators; there are plenty of tools that can create a sitemap in the blink of an eye.

methode

Wildcards are not allowed in sitemaps. If you run PHP on your server, you can list all the files in the directory with DirectoryIterator and generate sitemap.xml automatically:

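// This assumes you already have a Sitemap class (and a Sitemap_URL class) available.
$sitemap = new Sitemap;

// Iterate over the directory on the server's filesystem.
foreach (new DirectoryIterator('/MyWebsiteName/DirName') as $directoryItem)
{
    // Skip ".", "..", subdirectories and anything that is not an .html file.
    if (!$directoryItem->isFile() || $directoryItem->getExtension() !== 'html') continue;

    // New basic sitemap entry.
    $url = new Sitemap_URL;

    // Set the entry's properties. <loc> must be an absolute URL,
    // so replace http://MyWebsiteName with your own domain.
    $url->set_loc(sprintf('http://MyWebsiteName/DirName/%s', $directoryItem->getFilename()))
        ->set_last_mod($directoryItem->getMTime()) // file's own modification time
        ->set_change_frequency('daily')
        ->set_priority(1);

    // Add it to the sitemap.
    $sitemap->add($url);
}

// Render the output.
$response = $sitemap->render();

// Cache the output for 24 hours (assumes a $cache object is already set up).
$cache->set('sitemap', $response, 86400);

// Output the sitemap.
echo $response;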
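If you don't have a Sitemap class, here is a minimal self-contained sketch using PHP's built-in XMLWriter and glob() instead. The function name buildSitemap, the directory path and the base URL are hypothetical placeholders, and it assumes the XMLWriter extension (enabled by default in most PHP builds) is available:

```php
<?php
// Minimal sketch: build a sitemap for every .html file in one directory
// without a third-party Sitemap class. $dir and $baseUrl are placeholders.
function buildSitemap(string $dir, string $baseUrl): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    // glob() expands the pattern on the server, so the sitemap itself
    // contains one explicit <url> entry per file (no wildcards).
    // glob() returns false on error, hence the fallback to an empty array.
    $files = glob(rtrim($dir, '/') . '/*.html') ?: [];
    foreach ($files as $file) {
        if (!is_file($file)) {
            continue;
        }
        $xml->startElement('url');
        // Absolute URL; rawurlencode() protects file names with spaces etc.
        $xml->writeElement('loc', rtrim($baseUrl, '/') . '/' . rawurlencode(basename($file)));
        // W3C Datetime taken from the file's modification time.
        $xml->writeElement('lastmod', date('c', filemtime($file)));
        $xml->endElement(); // </url>
    }

    $xml->endElement(); // </urlset>
    $xml->endDocument();
    return $xml->outputMemory();
}

// Hypothetical paths: adjust to your server layout and domain.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: application/xml; charset=utf-8');
}
echo buildSitemap(__DIR__ . '/DirName', 'http://MyWebsiteName/DirName');
```

Because the list is built when the script runs, newly added HTML files show up in the sitemap automatically; you can either serve this script as your sitemap URL or run it from cron and write the result to sitemap.xml.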
eQ19