
I am not able to create a sitemap with the following code:

from usp.tree import sitemap_tree_for_homepage

tree = sitemap_tree_for_homepage('')
print(tree)

for page in tree.all_pages():
    print(page)
    
May

1 Answer

The sitemap layout looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>
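
To make the layout concrete, here is a minimal sketch that reads the example above back with the standard library. The names `SAMPLE`, `SITEMAP_NS` and `read_sitemap` are made up for illustration; the point is that every element lives in the sitemap namespace, so lookups need a namespace mapping.

```python
import xml.etree.ElementTree as ET

# Prefix "sm" is arbitrary; only the namespace URI matters.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry, keyed by tag name without the namespace."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        # child.tag looks like "{namespace}loc"; keep only the local name
        entry = {child.tag.split("}", 1)[-1]: (child.text or "").strip()
                 for child in url_el}
        entries.append(entry)
    return entries


entries = read_sitemap(SAMPLE)
print(entries)
```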

In this thread you can read how to create an XML file:

from usp.tree import sitemap_tree_for_homepage
# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the replacement
import xml.etree.ElementTree as ET

tree = sitemap_tree_for_homepage('https://www.nytimes.com/')

root = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")

for page in tree.all_pages():
    urlel = ET.SubElement(root, "url")
    ET.SubElement(urlel, "loc").text = page.url
    # lastmod and changefreq can be missing for a page, so only write them when set
    if page.last_modified is not None:
        # format YYYY-MM-DDThh:mmTZD see: https://www.w3.org/TR/NOTE-datetime
        lm = page.last_modified.strftime("%Y-%m-%dT%H:%M%z")
        ET.SubElement(urlel, "lastmod").text = lm
    if page.change_frequency is not None:
        ET.SubElement(urlel, "changefreq").text = page.change_frequency.value
    # priority is a Decimal, so str() is enough to serialize it
    ET.SubElement(urlel, "priority").text = str(page.priority)

ET.indent(root, "  ")  # pretty print (requires Python 3.9+)
xmltree = ET.ElementTree(root)
xmltree.write("sitemap.xml", encoding="utf-8", xml_declaration=True)

If you want lastmod to be today's date, import date from datetime:

from datetime import date

and replace

page.last_modified.strftime("%Y-%m-%dT%H:%M%z")

with

date.today().isoformat()  # a date has no time or timezone, so write only YYYY-MM-DD
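
A side note on the format: the W3C note linked in the code comment writes the timezone as `Z` or `+hh:mm`, while `%z` produces `+hhmm` without the colon. If that matters for your consumer, a small helper like the sketch below covers both dates and datetimes. `w3c_lastmod` is a hypothetical name, and treating naive datetimes as UTC is an assumption you may want to change.

```python
from datetime import date, datetime, timezone


def w3c_lastmod(value):
    """Format a date or datetime as a W3C datetime string for <lastmod>."""
    # datetime is a subclass of date, so it has to be checked first
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: a naive datetime is meant as UTC
            value = value.replace(tzinfo=timezone.utc)
        # isoformat() writes the offset with a colon, e.g. +00:00
        return value.isoformat(timespec="minutes")
    return value.isoformat()  # plain date: YYYY-MM-DD


print(w3c_lastmod(date.today()))
print(w3c_lastmod(datetime.now(timezone.utc)))
```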

The resulting sitemap.xml:

<?xml version='1.0' encoding='utf-8'?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2022-07-19T15:24+0000</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://www.example.com/about</loc>
    <lastmod>2022-07-19T15:24+0000</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

If you use https://www.example.com/ as your URL, you will not get the output above, because example.com does not have a sitemap.xml. Use a different URL.
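
One way to notice this case early is to check that at least one page was found before writing the file. A minimal sketch, assuming `pages` is any iterable of objects with a `url` attribute (which is what `tree.all_pages()` yields); `build_urlset` and `FakePage` are hypothetical names, and `FakePage` only exists so the example runs without network access.

```python
from collections import namedtuple
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page objects returned by the sitemap parser
FakePage = namedtuple("FakePage", ["url"])


def build_urlset(pages):
    """Build a <urlset> element from pages, or raise if there are none."""
    root = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url_el = ET.SubElement(root, "url")
        ET.SubElement(url_el, "loc").text = page.url
    if len(root) == 0:
        # Nothing was added, so the site probably does not publish a sitemap
        raise ValueError("no pages found; the site may not publish a sitemap")
    return root


root = build_urlset([FakePage("https://www.example.com/")])
print(ET.tostring(root, encoding="unicode"))
```

With the real parser you would call `build_urlset(tree.all_pages())` and handle the `ValueError` instead of writing an empty file.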

noah1400