
My file contains the following data:

Original:

<?xml version="1.0" encoding="UTF-8"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>

Expected:

<?xml version="1.0" encoding="UTF-8"?><urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>

I use ElementTree (xml.etree.ElementTree) to parse the file, and I want to remove the xmlns attribute from the root element, urlset:

import xml.etree.ElementTree as ET

tree = ET.parse("/Users/hsyang/Downloads/VI-0-11-14-2016_20.xml")
root = tree.getroot()

print(root.attrib)
# {}

root.attrib.pop("xmlns", None)  # no KeyError if "xmlns" is missing

print(root.attrib)
# {}
print(ET.tostring(root, encoding="unicode"))

I expected to get {'xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9'} the first time I printed root.attrib, but I got an empty dictionary. Can someone help?

Appreciate it!

Yumi

2 Answers


xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" looks like a regular attribute but it is a special case, namely a namespace declaration.

Removing, adding, or modifying namespaces can be quite hard. "Normal" attributes are stored in an element's writable attrib property. Namespace mappings, on the other hand, are not readily available via the API (in the lxml library, elements do have an nsmap property, but it is read-only).
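To see where the declaration goes, here is a minimal sketch with the standard library's xml.etree.ElementTree, using the sample document from the question (parsed from a bytes literal rather than the asker's file). ElementTree does not keep xmlns in attrib at all: it folds the namespace URI into each element's tag in {uri}localname form, which is why root.attrib is empty, and on output it generates a prefix (ns0) for that URI.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, as bytes so the XML declaration is handled by the parser
XML_BYTES = (b'<?xml version="1.0" encoding="UTF-8"?>'
             b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> '
             b'<changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>')

root = ET.fromstring(XML_BYTES)

# The namespace declaration is not stored as an ordinary attribute...
print(root.attrib)  # {}
# ...instead its URI has been merged into every tag name
print(root.tag)     # {http://www.sitemaps.org/schemas/sitemap/0.9}urlset
print(root[0].tag)  # {http://www.sitemaps.org/schemas/sitemap/0.9}url

# When serializing, ElementTree declares the namespace again under a generated prefix
print(ET.tostring(root, encoding="unicode"))
# <ns0:urlset xmlns:ns0="http://www.sitemaps.org/schemas/sitemap/0.9"> <ns0:url> ...
```

So popping "xmlns" from attrib has nothing to remove; the namespace has to be taken out of the tags (or out of the raw text) instead.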

I suggest a simple textual search-and-replace operation, similar to the answer to Modify namespaces in a given xml document with lxml. Something like this:

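# Read the document as plain text, drop the default-namespace declaration, write the result
with open("input.xml", "r", encoding="utf-8") as infile, open("output.xml", "w", encoding="utf-8") as outfile:
    data = infile.read()
    # The leading space is included so '<urlset xmlns="...">' becomes '<urlset>'
    data = data.replace(' xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"', '')
    outfile.write(data)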

See also How to insert namespace and prefixes into an XML string with Python?.
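If you would rather stay inside ElementTree than edit the raw text, a different technique is to strip the {uri} part from every tag before serializing. This is a minimal sketch, assuming every element belongs to the sitemap namespace and that no attributes are namespaced (true for the sample document); strip_namespace is a hypothetical helper name, not part of the library.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def strip_namespace(root, uri):
    """Hypothetical helper: rename every '{uri}name' tag in the tree to plain 'name'."""
    prefix = "{%s}" % uri
    for elem in root.iter():
        # Comments/processing instructions have a non-string tag, so skip them
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            elem.tag = elem.tag[len(prefix):]
    return root

xml_bytes = (b'<?xml version="1.0" encoding="UTF-8"?>'
             b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <url> '
             b'<changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>')

root = strip_namespace(ET.fromstring(xml_bytes), SITEMAP_NS)
result = ET.tostring(root, encoding="unicode")
print(result)
# <urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>
```

Because no tag carries a namespace any more, the serializer has nothing to declare, so no xmlns appears in the output. To write a file with a declaration you could use ET.ElementTree(root).write("output.xml", encoding="UTF-8", xml_declaration=True); note that ElementTree writes the declaration with single quotes, so it will not be byte-for-byte identical to the original header.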

mzjn

In the standard library's xml.etree.ElementTree there is no dedicated method for removing an attribute, but all attributes are stored in the element's attrib dict, so any attribute can be removed just like a key from a dict:

    import xml.etree.ElementTree as ET

    tree = ET.parse(file_path)  # file_path: path to your XML file
    root = tree.getroot()

    print(root.attrib)  # {'xyz': '123'}

    root.attrib.pop("xyz", None)  # the default None avoids a KeyError if xyz does not exist

    print(root.attrib)  # {}

    print(ET.tostring(root, encoding="unicode"))
    # <urlset> <url> <changefreq>daily</changefreq> <loc>http://www.example.com</loc></url></urlset>
Gennady Kandaurov
  • The real key-value of the root attribute in my file is actually but for simplicity I swapped it with xyz="123". I was wondering if xmlns means anything special in XML, so that it's not treated as a regular attribute key here... just my non-developer suspicion. – Yumi Dec 02 '16 at 07:24
  • Yes, `xmlns` has special meaning in XML: http://www.w3schools.com/xml/xml_namespaces.asp So you have to remove that attribute carefully. – Gennady Kandaurov Dec 02 '16 at 08:55
  • Got it. Thanks! Just wanted to clarify though: I was using xml.etree.ElementTree. I tried both methods you suggested and both (.pop, strip_attributes) raised errors; I think in your examples you were referring to lxml.etree. I tried to find a method that does the same in xml.etree but could not find any. – Yumi Dec 02 '16 at 16:46
  • You are right, examples were for `lxml`. Updated for `xml.etree.ElementTree`. – Gennady Kandaurov Dec 02 '16 at 21:34
  • But this method can't be applied to `xmlns`, since `xmlns` is not stored in `attrib` attribute. – Gennady Kandaurov Dec 02 '16 at 21:39
  • Appreciate the edits. I tried what you suggested and updated my post, but it seems that root.attrib was an empty dictionary to begin with. – Yumi Dec 03 '16 at 00:04
  • As I wrote in my last comment, `xmlns` is not stored in `attrib`. – Gennady Kandaurov Dec 03 '16 at 06:49