
Suppose I have the following XML, which I want to modify using Python's ElementTree:

<root xmlns:prefix="URI">
  <child prefix:name="***"/>
  ...
</root> 

I modify the XML file like this:

import xml.etree.ElementTree as ET
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')

After writing, the XML file looks like this:

<root xmlns:ns0="URI">
  <child ns0:name="***"/>
  ...
</root>
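The behaviour can be reproduced without touching a file. This is a minimal sketch, assuming an in-memory document (io.BytesIO in place of filename.xml) and a placeholder URI http://example.com/uri in place of URI:

```python
import io
import xml.etree.ElementTree as ET

# Sample modelled on the question; the URI is a placeholder assumption.
SOURCE = b'''<root xmlns:prefix="http://example.com/uri">
  <child prefix:name="***"/>
</root>'''

# Parse from memory instead of filename.xml.
tree = ET.parse(io.BytesIO(SOURCE))

# Serialize without registering the namespace first.
out = io.BytesIO()
tree.write(out)
result = out.getvalue().decode()
print(result)  # the original "prefix" is replaced by a generated "ns0"
```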

As you can see, the namespace prefix has changed to ns0. I'm aware of ET.register_namespace(), as mentioned here.

The problems with ET.register_namespace() are that:

  1. You need to know the prefix and URI in advance.
  2. It cannot be used with the default namespace.
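For reference, a minimal sketch of the usual ET.register_namespace() workaround, assuming a placeholder URI http://example.com/known. It preserves the prefix only because both the prefix and the URI are hard-coded, which is exactly limitation 1. Note that the registry is global to the interpreter.

```python
import io
import xml.etree.ElementTree as ET

SOURCE = b'<root xmlns:prefix="http://example.com/known"><child prefix:name="***"/></root>'

# The prefix and URI must be known (hard-coded) before writing.
ET.register_namespace("prefix", "http://example.com/known")

tree = ET.parse(io.BytesIO(SOURCE))
out = io.BytesIO()
tree.write(out)
result = out.getvalue().decode()
print(result)  # "prefix" survives because it was registered
```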

For example, if the XML looks like this:

<root xmlns="http://uri">
    <child name="name">
    ...
    </child>
</root>

It will be transformed into something like this:

<ns0:root xmlns:ns0="http://uri">
    <ns0:child name="name">
    ...
    </ns0:child>
</ns0:root>

As you can see, the default namespace is changed to ns0.
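The default-namespace case can be reproduced the same way. This is a sketch, assuming an in-memory document and the placeholder URI http://example.com/default in place of http://uri:

```python
import io
import xml.etree.ElementTree as ET

SOURCE = b'''<root xmlns="http://example.com/default">
    <child name="name"/>
</root>'''

tree = ET.parse(io.BytesIO(SOURCE))
out = io.BytesIO()
tree.write(out)
result = out.getvalue().decode()
print(result)  # elements now carry a generated ns0 prefix instead of xmlns="..."
```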

Is there any way to solve this problem with ElementTree?

amrezzd
  • Possible duplicate of [xml.etree.ElementTree - Trouble setting xmlns = '...'](https://stackoverflow.com/questions/25225934/xml-etree-elementtree-trouble-setting-xmlns) – stovfl Jan 30 '19 at 14:14
  • The duplicate link clearly uses `ET.register_namespace(...`. Edit your question into a minimal, complete, and verifiable example to show how you use it. – stovfl Jan 30 '19 at 19:23
  • @stovfl It's not about preserving the namespace and didn't help me. The namespace should not be hard-coded; it can be `xmlns:prefix="URI"` with any prefix and URI. – amrezzd Jan 31 '19 at 18:29
  • The only way to preserve the namespace prefix with ElementTree is by using `register_namespace()`. If you don't like that, try lxml instead. – mzjn Jan 31 '19 at 18:42
  • @mzjn You need to know the `prefix` and `URI` when using `register_namespace()`. As I said, `I don't want to hard code the namespace`. Is there any way to do this with `ElementTree`? – amrezzd Jan 31 '19 at 19:23
  • @stovfl Edited the question to clarify the problem. – amrezzd Jan 31 '19 at 19:54
  • @AmirRezazadeh: Read [`lxml` namespaces](https://lxml.de/tutorial.html#namespaces): `lxml.etree` allows you to look up the current namespaces defined for a node through the `.nsmap` property. – stovfl Jan 31 '19 at 20:34
  • See https://stackoverflow.com/a/42372404/407651 for a way to get the namespaces in the document. – mzjn Feb 01 '19 at 06:13

1 Answer


ElementTree replaces the prefix of every namespace that has not been registered with ET.register_namespace(). To preserve a namespace prefix, you need to register it before writing your modifications to the file. The following function does this by registering every namespace declared in the file, globally:

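import xml.etree.ElementTree as ET

def register_all_namespaces(filename):
    # Collect the (prefix, uri) pair of every namespace declaration in the file.
    namespaces = dict(node for _, node in ET.iterparse(filename, events=['start-ns']))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)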
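To see what the function collects, here is a small sketch that runs the same iterparse call against an in-memory document (io.BytesIO stands in for filename.xml as an assumption). Each start-ns event carries a (prefix, uri) tuple, and a default namespace is reported with the empty prefix "":

```python
import io
import xml.etree.ElementTree as ET

SOURCE = b'''<root xmlns="http://example.com/default" xmlns:prefix="http://example.com/uri">
  <child prefix:name="***"/>
</root>'''

# Same call as in register_all_namespaces, but on a file-like object.
pairs = [node for _, node in ET.iterparse(io.BytesIO(SOURCE), events=["start-ns"])]
print(pairs)  # one (prefix, uri) tuple per declaration
```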

This function must be called before tree.write(), because registration only affects serialization; calling it before ET.parse, as below, keeps the namespace prefixes unchanged:

import xml.etree.ElementTree as ET
register_all_namespaces('filename.xml')
tree = ET.parse('filename.xml')
# XML modification here
# save the modifications
tree.write('filename.xml')
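An end-to-end sketch of the answer's approach, assuming a temporary file in place of filename.xml, placeholder URIs, and a hypothetical modification (setting an id attribute on the first child). Because iterparse reports the default namespace with prefix "", the same function also keeps xmlns="..." intact:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def register_all_namespaces(filename):
    # Same approach as the answer: register every declared (prefix, uri) pair.
    namespaces = dict(node for _, node in ET.iterparse(filename, events=["start-ns"]))
    for prefix, uri in namespaces.items():
        ET.register_namespace(prefix, uri)


SOURCE = '''<root xmlns="http://example.com/rt-default" xmlns:prefix="http://example.com/rt-uri">
  <child prefix:name="***"/>
</root>'''

# Write the sample to a temporary file standing in for filename.xml.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as f:
    f.write(SOURCE)
    path = f.name

try:
    register_all_namespaces(path)
    tree = ET.parse(path)
    tree.getroot()[0].set("id", "1")  # hypothetical modification
    tree.write(path)
    with open(path, encoding="utf-8") as f:
        result = f.read()
finally:
    os.remove(path)

print(result)
```

One caveat: register_namespace() raises ValueError for prefixes of the form ns0, ns1, ..., which ElementTree reserves for the prefixes it generates, so a document that declares such a prefix would need special handling.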
amrezzd
  • This solution is much better than I have seen on many other questions for the same topic. Thanks for sharing it. – Tyler Russell Oct 24 '20 at 16:21
  • Does this mean the XML needs to be parsed twice? Or can I somehow get the ElementTree out of this process as I do it? – starwarswii May 20 '21 at 18:55
  • @Starwarswii Yes. If you want more control over that, I think you can use `XMLPullParser` with the `start-ns` event, fetching the namespaces and then calling `ET.register_namespace`. – amrezzd May 21 '21 at 10:12
  • thank you for this answer. I was pulling my hair out with my namespaces getting replaced after a simple tweak to the XML. – Doug Baer Nov 02 '21 at 22:48
  • It does not matter if `register_namespace` comes before or after `ET.parse`. `register_namespace` only affects serialization, not parsing. – mzjn Feb 21 '23 at 16:07
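Following the XMLPullParser suggestion and the point that registration only matters at write time, here is a sketch of a single-pass variant. The helper name parse_preserving_namespaces is hypothetical, the URIs are placeholders, and the input is bytes (for a real file, read it with open('filename.xml', 'rb')). Namespaces are registered after parsing and still survive the write:

```python
import io
import xml.etree.ElementTree as ET


def parse_preserving_namespaces(data):
    # Hypothetical helper: one parsing pass that builds the tree and
    # registers every namespace declared in the document.
    parser = ET.XMLPullParser(events=("start-ns", "start"))
    parser.feed(data)
    parser.close()  # unread events stay available through read_events()
    root = None
    for event, node in parser.read_events():
        if event == "start-ns":
            prefix, uri = node  # the default namespace arrives with prefix ""
            ET.register_namespace(prefix, uri)
        elif event == "start" and root is None:
            root = node  # the first element started is the document root
    return ET.ElementTree(root)


data = (b'<root xmlns="http://example.com/pull-default" '
        b'xmlns:prefix="http://example.com/pull-uri">'
        b'<child prefix:name="***"/></root>')
tree = parse_preserving_namespaces(data)
out = io.BytesIO()
tree.write(out)
result = out.getvalue().decode()
print(result)
```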