With lxml, I am not sure how to properly remove the namespace of an existing element and set a new one.

For instance, I'm parsing this minimal xml file:

<myroot xmlns="http://myxml.com/somevalue">
    <child1>blabla</child1>
    <child2>blablabla</child2>
</myroot>

... and I'd like it to become:

<myroot xmlns="http://myxml.com/newvalue">
    <child1>blabla</child1>
    <child2>blablabla</child2>
</myroot>

With lxml:

from lxml import etree as ET
tree = ET.parse('myfile.xml')
root = tree.getroot()

If I inspect root:

In [7]: root
Out[7]: <Element {http://myxml.com/somevalue}myroot at 0x7f6e13832588>
In [8]: root.nsmap
Out[8]: {None: 'http://myxml.com/somevalue'}
In [11]: root.tag
Out[11]: '{http://myxml.com/somevalue}myroot'

Ideally, I would like to end up with:

In [8]: root.nsmap
Out[8]: {None: 'http://myxml.com/newvalue'}
In [11]: root.tag
Out[11]: '{http://myxml.com/newvalue}myroot'

As for the tag, it's just a matter of setting the right string. How about nsmap?

Ricky Robinson
  • See this answer of mine: https://stackoverflow.com/a/20956523/407651. It has a score of -2, but it provides what I think is the easiest way to change the namespace. – mzjn Aug 02 '18 at 14:01
  • It's a workaround for a simple case, but it doesn't provide an answer to the question, I'm afraid – Ricky Robinson Aug 02 '18 at 14:11
  • Yes, it is a workaround. I am not aware of anything better unfortunately. Manipulating namespaces can be surprisingly hard. Updating `nsmap` has no effect. See https://bugs.launchpad.net/lxml/+bug/555602 (this issue is mentioned in a comment on the linked answer). See also https://stackoverflow.com/a/31870245/407651. – mzjn Aug 02 '18 at 14:24
  • I see. It seems inconceivable that something so simple is not available in standard libraries in Python... In `xml.etree.ElementTree` I can remove all namespaces just by removing `{*}` from tag values and then reset them with `.set('xmlns', 'someURI')` on the desired elements. With `lxml`, that results in elements with two `xmlns` tags: the original one and the new one. I'm rather disappointed... – Ricky Robinson Aug 02 '18 at 14:35
  • @mzjn ... the downvotes are possibly due to treating the XML as a text file and not using proper DOM library methods. – Parfait Aug 02 '18 at 14:59
  • @RickyRobinson ... you can always run XSLT to *change XML files* which `lxml` can run. Updating namespaces is a regular need. Please post XML for a [MCVE]. – Parfait Aug 02 '18 at 15:00
  • @Parfait: What DOM library methods? There are no methods that work. That is the whole point of this discussion. And yes, XSLT is what I suggested in the answer to a similar question that I linked to in a previous comment: https://stackoverflow.com/a/31870245/407651. – mzjn Aug 02 '18 at 15:05
  • @Parfait, sure, I updated my question. Any xml will do, though. My example is simple enough that I would choose @mzjn's workaround, but the point is to use `lxml`... – Ricky Robinson Aug 02 '18 at 15:12
  • Ricky, please give @mzjn's XSLT link in above comment a try and come back with any issues. – Parfait Aug 02 '18 at 15:50
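
For readers who want to stay inside the lxml API, the comments above can be illustrated with a minimal sketch. It first shows that editing the dict returned by `.nsmap` leaves the element unchanged, then works around it by retagging the elements and rebuilding the root with the desired `nsmap`. The helper name `change_namespace` is hypothetical and not part of lxml. The sketch assumes a single namespace has to be swapped, and the prefixes in the serialized output may vary with the lxml version.

```python
from lxml import etree

OLD_NS = "http://myxml.com/somevalue"
NEW_NS = "http://myxml.com/newvalue"

XML = b"""<myroot xmlns="http://myxml.com/somevalue">
    <child1>blabla</child1>
    <child2>blablabla</child2>
</myroot>"""


def change_namespace(root, old_ns, new_ns):
    """Hypothetical helper: return a new root element declared in new_ns."""
    # Retag every element that lives in the old namespace.
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # comments and processing instructions have no string tag
        qname = etree.QName(el)
        if qname.namespace == old_ns:
            el.tag = f"{{{new_ns}}}{qname.localname}"

    # nsmap can only be chosen when an element is created, so build a fresh root.
    new_root = etree.Element(root.tag, nsmap={None: new_ns})
    new_root.text = root.text
    for name, value in root.attrib.items():
        new_root.set(name, value)
    # Appending moves the children out of the old root.
    new_root.extend(list(root))
    # Drop namespace declarations that are no longer used.
    etree.cleanup_namespaces(new_root)
    return new_root


root = etree.fromstring(XML)

# Mutating the dict returned by .nsmap does not touch the element itself.
snapshot = root.nsmap
snapshot[None] = NEW_NS
print(root.nsmap)

new_root = change_namespace(root, OLD_NS, NEW_NS)
print(new_root.tag)
print(new_root.nsmap)
```

Note that the old root is left without children afterwards, so this is only suitable when the original tree is no longer needed.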

1 Answer

I agree with mzjn and Parfait; I'd use XSLT to change the namespace.

You can make the XSLT fairly reusable by having the old and new namespaces passed in as parameters.

Example...

XML Input (input.xml)

<myroot xmlns="http://myxml.com/somevalue">
    <child1>blabla</child1>
    <child2>blablabla</child2>
</myroot>

XSLT 1.0 (test.xsl)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:param name="orig_namespace"/>
  <xsl:param name="new_namespace"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="1">
    <xsl:choose>
      <xsl:when test="namespace-uri()=$orig_namespace">
        <xsl:element name="{name()}" namespace="{$new_namespace}">
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Python

from lxml import etree

tree = etree.parse("input.xml")
xslt = etree.parse("test.xsl")

orig_namespace = "http://myxml.com/somevalue"
new_namespace = "http://myxml.com/newvalue"

new_tree = tree.xslt(xslt, orig_namespace=f"'{orig_namespace}'",
                     new_namespace=f"'{new_namespace}'")
print(etree.tostring(new_tree, pretty_print=True).decode("utf-8"))

Output

<myroot xmlns="http://myxml.com/newvalue">
  <child1>blabla</child1>
  <child2>blablabla</child2>
</myroot>

Also, if you use the following input (that uses a namespace prefix)...

<ns1:myroot xmlns:ns1="http://myxml.com/somevalue">
    <ns1:child1>blabla</ns1:child1>
    <ns1:child2>blablabla</ns1:child2>
</ns1:myroot>

you get this output...

<ns1:myroot xmlns:ns1="http://myxml.com/newvalue">
  <ns1:child1>blabla</ns1:child1>
  <ns1:child2>blablabla</ns1:child2>
</ns1:myroot>

See https://lxml.de/xpathxslt.html for more info on using XSLT with lxml.

Daniel Haley
  • This code doesn't work. [First the transform command needs to be defined](https://lxml.de/xpathxslt.html#xslt-result-objects), e.g. `transform = etree.XSLT(xslt)`, then apply it to the XML doc: `newtree = transform(tree, orig_namespace=...)`. I tried updating your answer, but it was rejected... – ganzpopp Nov 04 '19 at 14:21
  • @ganzpopp - hmm... I don’t think I would’ve posted it if it didn’t work. I always test before posting. Do you get an error when running? What version of python and lxml? – Daniel Haley Nov 04 '19 at 20:33
  • Checked specifically with your code, doesn't work with Python 3.7.5 and lxml 4.4.1. – ganzpopp Nov 06 '19 at 09:09
  • @ganzpopp - I haven’t had a chance to try again yet, but can you please explain what “doesn’t work” means? Is there a specific error? Also, I was probably using Python 3.6; are you able to try with that version? I will try both versions at some point in the next 24 hrs. – Daniel Haley Nov 06 '19 at 12:28
  • @ganzpopp - I tried with Python 3.7.5 (and 3.6.5 and 3.7.3) and lxml 4.4.1 and everything works fine. Also, in response to your link/comment that "First the transform command needs to be defined", please see https://lxml.de/xpathxslt.html#the-xslt-tree-method (or just scroll further down after clicking your link) for an example of the `xslt()` tree method which is "_a convenience method on ElementTree objects for doing XSL transformations_". It's a shortcut for the code you posted in a comment (see the example in the documentation). – Daniel Haley Nov 06 '19 at 23:53
  • Since this answer is fairly old and I'm having to look at it again, I did make an update to replace the `.format()`'s with f-string literals. I think it's more readable that way. – Daniel Haley Nov 07 '19 at 00:20
  • I can confirm that the code works with Python 3.7.1 and lxml 4.4.1 (I upvoted a long time ago!). – mzjn Nov 07 '19 at 06:16
  • Seems to work indeed, I guess an old version of lxml was installed. Confirmed working with Python 3.8.5 and lxml 4.5.2. – ganzpopp Oct 01 '20 at 08:55
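
The two invocation styles discussed in the comments above can be compared with a minimal sketch of the explicit form, `etree.XSLT(...)`, which the `tree.xslt()` convenience method wraps. It uses a compacted copy of the stylesheet from the answer, held in in-memory strings rather than the `input.xml` and `test.xsl` files (an assumption made to keep the example self-contained). It also passes the parameters through `etree.XSLT.strparam()`, which lxml provides so that string values do not have to be quoted by hand.

```python
import io

from lxml import etree

OLD_NS = "http://myxml.com/somevalue"
NEW_NS = "http://myxml.com/newvalue"

XML = b"""<myroot xmlns="http://myxml.com/somevalue">
    <child1>blabla</child1>
    <child2>blablabla</child2>
</myroot>"""

# Same logic as test.xsl in the answer, inlined here as bytes.
XSL = b"""<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:param name="orig_namespace"/>
  <xsl:param name="new_namespace"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="*" priority="1">
    <xsl:choose>
      <xsl:when test="namespace-uri()=$orig_namespace">
        <xsl:element name="{name()}" namespace="{$new_namespace}">
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>"""

tree = etree.parse(io.BytesIO(XML))
xslt = etree.parse(io.BytesIO(XSL))

# Compile the stylesheet once; the resulting object can be reused on many documents.
transform = etree.XSLT(xslt)

# strparam() wraps plain Python strings so they arrive as XSLT string parameters.
new_tree = transform(
    tree,
    orig_namespace=etree.XSLT.strparam(OLD_NS),
    new_namespace=etree.XSLT.strparam(NEW_NS),
)

new_root = new_tree.getroot()
print(new_root.tag)
print([child.tag for child in new_root])
```

Compiling the stylesheet explicitly is mainly worthwhile when the same transform is applied to several documents; for a one-off conversion the `tree.xslt()` shortcut from the answer does the same job.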