Is there a way to remove namespaces from an XML document (where I know there aren't any name collisions)? Currently I'm doing this for each known namespace:

import re

s = re.sub(r'(</?)md:', r'\1', s)   # remove the md: prefix from opening and closing tags
s = re.sub(r'\s+xsi:', ' ', s)      # remove the xsi: prefix from attributes

But I was wondering if there is something more generic that could be used. No CDATA sections are allowed in this particular XML.

David542
  • *Don't*. https://stackoverflow.com/a/1732454/836748 – Aaron D. Marasco May 13 '20 at 20:46
  • You never actually need that, when you use an xml parser you can always specify all the required namespaces. – Wiktor Stribiżew May 13 '20 at 20:47
  • @WiktorStribiżew yes the issue is there are a lot of namespaces, not all of which I know (for example, there can be future namespace added in), so this is a bit more general-purpose for now. It's worked so far on production processing ~1M documents a day. – David542 May 13 '20 at 20:48
  • You can use XSLT processing to achieve this reliably. Just combine the _identity template_ with a template that transforms `name()`s to `local-name()`s. Then call the XSLT from Python. – zx485 May 13 '20 at 21:09
  • @zx485 that's an interesting approach, thanks for the feedback. Would you be able to add an answer that shows how you'd do that? – David542 May 13 '20 at 21:30
  • Here's an old answer which shows how to remove all namespaces from tags and attributes https://stackoverflow.com/a/33997423/2318649 – DisappointedByUnaccountableMod Nov 30 '21 at 16:28

1 Answer

You can use the XSLT approach by calling the following XSLT 1.0 stylesheet from Python. It combines the identity template with a template that replaces each element's full name() with its local-name(). For example, every <ns1:abc> element becomes <abc>, and its namespace is dropped.

However, how useful this is depends on your use case. Stripping namespaces discards information: names that differed only by namespace become indistinguishable, so handle with care.

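<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Identity template: copies text, comments and processing instructions.
         Elements and attributes are handled by the higher-priority templates below. -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" />
        </xsl:copy>
    </xsl:template>

    <!-- Recreates each element under its local name, dropping prefix and namespace -->
    <xsl:template match="*" priority="1">
        <xsl:element name="{local-name()}">
            <xsl:apply-templates select="node()|@*" />
        </xsl:element>
    </xsl:template>

    <!-- Recreates each attribute under its local name (e.g. xsi:type becomes type) -->
    <xsl:template match="@*" priority="1">
        <xsl:attribute name="{local-name()}">
            <xsl:value-of select="." />
        </xsl:attribute>
    </xsl:template>

</xsl:stylesheet>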

Apply it with any XSLT 1.0 (or later) processor.
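
Python's standard library does not include an XSLT processor (the third-party lxml package provides one as `lxml.etree.XSLT`). If adding a dependency is not an option, the same local-name idea can be sketched with `xml.etree.ElementTree` instead. Note that this is a different technique from the stylesheet, not a way of running it. The helper names `local_name` and `strip_namespaces` are made up for illustration, and the sketch assumes, as the question does, that no two names collide once their namespaces are gone.

```python
import xml.etree.ElementTree as ET


def local_name(qname):
    """Return the part of an ElementTree name after its '{namespace-uri}' prefix."""
    return qname.rsplit("}", 1)[-1]


def strip_namespaces(xml_text):
    """Parse xml_text and return it re-serialized without any namespaces.

    Hypothetical helper. Assumes element and attribute names do not
    collide once their namespaces are removed.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Defensive: comment/PI nodes (if a parser kept them) have a non-string tag.
        if not isinstance(elem.tag, str):
            continue
        elem.tag = local_name(elem.tag)
        # Rebuild the attributes under their local names only. If two
        # attributes collide (a:id and b:id), the later one wins.
        stripped = {local_name(key): value for key, value in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(stripped)
    return ET.tostring(root, encoding="unicode")
```

Calling `strip_namespaces(s)` on a document string returns a new string in which prefixed tags, prefixed attributes, and `xmlns` declarations are gone. ElementTree's default parser also drops comments and processing instructions, which the XSLT version keeps, so this is only a stand-in where that loss is acceptable.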

zx485