This question is a follow-up to this answer: https://stackoverflow.com/a/51972010/3480297

I'm trying to remove the namespace from an XML file. The linked answer works fine when there are no comments in the XML. However, if there is a comment, an error is thrown.

This is an example of my code:

from lxml import etree

input_xml = '''
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>
'''
root = etree.fromstring(input_xml)

# Remove namespace prefixes
for elem in root.getiterator():
    elem.tag = etree.QName(elem).localname
# Remove unused namespace declarations
etree.cleanup_namespaces(root)

print(etree.tostring(root).decode())

This throws the following error:

ValueError: Invalid input tag of type <class 'cython_function_or_method'>

EDIT:

If I use the following "input_xml" structure, the code in the answer below does not remove all of the namespaces.

<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>

The result of the code is still:

<package xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>
Adam
  • *"I'm trying to remove the namespace from an XML file."* That's always suspicious and rarely a good idea (or necessary). Why are you trying to do that? – Tomalak Mar 02 '20 at 10:56
  • I'm trying to perform simple outputs (without extracting any information specifically from the XML at that point) and I would like to not have the namespaces. – Adam Mar 02 '20 at 11:01
  • Not sure if I get that...? Simple outputs without extracting information? – Tomalak Mar 02 '20 at 12:20
  • I meant that modifying the XML directly won't cause me any issues as I'm just displaying certain parts of it without parsing/extracting information from it. So modifying it won't be a problem. – Adam Mar 02 '20 at 13:31

1 Answer

Make sure that the node is not a comment before changing the tag. The code below also removes any attributes that are in a namespace.

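for elem in root.iter():
    # Skip comments and processing instructions: their .tag is a function, not a string
    if not isinstance(elem.tag, str):
        continue

    # For elements, replace the qualified name with the local name
    elem.tag = etree.QName(elem).localname

    # Remove attributes that are in a namespace (iterate over a copy of the
    # keys, because the mapping is modified inside the loop)
    for attr in list(elem.attrib):
        if attr.startswith("{"):
            del elem.attrib[attr]

# Remove the namespace declarations that are now unused
etree.cleanup_namespaces(root)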
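
For reference, here is a minimal, self-contained sketch that combines both steps (local tag names and removal of namespaced attributes) and runs them on an input that has a default namespace, a comment, and an `xsi:schemaLocation` attribute. The helper name `strip_namespaces` is made up for this example and is not part of lxml, and the sample document merges the two inputs from the question. It assumes that dropping namespaced attributes such as `xsi:schemaLocation` is acceptable for your use case.

```python
from lxml import etree


def strip_namespaces(root):
    """Hypothetical helper: remove namespaces from tags and drop namespaced attributes, in place."""
    for elem in root.iter():
        # Comments and processing instructions have a function as .tag, so skip them
        if not isinstance(elem.tag, str):
            continue
        elem.tag = etree.QName(elem).localname
        # Copy the keys first because attributes are deleted inside the loop
        for attr in list(elem.attrib):
            if attr.startswith("{"):
                del elem.attrib[attr]
    # Drop the namespace declarations that nothing refers to any more
    etree.cleanup_namespaces(root)
    return root


# Sample input: the two documents from the question merged into one (an assumption)
input_xml = '''
<package xmlns="http://apple.com/itunes/importer"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://com/scheme/location/example/ Location.xsd ">
  <provider>some data <!-- example comment--> </provider>
  <language>en-GB</language>
</package>
'''

root = strip_namespaces(etree.fromstring(input_xml))
result = etree.tostring(root).decode()
print(result)
```

The comment is left untouched, and once the `xsi:schemaLocation` attribute is gone, `cleanup_namespaces` can also drop the `xmlns:xsi` declaration because nothing uses it any more.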
mzjn
  • Thank you! This works for the original code. But I have an issue when there are additional namespaces and they're not all being removed. Could you have a look at my edited question please? – Adam Mar 02 '20 at 11:01
  • In the second example, you have an attribute bound to a namespace (`xsi:schemaLocation`). You need to remove this attribute if you don't want any namespace declarations in the document. – mzjn Mar 02 '20 at 11:38
  • Is there a way to do that with the code rather than modifying the XML manually? – Adam Mar 02 '20 at 11:41