-1

I need to parse XML into another structure. For example:
a = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
"""

I am using ElementTree to parse the tree:
from xml.etree import ElementTree
root = ElementTree.fromstring(a)

When I apply
root[0][1].tag

I get the result
{http://characters.example.com}character

but I need the tag as it appears in the original file:
fictional:character

How do I achieve this?

Kvarel

4 Answers

1

With XPath, name() returns an element's name including its namespace prefix (local-name() returns it without the prefix). The third-party Python package lxml can run XPath 1.0:

import lxml.etree as lx

a = """
<actors xmlns:fictional="http://characters.example.com">
 <actor>    
    <name>Eric Idle</name>
     <fictional:character>Sir Robin</fictional:character>
     <fictional:character>Gunther</fictional:character>
     <fictional:character>Commander Clement</fictional:character>
   </actor>
</actors>
"""

root = lx.fromstring(a)

# root is <actors>, so select the children of each <actor> relative to it
for el in root.xpath("actor/*"):
   print(el.xpath("name()"))

# name
# fictional:character
# fictional:character
# fictional:character
Parfait
0

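With the ElementTree library alone, there is no simple way to do this: the parser expands each prefix to {uri}localname and the element does not store the original prefix.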
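
That said, a workaround is possible with the standard library only. The following is a minimal sketch, not a built-in feature: it assumes you can parse with iterparse and collect the "start-ns" events to build a URI-to-prefix map. The helper names parse_with_prefixes and prefixed_tag are hypothetical. If one URI is bound to several prefixes, the last one seen wins.

```python
import io
import xml.etree.ElementTree as ET


def parse_with_prefixes(text):
    """Parse XML text; return (root, {namespace URI: prefix})."""
    uri_to_prefix = {}
    root = None
    source = io.BytesIO(text.encode("utf-8"))
    for event, obj in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = obj  # reported before the element that declares it
            uri_to_prefix[uri] = prefix
        elif root is None:
            root = obj  # the first "start" event is the root element
    return root, uri_to_prefix


def prefixed_tag(tag, uri_to_prefix):
    """Turn '{uri}local' back into 'prefix:local' where a prefix is known."""
    if not tag.startswith("{"):
        return tag
    uri, local = tag[1:].split("}", 1)
    prefix = uri_to_prefix.get(uri)
    return f"{prefix}:{local}" if prefix else local


sample = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
</actor>
</actors>
"""

root, prefixes = parse_with_prefixes(sample)
print(prefixed_tag(root[0][1].tag, prefixes))
```

The tree itself still holds the expanded {uri}local tags, so findall() with a namespace map keeps working; the prefix is restored only when you ask for it.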

marksoe
0

You can use re.sub():

import xml.etree.ElementTree as ET
import re
from io import StringIO

a = """
<actors xmlns:fictional="http://characters.example.com">
 <actor>    
    <name>Eric Idle</name>
     <fictional:character>Sir Robin</fictional:character>
     <fictional:character>Gunther</fictional:character>
     <fictional:character>Commander Clement</fictional:character>
   </actor>
</actors>
"""
f = StringIO(a)

tree = ET.parse(f)
root = tree.getroot()

ns={"fictional": "http://characters.example.com"}

for elem in root.findall(".//fictional:character", ns):
    # re.escape keeps "{", "}" and "." in the URI from being read as regex syntax
    print(re.sub(re.escape("{http://characters.example.com}"), "fictional:", elem.tag), elem.text)

Output:

fictional:character Sir Robin
fictional:character Gunther
fictional:character Commander Clement
Hermann12
0

I found out that the expat parser is what expands the namespaces. It is created by the parser class that ElementTree uses by default:

xml.etree.ElementTree.XMLParser

Its initialization method creates the expat parser with the call

parser = expat.ParserCreate(encoding, "}")

You can override the standard behavior of the parser if you redefine this line to

parser = expat.ParserCreate(encoding, None)

In this case, namespace processing is disabled, so tags keep the prefix they had in the source (for example fictional:character).
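
Editing the library source is fragile, and in CPython the XMLParser class is normally provided by a C accelerator, so a change to the Python file may not take effect. As a hedged alternative, the sketch below drives expat directly with no namespace separator and feeds the events into ElementTree's TreeBuilder. The function name parse_without_namespaces is hypothetical, and only elements and text are handled (comments and processing instructions are ignored).

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_without_namespaces(text):
    """Build an ElementTree element tree with namespace processing off."""
    builder = ET.TreeBuilder()
    # No namespace_separator argument: expat reports names exactly as written.
    parser = expat.ParserCreate()

    def on_start(tag, attrs):
        builder.start(tag, attrs)

    def on_end(tag):
        builder.end(tag)

    def on_text(data):
        builder.data(data)

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(text, True)  # True marks this as the final chunk
    return builder.close()


xml_text = """
<actors xmlns:fictional="http://characters.example.com">
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
</actor>
</actors>
"""

plain_root = parse_without_namespaces(xml_text)
print(plain_root[0][1].tag)
```

The trade-off is that the xmlns:fictional declaration becomes an ordinary attribute on the root element, and namespace-aware lookups such as findall() with a prefix map no longer apply to this tree.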

Kvarel