I'm working with a pretty complex XML like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- ***** Configuration Data exported at 20160623T110335 ***** -->
<impex:ExportData xmlns:impex="urn:swift:saa:xsd:impex">

<!-- *** Exported Data for Operator *** -->
<OperatorData xmlns="urn:swift:saa:xsd:impex:operator">

<ns2:OperatorDefinition xmlns="urn:swift:saa:xsd:operatorprofile" xmlns:ns2="urn:swift:saa:xsd:impex:operator" xmlns:ns3="urn:swift:saa:xsd:unit" xmlns:ns4="urn:swift:saa:xsd:licenseddestination" xmlns:ns5="urn:swift:saa:xsd:operator" xmlns:ns6="urn:swift:saa:xsd:authenticationservergroup">
    <ns2:Operator>
        <ns5:Identifier>
            <ns5:Name>jdoe</ns5:Name>
        </ns5:Identifier>
        <ns5:Description>John Doe</ns5:Description>
        <ns5:OperatorType>HUMAN</ns5:OperatorType>
        <ns5:AuthenticationType>LDAP</ns5:AuthenticationType>
        <ns5:AuthenticationServerGroup>
            <ns6:Type>LDAP</ns6:Type>
            <ns6:Name>LDAP_GROUP1</ns6:Name>
        </ns5:AuthenticationServerGroup>
        <ns5:LdapUserId>jdoe</ns5:LdapUserId>
        <ns5:Profile>
            <Name>DEV Users</Name>
        </ns5:Profile>
        <ns5:Unit>
            <ns3:Name>None</ns3:Name>
        </ns5:Unit>
    </ns2:Operator>
</ns2:OperatorDefinition>

</OperatorData>

</impex:ExportData>

In this XML there are numerous <ns2:OperatorDefinition> elements like the one I included. I'm having a hard time understanding how to pull out something like <ns5:Description> using lxml. All the examples for namespaces I'm finding are not this complex.

I'm trying to simply find the tags with something like this:

from lxml import etree
doc = etree.parse('c:/robin/Operators_out.xml')

r = doc.xpath('/x:OperatorData/ns2:OperatorDefinition', namespaces={'x': 'urn:swift:saa:xsd:impex:operator'})
print len(r)
print r[0].text
print r[0].tag

I get an "Undefined namespace prefix" error.

whoisearth
    You get "Undefined namespace prefix" because you haven't included the definition of the `ns2` prefix in the `namespaces` dictionary. – mzjn Jul 26 '16 at 17:09
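
To illustrate the comment above, here is a minimal sketch that keeps the namespaces and maps every prefix used in the path to its URI. The prefixes `op` and `o` are arbitrary names chosen for this example (only the URIs have to match the document), the XML is a trimmed copy of the sample from the question, and the fallback import is only there so the sketch also runs without lxml installed. lxml's `xpath()` accepts the same kind of mapping through its `namespaces` argument.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback; findall() takes the same mapping
    import xml.etree.ElementTree as etree

# Trimmed copy of the XML from the question (bytes, because of the XML declaration).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<impex:ExportData xmlns:impex="urn:swift:saa:xsd:impex">
<OperatorData xmlns="urn:swift:saa:xsd:impex:operator">
<ns2:OperatorDefinition xmlns="urn:swift:saa:xsd:operatorprofile" xmlns:ns2="urn:swift:saa:xsd:impex:operator" xmlns:ns5="urn:swift:saa:xsd:operator">
    <ns2:Operator>
        <ns5:Identifier><ns5:Name>jdoe</ns5:Name></ns5:Identifier>
        <ns5:Description>John Doe</ns5:Description>
    </ns2:Operator>
</ns2:OperatorDefinition>
</OperatorData>
</impex:ExportData>
"""

# Prefix names are our own choice; every prefix used in a path must appear here.
NS = {
    "op": "urn:swift:saa:xsd:impex:operator",
    "o": "urn:swift:saa:xsd:operator",
}

root = etree.fromstring(XML)  # root element is impex:ExportData

# Paths are relative to the root, so they start at its OperatorData child.
definitions = root.findall("op:OperatorData/op:OperatorDefinition", NS)
descriptions = [
    d.findtext("op:Operator/o:Description", namespaces=NS) for d in definitions
]
print(len(definitions), descriptions)
```

Note that the absolute path in the question would also need to start at the real root element, `impex:ExportData`, not at `OperatorData`.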

1 Answer

You may not need the namespaces for your use case. Removing them makes parsing easier:

from lxml import etree, objectify

tree = etree.parse("input.xml")
root = tree.getroot()

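# remove namespaces ----
# iter() replaces the deprecated getiterator()
for elem in root.iter():
    # comments and processing instructions have a non-string tag; skip them
    if not hasattr(elem.tag, 'find'): continue
    # qualified tags look like '{namespace-uri}LocalName'; keep only the local name
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]

# drop the now-unused namespace declarations
objectify.deannotate(root, cleanup_namespaces=True)
# ----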

name = root.findtext(".//OperatorDefinition/Operator/Identifier/Name")
print(name)

Prints jdoe.

alecxe
  • This works for a single user. I'm trying to figure out how to loop it if there are multiple `OperatorDefinition` elements with the same tags in them, i.e. I want to return `jdoe`, `asmith`, etc. – whoisearth Jul 27 '16 at 19:02
  • @whoisearth okay, evaluate `.//OperatorDefinition/Operator/Identifier/Name/text()` with `xpath()`. – alecxe Jul 27 '16 at 19:06
  • @alecxe I ended up doing this and it works - `for elem in tree.iterfind('.//OperatorDefinition'): print elem.findtext(".//Operator/Identifier/Name")` – whoisearth Jul 27 '16 at 19:17
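
The loop from the comments can be sketched as a self-contained script. This is only an illustration under stated assumptions: the XML is a shortened, made-up document with two operators (`asmith` is taken from the comment above, its element is hypothetical), `strip_namespaces` is a helper name introduced here, and the fallback import exists only so the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # stdlib fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical, shortened export with two operators.
XML = b"""<impex:ExportData xmlns:impex="urn:swift:saa:xsd:impex">
<OperatorData xmlns="urn:swift:saa:xsd:impex:operator"
              xmlns:ns5="urn:swift:saa:xsd:operator">
  <OperatorDefinition><Operator>
    <ns5:Identifier><ns5:Name>jdoe</ns5:Name></ns5:Identifier>
  </Operator></OperatorDefinition>
  <OperatorDefinition><Operator>
    <ns5:Identifier><ns5:Name>asmith</ns5:Name></ns5:Identifier>
  </Operator></OperatorDefinition>
</OperatorData>
</impex:ExportData>
"""


def strip_namespaces(root):
    """Rewrite '{uri}Local' tags to 'Local' in place and return the root."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


root = strip_namespaces(etree.fromstring(XML))

# One lookup per OperatorDefinition, relative to that element.
names = [
    definition.findtext("Operator/Identifier/Name")
    for definition in root.iter("OperatorDefinition")
]
print(names)  # ['jdoe', 'asmith']
```

Searching relative to each `OperatorDefinition` keeps every name tied to its own operator, which makes it easy to read further fields (such as `Description`) from the same element inside the loop.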