0

Given XML as string :

xml_as_string = """<root xmlns:SOAP-ENV="http://w/e">
    <Context operation="something.wsdl">
        <SOAP_Version>123.321</SOAP_Version>
        <Namespace xmlns:SOAP-ENV="http://dummy.com"/>
    </Context>
    <Header/>
    <Body>
        <ns2:Complex xmlns:ns2="http://www.test.this/idk">
            <ns2:simple>
                <ns2:child1>IKM</ns2:child1>
                <ns2:child2>1234</ns2:child2>
                <ns2:child3>S</ns2:child3>
            </ns2:simple>
            <ns2:simple>
                <ns2:child1>QEw</ns2:child1>
                <ns2:child2>10028111</ns2:child2>
                <ns2:parentchild1>IKM</ns2:parentchild1>
                <ns2:parentchild2>1234</ns2:parentchild2>
                <ns2:child3>S</ns2:child3>
            </ns2:simple>
            <ns2:simple>
                <ns2:child1>IOW</ns2:child1>
                <ns2:child2>100043896</ns2:child2>
                <ns2:parentchild1>QEw</ns2:parentchild1>
                <ns2:parentchild2>10028111</ns2:parentchild2>
            </ns2:simple>
            <ns2:extra>
                <ns2:childextra>0</ns2:childextra>
            </ns2:extra>
        </ns2:Complex>
    </Body>
</root>
"""

Creating xml tree using xml.etree.ElementTree:

`import xml.etree.ElementTree as ET`

root = ET.fromstring(xml_as_string)

For a specific path, I'm trying to print all the values found

path = './Body/Complex/simple/child1'

path_vals = root.findall(path)
print([e.text for e in path_vals])

The result is an empty list :

[]

Is there any way to accomplish this in python ?

Henry-G
  • 131
  • 1
  • 8

1 Answers1

1

You probably want all the text associated with child1 : You need to use the namespace to get the data : "http://www.test.this/idk" is the namespace

namespace = '{http://www.test.this/idk}'

[ent.text for ent in root.findall(F".//{namespace}child1")]

['IKM', 'QEw', 'IOW']

If you want to do away with the namespaces, you could give parsel a go :

from parsel import Selector
selector = Selector(xml_as_string, 'xml')

#remove namespace
selector.remove_namespaces()

#get your required text
selector.xpath(".//child1/text()").getall()

['IKM', 'QEw', 'IOW']
halfer
  • 19,824
  • 17
  • 99
  • 186
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
  • u could pass it as a variable, then use fstrings. – sammywemmy Jun 04 '20 at 11:24
  • Hmm, then idealy i'd prefer something that either ignores namespaces or figures them out by itself, hmm, looking into lxml rn, maybe, just maybe – Henry-G Jun 04 '20 at 11:26
  • edited code, using [parsel](https://parsel.readthedocs.io/en/latest/usage.html) – sammywemmy Jun 04 '20 at 11:30
  • 3
    With ElementTree in Python 3.8, you can use a wildcard for the namespace. Example: https://stackoverflow.com/a/62117710/407651. – mzjn Jun 04 '20 at 11:37