Find element with function find in lxml

Question

I have the following XML:

<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
  <SOAP:Header>
  </SOAP:Header>
  <SOAP:Body>
    <Server_Reply xmlns="some_url">
      <conversionRate>
        <conversionRateDetail>
          <currency>dollar</currency>
        </conversionRateDetail>
      </conversionRate>
    </Server_Reply>
  </SOAP:Body>
</SOAP:Envelope>

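It is stored in reply.txt. I read and parse it like this:

from lxml.etree import fromstring

# Read as bytes: lxml can reject a str that carries an encoding declaration
with open('reply.txt', 'rb') as f:
    reply = f.read()

reply_element = fromstring(reply)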

I need to find the Server_Reply element. I tried:

response = reply_element.find('Body/Server_Reply')

but it returns None. How do I do this correctly? In the end, I need to get the Server_Reply element.

score 0 · Accepted Answer · answered Sep 22 '15 at 17:00

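Your XML uses namespaces, so you have to include them in the path that you give to .find(). A path with unqualified names such as Body only matches elements that are in no namespace, which is why you get None.

You can also start the path with .// to search for Body among all descendants of the current element (which is SOAP:Envelope). Body happens to be a direct child here, so the prefix is optional, but it keeps the lookup working if the nesting changes. Example -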

response = reply_xml.find('.//{http://www.w3.org/2003/05/soap-envelope}Body/{some_url}Server_Reply')

Or

response = reply_xml.find('.//SOAP:Body/dummy:Server_Reply',namespaces = {'SOAP':'http://www.w3.org/2003/05/soap-envelope', 'dummy':'some_url'})

Demo -

In [55]: s = """<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
   ....:   <SOAP:Header>
   ....:   </SOAP:Header>
   ....:   <SOAP:Body>
   ....:     <Server_Reply xmlns="some_url">
   ....:       <conversionRate>
   ....:         <conversionRateDetail>
   ....:           <currency>dollar</currency>
   ....:         </conversionRateDetail>
   ....:       </conversionRate>
   ....:     </Server_Reply>
   ....:   </SOAP:Body>
   ....: </SOAP:Envelope>"""

In [56]: reply_xml = etree.fromstring(s)

In [57]: reply_xml.find('.//SOAP:Body/dummy:Server_Reply',namespaces = {'SOAP':'http://www.w3.org/2003/05/soap-envelope', 'dummy':'some_url'})
Out[57]: <Element {some_url}Server_Reply at 0x481d708>

In [58]: reply_xml.find('.//{http://www.w3.org/2003/05/soap-envelope}Body/{some_url}Server_Reply')
Out[58]: <Element {some_url}Server_Reply at 0x481d708>
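For comparison, here is a minimal self-contained sketch that puts the failing unqualified path next to a namespace-qualified one. The `NS` dictionary and the `soap` and `r` prefixes are arbitrary names chosen for this illustration; only the namespace URIs have to match the document.

```python
from lxml import etree

# Same document as in the question, as a bytes literal.
s = b"""<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
  <SOAP:Header>
  </SOAP:Header>
  <SOAP:Body>
    <Server_Reply xmlns="some_url">
      <conversionRate>
        <conversionRateDetail>
          <currency>dollar</currency>
        </conversionRateDetail>
      </conversionRate>
    </Server_Reply>
  </SOAP:Body>
</SOAP:Envelope>"""

root = etree.fromstring(s)

# Prefixes are local to this mapping (an assumption of this sketch);
# they do not need to match the prefixes used inside the XML.
NS = {"soap": "http://www.w3.org/2003/05/soap-envelope", "r": "some_url"}

# Unqualified names only match elements in no namespace, so nothing is found.
unqualified = root.find("Body/Server_Reply")

# Qualified names match; Body is a direct child of the root element.
qualified = root.find("soap:Body/r:Server_Reply", namespaces=NS)

# Children of Server_Reply inherit its default namespace.
currency = root.findtext(".//r:currency", namespaces=NS)

print(unqualified)    # None
print(qualified.tag)  # {some_url}Server_Reply
print(currency)       # dollar
```

Note that currency also needs the `r` prefix, because the default namespace declared on Server_Reply applies to all of its descendants.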

score 0 · Answer 2 · answered Sep 22 '15 at 17:11

I find XPath more intuitive and easier to use:

from lxml import etree

# Bytes literal: lxml can reject a str that carries an encoding declaration
xml = b"""<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
  <SOAP:Header>
  </SOAP:Header>
  <SOAP:Body>
    <Server_Reply xmlns="some_url">
      <conversionRate>
        <conversionRateDetail>
          <currency>dollar</currency>
        </conversionRateDetail>
      </conversionRate>
    </Server_Reply>
  </SOAP:Body>
</SOAP:Envelope>"""

et = etree.fromstring(xml)
server_reply = et.xpath('//*[local-name()="Server_Reply"]')
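Two details are worth spelling out: `xpath()` returns a list of matches rather than a single element, and `local-name()` ignores the namespace entirely. The sketch below shows both, plus a namespace-aware variant; the `ns` mapping and its `soap` and `r` prefixes are illustrative names, not part of the original answer.

```python
from lxml import etree

xml = b"""<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
  <SOAP:Header>
  </SOAP:Header>
  <SOAP:Body>
    <Server_Reply xmlns="some_url">
      <conversionRate>
        <conversionRateDetail>
          <currency>dollar</currency>
        </conversionRateDetail>
      </conversionRate>
    </Server_Reply>
  </SOAP:Body>
</SOAP:Envelope>"""

et = etree.fromstring(xml)

# xpath() returns a list, so pick the first match explicitly.
matches = et.xpath('//*[local-name()="Server_Reply"]')
server_reply = matches[0] if matches else None

# Namespace-aware alternative: prefixes are resolved through the
# `namespaces` mapping (prefix names here are arbitrary).
ns = {"soap": "http://www.w3.org/2003/05/soap-envelope", "r": "some_url"}
strict = et.xpath("/soap:Envelope/soap:Body/r:Server_Reply", namespaces=ns)

print(len(matches))      # 1
print(server_reply.tag)  # {some_url}Server_Reply
print(strict[0].tag)     # {some_url}Server_Reply
```

The `local-name()` form is convenient when the namespace URI is unknown or may change, while the prefixed form avoids accidentally matching a Server_Reply element from a different namespace.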

score 0 · Answer 3 · answered Sep 22 '15 at 17:44

Use lxml's etree with a namespace map to do this.

#!/usr/bin/env python
import sys
from lxml import etree

def run(fileName):
    parser = etree.XMLParser(ns_clean=True)
    data = etree.parse(fileName, parser).getroot()
    # Start from the prefixes declared on the root element (SOAP, wsa)
    namespaces = dict(data.nsmap)
    # Add a prefix for the default namespace declared on Server_Reply
    namespaces['some_url'] = 'some_url'
    for row in data.findall('.//SOAP:Body/some_url:Server_Reply', namespaces=namespaces):
        print(row)

if __name__ == "__main__":
    run(sys.argv[1])

Then run the script with the XML file as its argument:

python findElement.py sampleFile.xml

For my XML/DTD parsing problem, your answer is the only one remotely close to get me anywhere close to making something useful out of the data. Thank you for that. — ctde, Nov 28 '22 at 05:43
