I have the following simplified XML:

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:soap="http://www.w3.org/2003/05/soap-envelope"> 
    <soap:Body>
        <ReadResponse xmlns="ABCDEFG.com">
            <ReadResult>
                <Value>
                    <Alias>x1</Alias>
                    <Timestamp>2013-11-11T00:00:00</Timestamp>
                    <Val>113</Val>
                    <Duration>5000</Duration>
                    <Quality>128</Quality>
                </Value>
                <Value>
                    <Alias>x1</Alias>
                    <Timestamp>2014-11-11T00:02:00</Timestamp>
                    <Val>110</Val>
                    <Duration>5000</Duration>
                    <Quality>128</Quality>
                </Value>
                <Value>
                    <Alias>x2</Alias>
                    <Timestamp>2013-11-11T00:00:00</Timestamp>
                    <Val>101</Val>
                    <Duration>5000</Duration>
                    <Quality>128</Quality>
                </Value>
                <Value>
                    <Alias>x2</Alias>
                    <Timestamp>2014-11-11T00:02:00</Timestamp>
                    <Val>122</Val>
                    <Duration>5000</Duration>
                    <Quality>128</Quality>
                </Value>
            </ReadResult>
        </ReadResponse>
    </soap:Body>
</soap:Envelope>

and would like to parse it into a dataframe with the following structure (keeping some of the tags and discarding the rest):

Timestamp                x1    x2
2013-11-11T00:00:00      113  101
2014-11-11T00:02:00      110  122

The problem is that the XML file includes namespaces, and I don't know how to proceed. I have gone through several tutorials (e.g., https://docs.python.org/2/library/pyexpat.html) and questions (e.g., "How to open this XML file to create dataframe in Python?" and "Parsing XML with namespace in Python via 'ElementTree'"), but none of them have worked. I would appreciate any help sorting this out.

Sepehr
  • You can find the answer here in the 3rd comment: [How to set the `xpath` of pandas's read_xml?](https://stackoverflow.com/questions/68281666/how-to-set-the-xpath-of-pandass-read-xml) – pmko Jun 17 '22 at 02:33

1 Answer


Here is an example of how to parse the XML with lxml and a namespace-aware XPath query:

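from lxml import etree

# xml_string holds the SOAP response from the question. Pass it as bytes:
# lxml rejects a str that carries an encoding declaration.
# The prefix 'abc' is arbitrary; only the URI has to match the default xmlns.
namespaces = {'abc': "ABCDEFG.com"}
xmltree = etree.fromstring(xml_string)
# Returns the text of every <Alias> element as a list of strings.
items = xmltree.xpath('//abc:Alias/text()', namespaces=namespaces)

print(items)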
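
The same namespace mapping can be carried one step further to get the timestamp-by-alias layout the question asks for. The following is only a minimal sketch, not part of the original answer: it swaps lxml for the standard library's `xml.etree.ElementTree` so it runs without extra packages, uses a shortened copy of the XML from the question, and the names `XML_BYTES`, `NS`, and `pivot_values` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question (Duration and Quality dropped).
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <ReadResponse xmlns="ABCDEFG.com">
      <ReadResult>
        <Value><Alias>x1</Alias><Timestamp>2013-11-11T00:00:00</Timestamp><Val>113</Val></Value>
        <Value><Alias>x1</Alias><Timestamp>2014-11-11T00:02:00</Timestamp><Val>110</Val></Value>
        <Value><Alias>x2</Alias><Timestamp>2013-11-11T00:00:00</Timestamp><Val>101</Val></Value>
        <Value><Alias>x2</Alias><Timestamp>2014-11-11T00:02:00</Timestamp><Val>122</Val></Value>
      </ReadResult>
    </ReadResponse>
  </soap:Body>
</soap:Envelope>"""

# Hypothetical prefix; only the URI has to match the document's default xmlns.
NS = {"abc": "ABCDEFG.com"}


def pivot_values(xml_bytes):
    """Return {timestamp: {alias: val}} built from every <Value> element."""
    root = ET.fromstring(xml_bytes)
    table = {}
    for value in root.iterfind(".//abc:Value", NS):
        alias = value.findtext("abc:Alias", namespaces=NS)
        stamp = value.findtext("abc:Timestamp", namespaces=NS)
        val = int(value.findtext("abc:Val", namespaces=NS))
        # Group by timestamp so each alias ends up as its own column.
        table.setdefault(stamp, {})[alias] = val
    return table


table = pivot_values(XML_BYTES)
print(table)
```

If pandas is installed, `pandas.DataFrame.from_dict(table, orient="index")` should turn this dictionary into a frame with one row per timestamp and the columns `x1` and `x2`.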
heinst