I'm having trouble figuring out how to parse XML with etree, with and without XPath.

I am trying to get every instance of an element as a record, with each instance's subfields put into a dictionary. Ultimately, I want the field data in a Pandas DataFrame, with the val name attributes as column headers.

Below, each "map" is a record.

Sample xml:

<?xml version="1.0" encoding="UTF-8" ?>
<response xmlns="a url">
    <fn>
        <list>
        <map>
        <val name="COUNT">0</val>
        <val name="CATEGORY">AA</val>
        <val name="AVERAGE">48.064133</val>
        <val name="RANKING">4</val>
        <val name="NAME">PORTER</val>
        <val name="INDUSTRY_TAG">DESIGN</val>
        </map>
        <map>
        <val name="COUNT">3</val>
        <val name="CATEGORY">BA</val>
        <val name="AVERAGE">77.33</val>
        <val name="RANKING">27</val>
        <val name="NAME">DANIELS</val>
        <val name="INDUSTRY_TAG">INTERACTIVE</val>
        </map>
        <map>
        <val name="COUNT">8</val>
        <val name="CATEGORY">BB</val>
        <val name="AVERAGE">102.85</val>
        <val name="RANKING">15</val>
        <val name="NAME">SWEETWATER</val>
        <val name="INDUSTRY_TAG">GRAPHIC</val>
        </map>
        </list>
    </fn>
</response>

The XML comes from an API. "response" is the variable the API call result has been assigned to. The code below prints one continuous list of the "val" values, with no grouping by "map".

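from lxml import etree  # assumed: root.xpath() used further down is lxml-specific

root = etree.fromstring(response.content)
# Matches every element with a name attribute, at any depth.
for node in root.findall('.//*/*[@name]'):
    print(node.text)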

I have tried messing around with xpath to try to get the "map" level, but that isn't working.

I have tried XPath expressions based on this guide on XPath and these examples, but only the following finds anything, and it returns the same single list as above:

for node in root.xpath('//*[@name]'):
    print(node.text)

I've tried an absolute path -- root.xpath('response/fn/list/map'), but that doesn't seem to work. I tried root.xpath('//val') because I thought that would get me all of the "val" variables, but that hasn't worked. If I enter root.xpath('//*') I get that list of "val" values.
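
For illustration, here is a minimal sketch of why the unqualified paths above come back empty. It uses the standard library's xml.etree.ElementTree and a shortened copy of the sample. The namespace URI "http://example.com/ns" and the prefix "r" are hypothetical stand-ins, since the real xmlns value is not shown in the sample.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI standing in for the real xmlns value of the response.
NS_URI = "http://example.com/ns"

xml_text = f"""<response xmlns="{NS_URI}">
    <fn>
        <list>
            <map>
                <val name="COUNT">0</val>
                <val name="NAME">PORTER</val>
            </map>
            <map>
                <val name="COUNT">3</val>
                <val name="NAME">DANIELS</val>
            </map>
        </list>
    </fn>
</response>"""

root = ET.fromstring(xml_text)

# The default namespace becomes part of every element's tag name.
root_tag = root.tag

# An unqualified name does not match elements that live in a namespace.
unqualified = root.findall(".//map")

# Binding a prefix to the namespace URI makes the same path match.
# The prefix name "r" is arbitrary; only the URI has to agree with the document.
ns = {"r": NS_URI}
qualified = root.findall(".//r:map", ns)

print(root_tag, len(unqualified), len(qualified))
```

With lxml, the corresponding call would be root.xpath('//r:map', namespaces=ns). Note also that root is already the "response" element, so a path starting from root would begin at "fn" rather than "response".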

Clearly I am not grokking something fundamental and would appreciate being put on the path to wisdom.

Thanks

mattrweaver
  • For the absolute path it would have to be `root.find_element_by_xpath("response/fn/list/map[1]")` or `root.find_element_by_xpath("//map[1]")` – Chrispresso Apr 04 '18 at 18:25
  • @JeanRostan: The essential reason his XPath is failing is due to neglecting XML namespaces. This has been asked and answered many times over, thus the duplicate closure; there's nothing new here. – kjhughes Apr 04 '18 at 18:36
  • @kjhughes you're right, sorry for the disturbance, I was only thinking of doing it with elemttree and not xpath... – Jean Rostan Apr 04 '18 at 18:37
  • Sorry for the poorly-worded question and thanks to the person who tracked me down to share the code to help me get the values from my xml. – mattrweaver Apr 05 '18 at 11:59
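
Following the namespace point raised in the comments, one way to turn each "map" into a dictionary might look like the sketch below. It is not the code that was shared with the asker. The function name maps_to_records, the namespace URI "http://example.com/ns", and the prefix "r" are all hypothetical, and the sample is shortened to two fields per record.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the real xmlns value from the response.
NS_URI = "http://example.com/ns"

xml_text = f"""<response xmlns="{NS_URI}">
    <fn>
        <list>
            <map>
                <val name="COUNT">0</val>
                <val name="CATEGORY">AA</val>
            </map>
            <map>
                <val name="COUNT">3</val>
                <val name="CATEGORY">BA</val>
            </map>
        </list>
    </fn>
</response>"""


def maps_to_records(xml_source, ns_uri):
    """Return one dict per <map>, keyed by each <val>'s name attribute."""
    root = ET.fromstring(xml_source)
    ns = {"r": ns_uri}  # the prefix "r" is arbitrary
    records = []
    for map_node in root.findall(".//r:map", ns):
        # Only the direct <val> children of this <map> go into its record.
        record = {val.get("name"): val.text for val in map_node.findall("r:val", ns)}
        records.append(record)
    return records


records = maps_to_records(xml_text, NS_URI)
print(records)
```

Passing the resulting list to pandas.DataFrame(records) should give one row per "map" with the name attributes as column headers. All values are strings at this point, so numeric columns would still need converting.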

0 Answers