I'm having trouble figuring out how to parse XML in etree, with and without XPath.
I am trying to get every instance of an element as a record, with each instance's subfields put into a dictionary. Ultimately, I want the field data in a Pandas DataFrame with the val name attributes as column headers.
Below, each "map" is a record.
Sample XML:
<?xml version="1.0" encoding="UTF-8" ?>
<response xmlns="a url">
  <fn>
    <list>
      <map>
        <val name="COUNT">0</val>
        <val name="CATEGORY">AA</val>
        <val name="AVERAGE">48.064133</val>
        <val name="RANKING">4</val>
        <val name="NAME">PORTER</val>
        <val name="INDUSTRY_TAG">DESIGN</val>
      </map>
      <map>
        <val name="COUNT">3</val>
        <val name="CATEGORY">BA</val>
        <val name="AVERAGE">77.33</val>
        <val name="RANKING">27</val>
        <val name="NAME">DANIELS</val>
        <val name="INDUSTRY_TAG">INTERACTIVE</val>
      </map>
      <map>
        <val name="COUNT">8</val>
        <val name="CATEGORY">BB</val>
        <val name="AVERAGE">102.85</val>
        <val name="RANKING">15</val>
        <val name="NAME">SWEETWATER</val>
        <val name="INDUSTRY_TAG">GRAPHIC</val>
      </map>
    </list>
  </fn>
</response>
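To make the goal concrete, here is a hand-written sketch of the structure I'm after for the sample above. It is not produced by any parsing code; the names `records` and `columns` are only illustrative, and the values are kept as strings.

```python
# Hand-written target for the sample: one dict per "map",
# keyed by each val's "name" attribute (values left as strings).
records = [
    {"COUNT": "0", "CATEGORY": "AA", "AVERAGE": "48.064133",
     "RANKING": "4", "NAME": "PORTER", "INDUSTRY_TAG": "DESIGN"},
    {"COUNT": "3", "CATEGORY": "BA", "AVERAGE": "77.33",
     "RANKING": "27", "NAME": "DANIELS", "INDUSTRY_TAG": "INTERACTIVE"},
    {"COUNT": "8", "CATEGORY": "BB", "AVERAGE": "102.85",
     "RANKING": "15", "NAME": "SWEETWATER", "INDUSTRY_TAG": "GRAPHIC"},
]

# The intended column headers are the dictionary keys.
columns = list(records[0].keys())
print(columns)
```

A list of dictionaries like this can be passed to `pandas.DataFrame(records)`, which should give one row per "map" with the name attributes as columns.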
The XML comes from an API. "response" is the variable the API call has been assigned to. The code below generates one continuous list of the "val" values.
# lxml is assumed here, since root.xpath() is used further down
from lxml import etree
root = etree.fromstring(response.content)
for node in root.findall('.//*/*[@name]'):
    print(node.text)
I have tried using XPath to select at the "map" level, but that isn't working.
I have played around with XPath expressions based on this guide on XPath and these examples, but only the following seems to find anything, and it produces the same single list as above:
for node in root.xpath('//*[@name]'):
    print(node.text)
I've tried an absolute path, root.xpath('response/fn/list/map'), but that doesn't seem to work. I tried root.xpath('//val') because I thought that would get me all of the "val" elements, but that hasn't worked. If I enter root.xpath('//*') I get that list of "val" values.
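One thing that may explain the empty results, offered as a sketch rather than a confirmed diagnosis: the "response" element declares a default namespace (xmlns="a url"), so unprefixed paths such as //val look for elements in no namespace and match nothing. The example below uses the standard library's xml.etree.ElementTree on a trimmed copy of the sample; the prefix `ns` and the names `SAMPLE`, `NS`, and `records` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample; bytes are used because of the encoding declaration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<response xmlns="a url">
<fn>
<list>
<map>
<val name="COUNT">0</val>
<val name="NAME">PORTER</val>
</map>
<map>
<val name="COUNT">3</val>
<val name="NAME">DANIELS</val>
</map>
</list>
</fn>
</response>"""

# Hypothetical prefix "ns" bound to the document's default namespace URI.
NS = {"ns": "a url"}

root = ET.fromstring(SAMPLE)

# An unprefixed search finds nothing, because every element is namespaced.
unqualified = root.findall(".//val")

# One dictionary per "map", keyed by each val's "name" attribute.
records = [
    {val.get("name"): val.text for val in m.findall("ns:val", NS)}
    for m in root.findall(".//ns:map", NS)
]
print(records)
```

With lxml, the same prefix mapping can be passed to XPath as root.xpath('//ns:map', namespaces=NS).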
Clearly I am not grokking something fundamental and would appreciate being put on the path to wisdom.
Thanks