Parse xml with sub-nodes and create a Pandas dataframe

Question

I have the following xml format:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>

I wrote the sample code below to parse this file after reading the post How to convert an XML file to nice pandas dataframe?, but it returns None. My question is whether there is a fast way to create a dataframe whose index is the value of the item element (i.e., VARS) and whose 4 columns are status, time, obj, and gap.

import pandas as pd
from xml.etree import ElementTree as et

root = (et.parse('test.xml').getroot()).getchildren()


tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["status"] = elem.attrib['status']
    tag["time"] = elem.attrib['time']
    tag["obj"] = elem.attrib['obj']
    tag["gap"] = elem.attrib['gap']
    tags["tags"]. append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()

This is the output I am looking for:


      status  time  obj   gap
VARS  4        3    1.0   0.15

What is etree outputting for you? We sort of don't care about the xml, we care about etree's output since that is what you are trying to make a df. — noah, Dec 22 '20 at 22:45
Also, see [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) — noah, Dec 22 '20 at 22:46
Your xml isn't well formed - for example, where do `` and `` close? — Jack Fleeting, Dec 22 '20 at 23:14
@noah Thanks for sharing the post. Updated my question according to that. — Alex Man, Dec 22 '20 at 23:26
Try to see why you are getting `None`. Is it that there are no `elem` in `root`? If so, then it is an XML parsing issue. The code for creating the pandas dataframe should be fast enough as is. — noah, Dec 22 '20 at 23:53
Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) — iacob, Apr 21 '21 at 07:54

score 1 · Answer 1 · answered Jan 05 '21 at 03:08

I think you still need to loop through the tree with ElementTree to extract the bits and pieces you need from the XML.

import pandas as pd
from xml.etree import ElementTree as et

root = et.parse('test.xml').getroot()

results = []
for ele in root.findall('run'):
    # assumes each run contains only one control item
    control = ele.find('controls').find('item').text
    # collect every child of this run's <result> into a single row
    result = {'control': control}
    for attr in ele.find('result'):
        result[attr.tag] = attr.text
    results.append(result)
# finally, convert to a dataframe and set control as the index
results = pd.DataFrame(results)
results = results.set_index('control')

perl · Accepted Answer · 2021-01-08T01:29:41.840

1

We can use the findall and find methods of ElementTree to extract the elements that we need (the children of result as columns, and controls/item as the index):

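# et is xml.etree.ElementTree and pd is pandas, imported as in the question
tree = et.parse('test.xml')
pd.DataFrame({x.tag: x.text for x in tree.findall('./run/result/*')},
             index=[tree.find('./run/controls/item').text])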

Output:

     status time  obj   gap
VARS      4    3  1.0  0.15
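
As a rough sketch of how the same idea might extend to a file with several run blocks: the values read from .text are strings, so the example below also converts them with pd.to_numeric. The XML is inlined so the snippet runs on its own, and the second run (CONS and its numbers) is made up for illustration; it is not part of the question's data.

```python
import pandas as pd
from xml.etree import ElementTree as ET

# Inline copy of the relevant parts of the question's XML.
# The second <run> ("CONS" and its values) is hypothetical, added for illustration.
XML = """<results>
   <run>
      <controls><item>VARS</item></controls>
      <result><status>4</status><time>3</time><obj>1.0</obj><gap>0.15</gap></result>
   </run>
   <run>
      <controls><item>CONS</item></controls>
      <result><status>2</status><time>7</time><obj>2.5</obj><gap>0.0</gap></result>
   </run>
</results>"""

root = ET.fromstring(XML)
runs = root.findall('run')

# One dict per run: the children of <result> become the columns.
records = [{child.tag: child.text for child in run.find('result')} for run in runs]

# One index label per run: the text of controls/item.
index = [run.findtext('controls/item') for run in runs]

# Element text is always a string, so convert each column to a numeric dtype.
df = pd.DataFrame(records, index=index).apply(pd.to_numeric)
print(df)
```

This assumes every run has exactly one controls/item and one result; a run missing either would need extra handling.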

edited Jan 08 '21 at 01:29

answered Jan 08 '21 at 01:04


score 0 · Answer 3 · answered Jan 11 '21 at 11:40

Note that status is not directly under the root, yet that is where you are trying to find it.

status is a child of the result element, which in turn sits under run.

You need to search recursively for status among the root's descendants.

Refer to the documentation. It describes the methods in detail, with samples. findall is useful, as others suggested.
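
To make the point concrete, here is a small sketch (standard library only) of where the lookup goes wrong and how a recursive search finds the value. It uses a trimmed inline copy of the question's XML, and the variable names are just for illustration.

```python
from xml.etree import ElementTree as ET

# Trimmed inline copy of the question's XML.
XML = """<results>
   <run>
      <controls><item>VARS</item></controls>
      <result><status>4</status><time>3</time><obj>1.0</obj><gap>0.15</gap></result>
   </run>
</results>"""

root = ET.fromstring(XML)

# The direct children of the root are <run> elements, not <status>.
child_tags = [child.tag for child in root]

# <result> has child elements but no XML attributes, so .attrib is empty
# and a lookup such as attrib['status'] would raise KeyError.
result_attribs = root.find('run/result').attrib

# A bare tag name only matches direct children, so this finds nothing.
direct = root.find('status')

# './/status' searches all descendants; findtext returns the matched element's text.
status = root.findtext('.//status')

# iter() also walks the whole subtree recursively.
all_status = [el.text for el in root.iter('status')]

print(child_tags, result_attribs, direct, status, all_status)
```

The values live in the text of child elements, so find/findtext/iter are the relevant calls here; attrib is only for attributes written inside a tag, which this file does not use.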
