I have the following xml format:

<?xml version="1.0" encoding="UTF-8"?>
<results>
   <run>
      <information>
         <logfile>s.log</logfile>
         <version>33</version>
         <mach>1</mach>
         <problemname>mm1</problemname>
         <timestamp>20201218.165122.053486</timestamp>
      </information>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>

I wrote the sample code below to parse this file after reading the post How to convert an XML file to nice pandas dataframe?, but it returns None. My actual question is whether there is a fast way to create a dataframe whose index is the value of `<item>` (i.e., VARS) and whose 4 columns are status, time, obj, and gap.

import pandas as pd
from xml.etree import ElementTree as et

root = (et.parse('test.xml').getroot()).getchildren()


tags = {"tags":[]}
for elem in root:
    tag = {}
    tag["status"] = elem.attrib['status']
    tag["time"] = elem.attrib['time']
    tag["obj"] = elem.attrib['obj']
    tag["gap"] = elem.attrib['gap']
    tags["tags"].append(tag)

df_users = pd.DataFrame(tags["tags"])
df_users.head()

This is the output I am looking for:


      status  time  obj   gap
VARS  4        3    1.0   0.15
Alex Man
  • What is etree outputting for you? We sort of don't care about the xml, we care about etree's output since that is what you are trying to make a df. – noah Dec 22 '20 at 22:45
  • Also, see [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) – noah Dec 22 '20 at 22:46
  • Your xml isn't well formed - for example, where do `` and `` close? – Jack Fleeting Dec 22 '20 at 23:14
  • @JackFleeting. Thanks. Just updated that. – Alex Man Dec 22 '20 at 23:25
  • @noah Thanks for sharing the post. Updated my question according to that. – Alex Man Dec 22 '20 at 23:26
  • Try to see why you are getting `None`. Is it that there are no `elem` in `root`? If so, then it is an xml parsing issue. The code for the pandas creation should be fast enough as is. – noah Dec 22 '20 at 23:53
  • Can you use lxml instead of xml.etree? It's just simpler. – Jack Fleeting Dec 23 '20 at 00:34
  • Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) – iacob Apr 21 '21 at 07:54

3 Answers

I think you still need to loop through the tree with xml.etree and extract the bits and pieces yourself.

import pandas as pd
from xml.etree import ElementTree as et

root = et.parse('test.xml').getroot()

results = []
for ele in root.findall('run'):
    # assumes each run contains only one control item
    control = ele.find('controls').find('item').text
    # collect all children of <result> into a single row for this run
    result = {'control': control}
    for attr in ele.find('result'):
        result[attr.tag] = attr.text
    results.append(result)
# finally, convert to a dataframe and set control as the index
results = pd.DataFrame(results)
results = results.set_index('control')
ABC

We can use the findall and find methods of ElementTree to extract the elements that we need (children of result as columns, and controls/item as index):

# et is the ElementTree module alias imported in the question
tree = et.parse('test.xml')
pd.DataFrame({x.tag: x.text for x in tree.findall('./run/result/*')},
             index=[tree.find('./run/controls/item').text])

Output:

     status time  obj   gap
VARS      4    3  1.0  0.15
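
The values read from the XML are strings, so the columns shown above have dtype object. Below is a minimal sketch of one way to turn them into numbers with pd.to_numeric. It assumes a trimmed copy of the question's XML held in a string; the name `xml_text` is chosen only for illustration and is not part of the original post.

```python
import pandas as pd
from xml.etree import ElementTree as ET

# Trimmed copy of the XML from the question, inlined so the sketch runs on its own.
xml_text = """<results>
   <run>
      <controls>
         <item>VARS</item>
      </controls>
      <result>
         <status>4</status>
         <time>3</time>
         <obj>1.0</obj>
         <gap>0.15</gap>
      </result>
   </run>
</results>"""

root = ET.fromstring(xml_text)

# Children of <result> become the columns, <item> becomes the index.
df = pd.DataFrame({x.tag: x.text for x in root.findall('./run/result/*')},
                  index=[root.find('./run/controls/item').text])

# Element text is always a string; convert each column to a numeric dtype.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

If some column may hold non-numeric text, the conversion would need to be limited to the numeric columns instead of being applied to the whole frame.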
perl

Note that status is not directly under root, but you are trying to find it there.

status is a child of result.

You need to search recursively for status among the descendants of root.

Refer to the documentation, which describes the methods in detail with examples. findall is useful, as others suggested.
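
As a rough illustration of this point, the sketch below uses a trimmed inline copy of the question's XML (the name `xml_text` is an assumption made for the example). It shows that the child of root carries no attributes, that a bare-tag find only looks at direct children, and that a descendant search does locate status.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the XML from the question, inlined for illustration.
xml_text = """<results>
   <run>
      <result>
         <status>4</status>
         <time>3</time>
      </result>
   </run>
</results>"""

root = ET.fromstring(xml_text)

# The only child of <results> is <run>, and it has no attributes,
# so elem.attrib['status'] cannot succeed on it.
run = root[0]
print(run.tag, run.attrib)      # run {}

# find() with a bare tag only looks at direct children of root.
print(root.find('status'))      # None

# './/status' searches all descendants, at any depth.
status = root.find('.//status')
print(status.text)              # 4

# iter() also walks the whole subtree.
print([e.text for e in root.iter('status')])   # ['4']
```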

T.kowshik Yedida