I'm trying to create a script to convert nested XML files to a Pandas dataframe. I've found this article https://medium.com/@robertopreste/from-xml-to-pandas-dataframes-9292980b1c1c, which does a good job of getting to the second level (parent, child), but I don't know how to get to deeper levels (e.g. grandchildren) or how to read the attributes of the children (e.g. "neighbor" -> "name").
Here is my XML structure:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
        <neighbor2 name="Italy" direction="S"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
and here is my code:
import pandas as pd
import xml.etree.ElementTree as et


def parse_XML(xml_file, df_cols):
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    rows = []

    for node in xroot:
        res = []
        # The first column is read from the attributes of the record itself.
        res.append(node.attrib.get(df_cols[0]))
        # The remaining columns are looked up as direct child elements.
        for el in df_cols[1:]:
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else:
                res.append(None)
        rows.append({df_cols[i]: res[i]
                     for i, _ in enumerate(df_cols)})

    out_df = pd.DataFrame(rows, columns=df_cols)
    return out_df


xml_file = "example.xml"
# "direction" is an attribute of <neighbor>, not a child of <country>,
# so this column comes back as None for every row.
df_cols = ["name", "year", "direction"]
out_df = parse_XML(xml_file, df_cols)
out_df
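As far as I can tell, the attributes of a child element can be reached with `findall` and `.get`, but I don't see how to fold that into the function above in a generic way. A minimal sketch of what I mean, using an inline, shortened copy of the data instead of the file (the variable names are just placeholders):

```python
import xml.etree.ElementTree as et

# Shortened inline stand-in for the XML file shown above.
xml_text = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>"""

root = et.fromstring(xml_text)
country = root.find("country")

# findall returns every direct child with the given tag;
# .get reads one attribute and returns None if it is missing.
neighbors = [(n.get("name"), n.get("direction"))
             for n in country.findall("neighbor")]
```

This works for one hard-coded tag, which is exactly the kind of per-file editing I'd like to avoid.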
What I'd like to obtain is a structure like the following:
| name | year | neighbor name 1 | neighbor direction 1 | neighbor2 name 1 |
|---------------|------|-----------------|----------------------|------------------|
| Liechtenstein | 2008 | Austria | E | Italy |
| | | | | |
| | | | | |
The solution needs to be as flexible as possible, so that it can be reused on different files with little editing. The XML files I receive have different data structures, so I'd like to make only minimal changes each time.
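To make the target more concrete, here is a rough sketch of the kind of generic flattening I have in mind. It is only an illustration under my own assumptions: the helper names `flatten` and `xml_to_df` are made up, the column naming ("tag attribute index") is simply copied from the table above, and the data is passed as an inline string rather than read from the file.

```python
import xml.etree.ElementTree as et
from collections import Counter

import pandas as pd


def flatten(elem, prefix="", suffix=""):
    """Collect attributes and descendants of elem into one flat dict (hypothetical helper)."""
    # Attributes of the element itself, e.g. name="Liechtenstein" -> "name".
    row = {f"{prefix}{key}{suffix}": value for key, value in elem.attrib.items()}
    counts = Counter()
    for child in elem:
        counts[child.tag] += 1
        n = counts[child.tag]  # 1-based position among siblings with the same tag
        text = (child.text or "").strip()
        if not child.attrib and len(child) == 0:
            # Plain leaf such as <year>2008</year>: the tag is the column name;
            # repeats of the same plain tag get a numeric suffix.
            leaf = child.tag if n == 1 else f"{child.tag} {n}"
            row[f"{prefix}{leaf}{suffix}"] = text or None
        else:
            # Child with attributes or its own children: recurse, producing
            # columns like "neighbor name 1". Text of such a child is ignored here.
            row.update(flatten(child, f"{prefix}{child.tag} ", f" {n}{suffix}"))
    return row


def xml_to_df(xml_text, columns=None):
    """Build one row per direct child of the root (hypothetical helper)."""
    root = et.fromstring(xml_text)
    rows = [flatten(record) for record in root]
    # With columns=None every discovered column is kept.
    return pd.DataFrame(rows, columns=columns)


xml_text = """<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
        <neighbor2 name="Italy" direction="S"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>"""

df = xml_to_df(
    xml_text,
    columns=["name", "year", "neighbor name 1",
             "neighbor direction 1", "neighbor2 name 1"],
)
```

Ideally the only thing I would change per file is the `columns` list. I'm not sure whether this recursive approach is sensible for deeper nesting, or whether there is a more standard way to do it.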
Thanks a lot!!