I want to parse data from an XML file into a MultiIndex pandas dataframe. My XML file looks like this:
<?xml version="1.0"?>
<catalog>
<book name="Documents/Books/German">
<author>Kerstin Gier</author>
<title>Rubinrot</title>
</book>
<book name="Documents/Articles/English">
<author>Kim Ralls</author>
<title>Midnight Rain</title>
</book>
<book name="Documents/Books/English">
<author>Eva Corets</author>
<title>Maeve Ascendant</title>
</book>
<book name="Documents/Books/English">
<author>Karl Parker</author>
<title>Worldeater</title>
</book>
</catalog>
The goal is to store the data from all book tags in a MultiIndex pandas dataframe, which should look as follows:
                             author        title
Documents Books    German    Kerstin Gier  Rubinrot
                   English   Eva Corets    Maeve Ascendant
                             Karl Parker   Worldeater
          Articles German    Null          Null
                   English   Kim Ralls     Midnight Rain
The index levels of the MultiIndex dataframe should come from the path segments stored in the "name" attribute. I don't want to hardcode any paths, because my real-world example has many different paths and the MultiIndex will have 5-6 levels.
My approach so far: I started by creating a single-index dataframe, which looks like this:
path                        author        title
Documents/Books/German      Kerstin Gier  Rubinrot
Documents/Articles/English  Kim Ralls     Midnight Rain
Documents/Books/English     Eva Corets    Maeve Ascendant
Documents/Books/English     Karl Parker   Worldeater
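For reference, a flat frame like this can be built roughly as follows. This is only a minimal sketch: it assumes the standard-library xml.etree.ElementTree parser, and the names XML_TEXT and books_to_frame are made up for illustration. In practice the XML would be read from a file rather than an inline string.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Inline copy of the sample catalog (assumption: the real data comes from a file).
XML_TEXT = """<?xml version="1.0"?>
<catalog>
<book name="Documents/Books/German">
<author>Kerstin Gier</author>
<title>Rubinrot</title>
</book>
<book name="Documents/Articles/English">
<author>Kim Ralls</author>
<title>Midnight Rain</title>
</book>
<book name="Documents/Books/English">
<author>Eva Corets</author>
<title>Maeve Ascendant</title>
</book>
<book name="Documents/Books/English">
<author>Karl Parker</author>
<title>Worldeater</title>
</book>
</catalog>"""


def books_to_frame(xml_text):
    """Return one row per <book> element with its path, author and title."""
    root = ET.fromstring(xml_text)
    rows = [
        {
            "path": book.get("name"),            # value of the "name" attribute
            "author": book.findtext("author"),   # None if the child tag is missing
            "title": book.findtext("title"),
        }
        for book in root.iter("book")
    ]
    return pd.DataFrame(rows, columns=["path", "author", "title"])


flat = books_to_frame(XML_TEXT)
print(flat)
```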
The question is: how can I convert this dataframe into a MultiIndex dataframe with the path structure as index levels? The problem I see is changing the index without losing the binding to the data.
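One possible direction, sketched under assumptions rather than offered as a definitive solution: split the path column on "/" with str.split(expand=True) and build the index from the resulting columns with pd.MultiIndex.from_frame. Because the split columns stay in the same row order as the data, each author/title pair keeps its path. The variable names flat, levels and multi are illustrative only, and the sample frame is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Flat single-index frame as shown above (rebuilt here for a self-contained example).
flat = pd.DataFrame(
    {
        "path": [
            "Documents/Books/German",
            "Documents/Articles/English",
            "Documents/Books/English",
            "Documents/Books/English",
        ],
        "author": ["Kerstin Gier", "Kim Ralls", "Eva Corets", "Karl Parker"],
        "title": ["Rubinrot", "Midnight Rain", "Maeve Ascendant", "Worldeater"],
    }
)

# One column per path segment; no path is hardcoded, the depth follows the data.
# Shorter paths would be padded with missing values in the deeper columns.
levels = flat["path"].str.split("/", expand=True)

# Rows of `levels` line up with rows of `flat`, so the data stays bound to its path.
multi = flat.drop(columns="path")
multi.index = pd.MultiIndex.from_frame(levels)

# Sorting groups equal path prefixes together for display and fast lookups.
multi = multi.sort_index()
print(multi)
```

This sketch does not create placeholder rows for path combinations that are absent from the data (such as Documents/Articles/German in the desired output); that would need a separate step.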