I have this input.xml example:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <bathrooms>
    <n35237 type="number">1.0</n35237>
    <n32238 type="number">3.0</n32238>
    <n44699 type="number">nan</n44699>
  </bathrooms>
  <price>
    <n35237 type="number">7020000.0</n35237>
    <n32238 type="number">10000000.0</n32238>
    <n44699 type="number">4128000.0</n44699>
  </price>
  <property_id>
    <n35237 type="number">35237.0</n35237>
    <n32238 type="number">32238.0</n32238>
    <n44699 type="number">44699.0</n44699>
  </property_id>
</root>
that I would like to analyse as a pandas DataFrame. The code I used is below:
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('input.xml')
root = tree.getroot()

def f(elem, result):
    result[elem.tag] = elem.text
    cs = list(elem)
    for c in cs:
        result = f(c, result)
    return result

d = f(root, {})
df = pd.DataFrame(d.items())
print(df)
The problem is that the resulting DataFrame looks nothing like the XML file. It shows only the last occurrence of each node, because the node names repeat under different parents and each repeat overwrites the previous dictionary entry with the same tag. How can I view all the XML nodes and their corresponding values without having to specify the node names, so that this works for any custom XML?
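To make the goal concrete, here is a minimal sketch of the kind of output I am after. It is only one possible direction, not a confirmed solution. Each leaf node is recorded under its full path from the root instead of under its bare tag, so repeated names do not collide. The helper name `leaf_rows` and the inline `SAMPLE` string (a shortened copy of input.xml) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of input.xml so the sketch runs on its own.
SAMPLE = """<root>
  <bathrooms>
    <n35237 type="number">1.0</n35237>
    <n32238 type="number">3.0</n32238>
  </bathrooms>
  <price>
    <n35237 type="number">7020000.0</n35237>
    <n32238 type="number">10000000.0</n32238>
  </price>
</root>"""


def leaf_rows(elem, path=()):
    """Yield (path, text) for every leaf element at or below elem.

    The path is the chain of tag names from the root joined with '/',
    so the same tag under different parents stays distinct.
    """
    path = path + (elem.tag,)
    children = list(elem)
    if not children:
        # A leaf: report its full path and its text content.
        yield "/".join(path), elem.text
        return
    for child in children:
        yield from leaf_rows(child, path)


# Rows are kept in a list, not a dict, so nothing is overwritten.
rows = list(leaf_rows(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows, columns=["path", "value"])` should then give one row per leaf node, with no tag names hard-coded.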