-2

I'm trying to parse XML to table-like structure in Python. Imagine XML like this:

<?xml version="1.0" encoding="UTF-8"?>
<base>
  <element1>element 1</element1>
  <element2>element 2</element2>
  <element3>
    <subElement3>subElement 3</subElement3>
  </element3>
</base>

I'd like to have result like this:

KEY                       | VALUE
base.element1             | "element 1"
base.element2             | "element 2"
base.element3.subElement3 | "subElement 3"

I've tried using xml.etree.cElementTree, then functions described here How to convert an xml string to a dictionary in Python?

Is there any function that can do this? All answers I found are written for particular XML schemes and would need to be edited for each new XML scheme. For reference, in R it's easy with XML and XML2 packages and xmlToList function.

Michal Hruška
  • 444
  • 2
  • 6
  • 15
  • 1
    So what have you tried, and what precisely is the problem with it? – jonrsharpe Aug 18 '17 at 11:33
  • You might be interested in https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary-in-python. – amonowy Aug 18 '17 at 11:35
  • @jonrsharpe I tried parsing it using xml.etree.cElementTree, then functions described here https://stackoverflow.com/questions/2148119/how-to-convert-an-xml-string-to-a-dictionary-in-python and I'm just wondering whether there is some simple function similar to that one in R. I'm new to Python, don't use it normally and all tutorials that I found were written for a particular XML schema and would require editing for any other. The reason why I don't simply use R is that I believe it could be much faster with Py. – Michal Hruška Aug 18 '17 at 11:59
  • Please [edit] to give a [mcve] illustrating the specific problem. – jonrsharpe Aug 18 '17 at 12:00

1 Answers1

13

I've got the needed outcome using following script.

XML File:

<?xml version="1.0" encoding="UTF-8"?>
<base>
  <element1>element 1</element1>
  <element2>element 2</element2>
  <element3>
    <subElement3>subElement 3</subElement3>
  </element3>
</base>

Python code:

import pandas as pd
from lxml import etree

data = "C:/Path/test.xml"

tree = etree.parse(data)

lstKey = []
lstValue = []
for p in tree.iter() :
    lstKey.append(tree.getpath(p).replace("/",".")[1:])
    lstValue.append(p.text)

df = pd.DataFrame({'key' : lstKey, 'value' : lstValue})
df.sort_values('key')

Result:

Python result

Michal Hruška
  • 444
  • 2
  • 6
  • 15