You can use parsel to extract the data you want; it has a simple syntax and can help with malformed XML structures.
I used XPath syntax to get the data; the w3schools XPath syntax page is a good reference.
Summary: to reference a node, use `/` (a direct child step) or `//` (a match at any depth), depending on the path you wish to take. For attributes, prefix the name with the `@` symbol. To get the text at a path, append `text()` to it, then call the `getall()` method to get all the values, or `get()` if you are only interested in the first match.
from parsel import Selector
import pandas as pd

# if you are reading from a file:
with open('data.xml') as xml:
    data = xml.read()

# parse as XML rather than HTML
content = Selector(text=data, type="xml")
mapping = {}
mapping["DeptCode"] = content.xpath("//DeptCode/@ID").getall()
mapping["OCC_TotalCount"] = content.xpath("//OCCCounter/TotalCount/text()").getall()
mapping["Test_app_large"] = content.xpath("//Test//app_large//text()").getall()
mapping["Test_app_small"] = content.xpath("//Test//app_small//text()").getall()
print(mapping)
{'DeptCode': ['1', '2'],
'OCC_TotalCount': ['1', '1'],
'Test_app_large': ['1', '1'],
'Test_app_small': ['2', '2']}
# create a DataFrame
res = pd.DataFrame(mapping)
  DeptCode OCC_TotalCount Test_app_large Test_app_small
0        1              1              1              2
1        2              1              1              2