I am trying to extract elements from an XML document using Python's xml.etree.ElementTree library and to build a JSON output from them.
The idea is to pass it a list of XPath expressions for the elements I want. I don't want to walk through every element in the XML, as there are a lot of them.
The XML looks something like this:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Line xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Data>
    <Date>2020-01-02</Date>
    <Id>id_1</Id>
    <CodDevice>567</CodDevice>
    <DataList>
      <Item>
        <Row>1</Row>
        <Value>34.67</Value>
        <Description>WHEELS</Description>
        <Tag>tag1</Tag>
      </Item>
      <Item>
        <Row>2</Row>
        <Value>38.04</Value>
        <Description>MOTOR</Description>
        <Tag>tag1</Tag>
      </Item>
    </DataList>
    <MetaList>
      <Metadata>
        <Row>1</Row>
        <Value>some value</Value>
      </Metadata>
    </MetaList>
  </Data>
</Line>
The approach I am considering is as follows:
import xml.etree.ElementTree as ET
import json
data = """<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Line xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Data>
<Date>2020-01-02</Date>
<Id>id_1</Id>
<CodDevice>567</CodDevice>
<DataList>
<Item>
<Row>1</Row>
<Value>34.67</Value>
<Description>WHEELS</Description>
<Tag>tag1</Tag>
</Item>
<Item>
<Row>2</Row>
<Value>38.04</Value>
<Description>MOTOR</Description>
<Tag>tag1</Tag>
</Item>
</DataList>
<MetaList>
<Metadata>
<Row>1</Row>
<Value>some value</Value>
</Metadata>
</MetaList>
</Data>
</Line>
"""
tag_list = [
    './Data/Date',
    './Data/Id',
    './Data/CodDevice',
    './Data/DataList/Item/Row',
    './Data/DataList/Item/Value',
    './Data/DataList/Item/Description',
    './Data/MetaList/Metadata/Row',
    './Data/MetaList/Metadata/Value'
]
elem_dict = {}
parser = ET.XMLParser(encoding="utf-8")
root = ET.fromstring(data, parser=parser)
for tag in tag_list:
    for item in root.findall(tag):
        elem_dict[item.tag] = item.text
print(json.dumps(elem_dict))
As you can see, I store each match under its tag name, so elements that share a tag (Row, Value, Description) overwrite one another, producing the following output:
{"Date": "2020-01-02", "Id": "id_1", "CodDevice": "567", "Row": "1", "Value": "some value", "Description": "MOTOR"}
But what I would like to get is something similar to:
{"Id":"id_1","CodDevice":"567","DataList":[{"Row":1,"Value":34.67,"Description":"WHEELS"}, {"Row":2,"Value":38.04,"Description":"MOTOR"}],"MetaList":[{"Row":1,"Value":"some value"}]}
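One direction that might get there, as a rough sketch only: instead of one XPath per leaf, keep one path per repeated container (Item, Metadata) and build a dict for each match, so nothing is overwritten. The names scalar_paths, list_paths, convert and extract are hypothetical helpers invented for this illustration, not part of ElementTree, and the numeric conversion is an assumption based on the desired output above.

```python
import json
import xml.etree.ElementTree as ET

# Shortened copy of the sample document above (declaration and namespace omitted).
data = """<Line>
  <Data>
    <Date>2020-01-02</Date>
    <Id>id_1</Id>
    <CodDevice>567</CodDevice>
    <DataList>
      <Item><Row>1</Row><Value>34.67</Value><Description>WHEELS</Description><Tag>tag1</Tag></Item>
      <Item><Row>2</Row><Value>38.04</Value><Description>MOTOR</Description><Tag>tag1</Tag></Item>
    </DataList>
    <MetaList>
      <Metadata><Row>1</Row><Value>some value</Value></Metadata>
    </MetaList>
  </Data>
</Line>"""

# Hypothetical layout: scalar fields map to a single path each.
scalar_paths = {"Id": "./Data/Id", "CodDevice": "./Data/CodDevice"}
# List fields map to (path of the repeated element, child tags to keep).
list_paths = {
    "DataList": ("./Data/DataList/Item", ["Row", "Value", "Description"]),
    "MetaList": ("./Data/MetaList/Metadata", ["Row", "Value"]),
}


def convert(text):
    """Turn numeric-looking text into int or float; return anything else unchanged."""
    for cast in (int, float):
        try:
            return cast(text)
        except (TypeError, ValueError):
            pass
    return text


def extract(root):
    result = {}
    # Scalars: findtext returns the text of the first match (or None).
    for key, path in scalar_paths.items():
        result[key] = root.findtext(path)
    # Lists: one dict per matched container, so repeated tags no longer collide.
    for key, (path, children) in list_paths.items():
        result[key] = [
            {child: convert(elem.findtext(child)) for child in children}
            for elem in root.findall(path)
        ]
    return result


root = ET.fromstring(data)
result = extract(root)
print(json.dumps(result))
```

Only the listed paths are queried, so the rest of the tree is never walked explicitly. Scalar fields are left as strings here because the desired output shows "CodDevice" quoted.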
I don't know the library's capabilities in detail, so maybe there is a more efficient way to achieve this that I am overlooking.
Any ideas on how to approach this would be great. Thank you very much!