Basically, I want to transform an XML document to JSON using Python 3 and the lxml library. The important thing is that I want to preserve all text, tails, and tags, as well as their order in the XML. Below is an example of what my program should be able to do:
What I have
<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>
What I want (python dict/json)
{
  "root": {
    "tag": [
      {"text": "Some tag-text"},
      {"subtag": {"text": "Some subtag-text"}},
      {"text": "Some tail-text"}
    ]
  }
}
This is just a very simplified example. The files I need to transform are way bigger and have more nestings.
Also, I can't use the xmltodict library for this, only lxml.
I'm 99% sure there is an elegant way to do this recursively, but so far I haven't been able to write a solution that works the way I want it to.
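For reference, here is a minimal sketch of what such a recursive walk could look like. It is not a definitive solution. The function name element_to_dict is hypothetical, and several choices are assumptions: whitespace-only text is dropped, remaining text is stripped, attributes are ignored, and an element with exactly one item is unwrapped from its list so the result matches the shape of the example. The fallback import of the standard-library ElementTree is only there so the sketch runs where lxml is not installed; both expose the same .tag, .text and .tail attributes.

```python
import json

try:
    from lxml import etree  # the library the question is restricted to
except ImportError:
    # Assumption: the stdlib module offers the same .tag/.text/.tail API
    import xml.etree.ElementTree as etree

SAMPLE_XML = """<root>
<tag>
Some tag-text<subtag>Some subtag-text</subtag> Some tail-text
</tag>
</root>"""


def element_to_dict(elem):
    """Convert an element to {tag: content}, keeping text and children in order."""
    items = []

    # Text that appears before the first child element
    if elem.text and elem.text.strip():
        items.append({"text": elem.text.strip()})

    for child in elem:
        # lxml also yields comments and processing instructions,
        # whose .tag is not a string; skip those nodes themselves
        if isinstance(child.tag, str):
            items.append(element_to_dict(child))
        # The tail is the text that follows the child's closing tag
        if child.tail and child.tail.strip():
            items.append({"text": child.tail.strip()})

    # Assumption: a single item is unwrapped to mirror the desired output
    content = items[0] if len(items) == 1 else items
    return {elem.tag: content}


result = element_to_dict(etree.fromstring(SAMPLE_XML))
print(json.dumps(result, indent=2))
```

Run on the sample document, this produces the structure listed under "What I want". The unwrapping rule means a consumer cannot assume the value is always a list; always returning the list would be more uniform, at the cost of deeper nesting.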
Thanks a lot for the help
EDIT: Why this question is not a duplicate of "Converting XML to JSON using Python?"
I understand there is no such thing as a one-to-one mapping from XML to JSON. I'm specifically asking for a way that preserves the text order, as in the example above.
Also, using xmltodict doesn't achieve that goal. For example, transforming the XML from the example above with xmltodict results in the following structure:
root:
  tag:
    text: 'Some tag-text Some tail-text'
    subtag: 'Some subtag-text'
You can see that the tail part "Some tail-text" was concatenated with "Some tag-text".
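By contrast, the element tree itself keeps those pieces apart, which is why the order should be recoverable with lxml alone. A small illustration, assuming the same sample content on a single line (the stdlib fallback import is only there so the snippet runs without lxml installed):

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree behaves the same for .text and .tail
    import xml.etree.ElementTree as etree

tag = etree.fromstring(
    "<tag>Some tag-text<subtag>Some subtag-text</subtag> Some tail-text</tag>"
)
subtag = tag[0]

# Text before the child lives on the parent; text after the child's
# closing tag lives on the child as its tail.
pieces = [tag.text, subtag.text, subtag.tail]
print(pieces)  # ['Some tag-text', 'Some subtag-text', ' Some tail-text']
```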
thanks