I have an XML file of the form:
<NewDataSet>
  <Root>
    <Phonemic>and</Phonemic>
    <Phonetic>nd</Phonetic>
    <Description/>
    <Start>0</Start>
    <End>8262</End>
  </Root>
  <Root>
    <Phonemic>comfortable</Phonemic>
    <Phonetic>comfetebl</Phonetic>
    <Description>adj</Description>
    <Start>61404</Start>
    <End>72624</End>
  </Root>
</NewDataSet>
I need to process it so that, for instance, when the user inputs "nd", the program matches it against the <Phonetic> tag and returns "and" from the corresponding <Phonemic> tag. I thought that if I could convert the XML file to a dictionary, I would be able to iterate over the data and look up information when needed.
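To make the goal concrete, here is a rough sketch of the kind of lookup I have in mind. It uses the standard library's xml.etree.ElementTree on an inline copy of the sample above; build_index and SAMPLE are hypothetical names of my own, and I am assuming that the first entry should win when a phonetic form occurs more than once.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample data, so the sketch runs without the real file.
SAMPLE = """<NewDataSet>
  <Root>
    <Phonemic>and</Phonemic>
    <Phonetic>nd</Phonetic>
    <Description/>
    <Start>0</Start>
    <End>8262</End>
  </Root>
  <Root>
    <Phonemic>comfortable</Phonemic>
    <Phonetic>comfetebl</Phonetic>
    <Description>adj</Description>
    <Start>61404</Start>
    <End>72624</End>
  </Root>
</NewDataSet>"""


def build_index(xml_text):
    """Map each <Phonetic> value to its <Phonemic> value (hypothetical helper)."""
    index = {}
    for root in ET.fromstring(xml_text).iter("Root"):
        phonetic = root.findtext("Phonetic")
        if phonetic is not None:
            # Assumption: if a phonetic form repeats, keep the first entry.
            index.setdefault(phonetic, root.findtext("Phonemic"))
    return index


index = build_index(SAMPLE)
print(index.get("nd"))  # prints: and
```

With an index like this, each user query is a single dictionary lookup instead of a scan over every <Root> element.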
I searched and found xmltodict, which converts XML to a dictionary:
import xmltodict

# errors='ignore' silently drops bytes that are not valid UTF-8
with open(r'path\to\1.xml', encoding='utf-8', errors='ignore') as fd:
    obj = xmltodict.parse(fd.read())
Running this gives me an OrderedDict:
>>> obj
OrderedDict([('NewDataSet', OrderedDict([('Root', [OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd'), ('Description', None), ('Start', '0'), ('End', '8262')]), OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl'), ('Description', 'adj'), ('Start', '61404'), ('End', '72624')])])]))])
Unfortunately, this hasn't made things simpler, and I am not sure how to implement the program with the new data structure. For example, to access "nd" I'd have to write:
obj['NewDataSet']['Root'][0]['Phonetic']
which is ridiculously complicated. I tried to turn it into a regular dictionary with dict(), but because the structure is nested, only the outermost layer is converted and the inner layers remain OrderedDicts. My data set is also very large.
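For reference, this is a sketch of what I am trying to end up with, written against a hand-built stand-in for obj so that it runs without xmltodict. The JSON round trip is just one way I can think of to convert every nested layer to a plain dict, and phonetic_to_phonemic is a hypothetical name. As far as I understand, xmltodict returns a single mapping rather than a list when there is only one <Root>, so the sketch guards against that case.

```python
import json
from collections import OrderedDict

# Hand-built stand-in for the parsed structure shown above.
obj = OrderedDict([('NewDataSet', OrderedDict([('Root', [
    OrderedDict([('Phonemic', 'and'), ('Phonetic', 'nd'),
                 ('Description', None), ('Start', '0'), ('End', '8262')]),
    OrderedDict([('Phonemic', 'comfortable'), ('Phonetic', 'comfetebl'),
                 ('Description', 'adj'), ('Start', '61404'), ('End', '72624')]),
])]))])

# dict(obj) only converts the outermost layer; a JSON round trip
# rebuilds every nested mapping as a plain dict.
plain = json.loads(json.dumps(obj))

records = plain['NewDataSet']['Root']
if isinstance(records, dict):
    # Assumption: a lone <Root> arrives as one mapping, not a list.
    records = [records]

# Flat lookup table: phonetic form -> phonemic form.
# Note: with a dict comprehension, a repeated phonetic form keeps the last entry.
phonetic_to_phonemic = {r['Phonetic']: r['Phonemic'] for r in records}

print(phonetic_to_phonemic['nd'])  # prints: and
```

The long path obj['NewDataSet']['Root'] is then written once, when the table is built, and every later lookup is a single key access.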