I'm trying to extract some data from XML. I'm using xmltodict to load the data into a dictionary, then using list comprehensions to pull out individual parts into separate lists. I will later be plotting these using matplotlib.
XML:
<?xml version="1.0" ?>
<MYDATA>
<SESSION ID="1234">
<INFO>
<BEGIN LOAD="23"/>
</INFO>
<TRANSACTION ID="2103645570">
<ANSWER>Hello</ANSWER>
</TRANSACTION>
<TRANSACTION ID="4315547431">
<ANSWER>This is an answer</ANSWER>
</TRANSACTION>
</SESSION>
<SESSION ID="5678">
<INFO>
<BEGIN LOAD="28"/>
</INFO>
<TRANSACTION ID="4099381642">
<ANSWER>Hello</ANSWER>
</TRANSACTION>
<TRANSACTION ID="1220404184">
<ANSWER>A Different answer</ANSWER>
</TRANSACTION>
<TRANSACTION ID="201506542">
<ANSWER>Yet another one</ANSWER>
</TRANSACTION>
</SESSION>
</MYDATA>
My code:
from collections import OrderedDict
# doc contains the xml exactly as loaded by xmltodict
doc = OrderedDict([(u'MYDATA', OrderedDict([(u'SESSION', [OrderedDict([(u'@ID', u'1234'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'23')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'2103645570'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'4315547431'), (u'ANSWER', u'This is an answer')])])]), OrderedDict([(u'@ID', u'5678'), (u'INFO', OrderedDict([(u'BEGIN', OrderedDict([(u'@LOAD', u'28')]))])), (u'TRANSACTION', [OrderedDict([(u'@ID', u'4099381642'), (u'ANSWER', u'Hello')]), OrderedDict([(u'@ID', u'1220404184'), (u'ANSWER', u'A Different answer')]), OrderedDict([(u'@ID', u'201506542'), (u'ANSWER', u'Yet another one')])])])])]))])
sess_ids = [i['@ID'] for i in doc['MYDATA']['SESSION']]
print 'sess_ids:', sess_ids
sess_loads = [i['INFO']['BEGIN']['@LOAD'] for i in doc['MYDATA']['SESSION']]
print 'sess_loads:', sess_loads
trans_ids = [[j['@ID'] for j in i['TRANSACTION']] for i in doc['MYDATA']['SESSION']]
print 'trans_ids:', trans_ids
Output:
sess_ids: [u'1234', u'5678']
sess_loads: [u'23', u'28']
trans_ids: [[u'2103645570', u'4315547431'], [u'4099381642', u'1220404184', u'201506542']]
You can see that I'm able to access the ID attributes from the SESSION elements and also the LOAD attributes from the BEGIN elements.
I need to get the ID attributes from the TRANSACTION elements as a single list. Currently I'm getting a list of lists in the variable trans_ids.
How can I get just a flat list of the values?
I have tried:
[j['@ID'] for j in i['TRANSACTION'] for i in doc['MYDATA']['SESSION']]
but that skips the first session entirely and repeats each ID from the second session twice, giving:
[u'4099381642',
u'4099381642',
u'1220404184',
u'1220404184',
u'201506542',
u'201506542']
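For comparison, here is a minimal sketch of the same comprehension with the for clauses swapped. In a comprehension with several for clauses, the clauses are evaluated left to right like nested for loops, so the outer loop (sessions) has to come first. The doc below is a trimmed stand-in for the xmltodict output above, using plain dicts and only the keys needed; the names sessions, flat_ids, nested and flat_ids_chain are made up for illustration. The itertools.chain.from_iterable variant is an alternative way to flatten the existing list of lists, not something from my original code.

```python
from itertools import chain

# Trimmed stand-in for the xmltodict output shown above (plain dicts, same keys).
doc = {'MYDATA': {'SESSION': [
    {'@ID': '1234',
     'TRANSACTION': [{'@ID': '2103645570'}, {'@ID': '4315547431'}]},
    {'@ID': '5678',
     'TRANSACTION': [{'@ID': '4099381642'}, {'@ID': '1220404184'},
                     {'@ID': '201506542'}]},
]}}

sessions = doc['MYDATA']['SESSION']

# Outer loop (sessions) first, inner loop (transactions) second,
# exactly as the equivalent nested for loops would be written.
flat_ids = [t['@ID'] for s in sessions for t in s['TRANSACTION']]

# Alternative: build the list of lists as before, then flatten it.
nested = [[t['@ID'] for t in s['TRANSACTION']] for s in sessions]
flat_ids_chain = list(chain.from_iterable(nested))

print(flat_ids)
print(flat_ids_chain)
```

Both variants assume every SESSION holds more than one TRANSACTION. When an element has a single child of a given name, xmltodict returns that child as a dict rather than a one-item list, so real data may need that case handled separately.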