I have just started using (i.e., learning) Python 2.7. My current focus is extracting information from XML files. So far, xml.etree.ElementTree has got me quite far, but now I am stuck with a KeyError. As far as I understand it, the reason is that the elements have differing attributes.
The crucial part of the (much bigger) XML file:
<?xml version='1.0' encoding='utf-8' ?>
<XMLFILE>
<datasources>
<datasource caption='Sheet1 (ExcelSample)'>
<connection class='excel-direct' filename='~\SomeExcel.xlsx' .....>
......
</connection>
<column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
<calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
</column>
<column datatype='integer' name='[Sales]' role='measure' type='quantitative' user:auto-column='numrec'>
<calculation class='trial' formula='1' />
</column>
</datasource>
</datasources>
........
</XMLFILE>
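My Python code works fine for extracting the datatype and name attributes, which exist in both columns:
import xml.etree.ElementTree as ET
# xmlfile is the parsed root element, e.g. ET.parse(path).getroot()
for cal in xmlfile.findall('datasources/datasource/column'):
    dt = cal.attrib['datatype']
    nm = cal.attrib['name'].strip('[]')  # '[Sales]' -> 'Sales'
    print 'Column name: %s, datatype: %s' % (nm, dt)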
Result:
Column name: Calculation_1, datatype: real
Column name: Sales, datatype: integer
However, if I use cal.attrib['header'], Python 2.7 raises:
KeyError: 'header'
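The error fits the explanation above: cal.attrib is an ordinary dict, and the [Sales] column simply has no 'header' key. A minimal sketch, assuming a trimmed copy of the sample (the <connection> element and the user:auto-column attribute are left out so the string parses on its own), showing that Element.get() returns None for a missing attribute instead of raising:

```python
from __future__ import print_function  # same code runs on 2.7 and 3.x
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (no <connection>, no user:auto-column).
SAMPLE = """<XMLFILE><datasources><datasource caption='Sheet1 (ExcelSample)'>
<column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
  <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
</column>
<column datatype='integer' name='[Sales]' role='measure' type='quantitative'>
  <calculation class='trial' formula='1' />
</column>
</datasource></datasources></XMLFILE>"""


def headers(root):
    """Return the header of every column, or None where it is missing."""
    result = []
    for col in root.findall('datasources/datasource/column'):
        # col.attrib['header'] would raise KeyError for the [Sales] column;
        # col.get('header') returns None there instead.
        result.append(col.get('header'))
    return result


root = ET.fromstring(SAMPLE)
print(headers(root))
```

Element.get() also takes a default as its second argument, e.g. col.get('header', 'n/a'), if None is not convenient.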
Question: How do I get Python 2.7 to produce the desired output:
Calculation "Unit Price": Sum(Profit)/Sum(Sales)
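One possible approach, sketched under the same assumption of a trimmed, standalone copy of the sample: skip every column whose attrib dict has no 'header' key, then read the formula from its first <calculation> child with find():

```python
from __future__ import print_function  # same code runs on 2.7 and 3.x
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above (no <connection>, no user:auto-column).
SAMPLE = """<XMLFILE><datasources><datasource caption='Sheet1 (ExcelSample)'>
<column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
  <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
</column>
<column datatype='integer' name='[Sales]' role='measure' type='quantitative'>
  <calculation class='trial' formula='1' />
</column>
</datasource></datasources></XMLFILE>"""


def calculation_lines(root):
    """Build 'Calculation "<header>": <formula>' for each column with a header."""
    lines = []
    for col in root.findall('datasources/datasource/column'):
        if 'header' not in col.attrib:
            continue  # e.g. the [Sales] column: no header, so no KeyError
        calc = col.find('calculation')  # first <calculation> child, or None
        if calc is None:
            continue
        lines.append('Calculation "%s": %s' % (col.get('header'), calc.get('formula')))
    return lines


for line in calculation_lines(ET.fromstring(SAMPLE)):
    print(line)
```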
More precisely, Python should do the following: for every column that has a 'header' attribute (there may be more than one, unlike in the example above), print:
header: Unit Price
formula: Sum(Profit)/Sum(Sales)
header: Sales per day in month
formula: Sales / count(days(month))
(Note: to show a more complete desired output, I added a second header column that is not in the XML sample above.)
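A possible sketch for the multi-column case, using the [@header] predicate that ElementTree's limited XPath support provides (available since Python 2.7) so that only columns carrying a header attribute are returned. The second header column in the sample string is hypothetical: its header and formula come from the desired output above, but its name, datatype, role and type attributes are made up for illustration.

```python
from __future__ import print_function  # same code runs on 2.7 and 3.x
import xml.etree.ElementTree as ET

# Trimmed sample plus a hypothetical second header column ([Calculation_2]
# and its datatype/role/type are invented; header and formula are not).
SAMPLE = """<XMLFILE><datasources><datasource caption='Sheet1 (ExcelSample)'>
<column header='Unit Price' datatype='real' name='[Calculation_1]' role='measure' type='quantitative'>
  <calculation class='calculation' formula='Sum(Profit)/Sum(Sales)' />
</column>
<column datatype='integer' name='[Sales]' role='measure' type='quantitative'>
  <calculation class='trial' formula='1' />
</column>
<column header='Sales per day in month' datatype='real' name='[Calculation_2]' role='measure' type='quantitative'>
  <calculation class='calculation' formula='Sales / count(days(month))' />
</column>
</datasource></datasources></XMLFILE>"""


def header_formulas(root):
    """Return (header, formula) pairs for all columns with a header attribute."""
    pairs = []
    # [@header] keeps only <column> elements that have a header attribute.
    for col in root.findall('datasources/datasource/column[@header]'):
        calc = col.find('calculation')
        formula = calc.get('formula') if calc is not None else None
        pairs.append((col.get('header'), formula))
    return pairs


for header, formula in header_formulas(ET.fromstring(SAMPLE)):
    print('header:', header)
    print('formula:', formula)
```

Compared with the explicit 'header' in col.attrib check, the predicate moves the filtering into the path expression, so the loop body never sees a column without a header.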
Thanks a lot for any help!