This below is my xml data
<?xml version="1.0" ?>
<ALLRECORDS>
<RECORD>
<RecordType>Forward</RecordType>
<Env SINGLE="Y" TYPE="EntList">
<ENV>
<InputDate>20201118</InputDate>
<Company>CompanyName1</Company>
<Book>Book1</Book>
<Status>DONE</Status>
</ENV>
</Env>
<Back SINGLE="Y" TYPE="EntList">
<BACK>
<PendingUser>User1</PendingUser>
<FairValueLevel>LEVEL2</FairValueLevel>
</BACK>
</Back>
<Financial SINGLE="Y" TYPE="EntList">
<FINANCIAL>
<PorS>S</PorS>
<UserSelectedFlag>SELL</UserSelectedFlag>
<BoughtAmt TYPE="Numeric">860000000</BoughtAmt>
<Valdate>20220513</Valdate>
<Rate TYPE="Numeric">103.0575</Rate>
</FINANCIAL>
</Financial>
</RECORD>
<RECORD>
<RecordType>Forward</RecordType>
<Env SINGLE="Y" TYPE="EntList">
<ENV>
<InputDate>20201119</InputDate>
<Company>CompanyName2</Company>
<Book>Book2</Book>
<Status>DONE</Status>
</ENV>
</Env>
<Back SINGLE="Y" TYPE="EntList">
<BACK>
<PendingUser>User2</PendingUser>
<FairValueLevel>LEVEL3</FairValueLevel>
</BACK>
</Back>
<Financial SINGLE="Y" TYPE="EntList">
<FINANCIAL>
<PorS>S</PorS>
<UserSelectedFlag>SELL</UserSelectedFlag>
<BoughtAmt TYPE="Numeric">860000000</BoughtAmt>
<Valdate>20220512</Valdate>
<Rate TYPE="Numeric">103.09</Rate>
</FINANCIAL>
</Financial>
</RECORD>
</ALLRECORDS>
So, basically, this has a tag ALLRECORDS. Each Record is represented by RECORD. Where it gets interesting is, each RECORD has three sub-tags: Env, Back and Financial.
Question is: How do I create one row of data per RECORD (without worrying about column names, since tags under FINANCIAL would change).
Here is what I have done so far:
import xml.dom.minidom
import xml.etree.ElementTree as ET
import pandas as pd
from os import listdir
from os.path import isfile, join
import datetime
def get_row(record):
envrecords = record.find('Env')
env = envrecords.find('ENV')
InputDate = env.find('InputDate').text
Company = env.find('Company').text
Book = env.find('Book').text
Status = env.find('Status').text
backrecords = record.find('Back')
back = backrecords.find('BACK')
PendingUser = back.find('PendingUser').text
FairValueLevel = '' if back.find('FairValueLevel') is None else back.find('FairValueLevel').text
finrecords = record.find('Financial')
fin = finrecords.find('FINANCIAL')
PorS = fin.find('PorS').text
UserSelectedFlag = fin.find('UserSelectedFlag').text
BoughtAmt = fin.find('BoughtAmt').text
Valdate = fin.find('Valdate').text
Rate = fin.find('Rate').text
row = [InputDate, Company, Book, Status, PendingUser, FairValueLevel,
PorS, UserSelectedFlag, BoughtAmt, Valdate, Rate]
return row
def get_xml_data(xmlpath):
cols = ['InputDate', 'Company', 'Book', 'Status', 'PendingUser', 'FairValueLevel',
'PorS', 'UserSelectedFlag', 'BoughtAmt', 'Valdate', 'Rate']
rows = []
allfiles = [f for f in listdir(xmlpath) if isfile(join(xmlpath, f)) and f.endswith('xml')]
for filename in allfiles:
fullfilename = join(xmlpath, filename)
root = ET.parse(fullfilename).getroot()
tradetypes = ['RECORD']
for tt in tradetypes:
for rec in root.findall(tt):
eachrow = get_row(rec)
eachrowdict = dict(zip(cols,eachrow))
rows.append(eachrowdict)
return rows
filename = 'C:/Temp/FX/xmldata/stupidsample'
rows = get_xml_data(filename)
for a in rows:
print(a)
As you can see, I have added an exception for tag "FairValueLevel". Now, in "FINANCIAL" section, I have no idea what new column names would come in. How do I change my code say if under FINANCIAL I see a new tag called "BusinessDate"