With MiFID II now in effect, I would like to analyze the LEI data published by GLEIF.
The data is in XML format, but boy, is it hard to parse.
I tried the code (see below), which freezes my machine almost completely and then gives this error:
AttributeError: no such child: {http://www.gleif.org/data/schema/leidata/2016}pyval.
The structure of the data is really simple, but the files are large. Still, I suspect the main culprit is the "lei:" prefix (with its colon) in the tag names; see this shortened example:
<lei:LEIData xmlns:gleif="http://www.gleif.org/concatenated-file/header-extension/2.0" xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">
  <lei:LEIRecords>
    <lei:LEIRecord>
      <lei:LEI>029200137F2K8AH5C573</lei:LEI>
    </lei:LEIRecord>
  </lei:LEIRecords>
</lei:LEIData>
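For reference, the "lei:" prefix is an XML namespace prefix, and parsers expand it into the "{namespace-uri}name" form that also appears in the error message above. The following is only a minimal sketch of that behaviour on the shortened sample; it uses the standard library's xml.etree.ElementTree instead of lxml, and the names SAMPLE, LEI_NS and ns are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample from above (the unused gleif header-extension namespace is omitted).
SAMPLE = """<lei:LEIData xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">
  <lei:LEIRecords>
    <lei:LEIRecord>
      <lei:LEI>029200137F2K8AH5C573</lei:LEI>
    </lei:LEIRecord>
  </lei:LEIRecords>
</lei:LEIData>"""

LEI_NS = "http://www.gleif.org/data/schema/leidata/2016"

root = ET.fromstring(SAMPLE)

# The parser replaces the "lei:" prefix with "{namespace-uri}" in every tag name.
root_tag = root.tag

# Paths can use a prefix again if a prefix-to-URI mapping is supplied.
# The prefix chosen in this mapping is arbitrary; only the URI has to match.
ns = {"lei": LEI_NS}
leis = [e.text for e in root.findall("./lei:LEIRecords/lei:LEIRecord/lei:LEI", ns)]

print(root_tag)
print(leis)
```

So the colon itself is not an illegal character; elements just have to be addressed by their namespace-qualified names.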
Any help?
I posted a larger sample on Pastebin (https://pastebin.com/UbrM5mVp), with the lei:LEIHeader section removed.
See the Python code below (borrowed from Wes McKinney's book, Section 6.1):
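import pandas as pd
from lxml import objectify

path = '20180104-gleif-concatenated-file-lei2.xml'
data = []

# Parse the whole file into an objectify tree (everything is loaded into memory)
parsed = objectify.parse(open(path))
root = parsed.getroot()

# Inspect the top-level children of the root element
for child in root:
    print(child.tag, child.attrib)

# INDICATOR is the element name taken over from the book's example
for elt in root.INDICATOR:
    el_data = {}
    for child in elt.getchildren():
        el_data[child.tag] = child.pyval
    data.append(el_data)

perf = pd.DataFrame(data)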
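
Since the files are large, one possible direction is to read the records incrementally instead of building the whole tree at once. The sketch below is an assumption-laden illustration, not a tested solution for the full GLEIF file: it swaps lxml.objectify for the standard library's xml.etree.ElementTree.iterparse, the helper names local_name and iter_lei_records are hypothetical, and flattening each record by local tag name means that repeated names inside one record overwrite each other.

```python
import io
import xml.etree.ElementTree as ET

LEI_NS = "http://www.gleif.org/data/schema/leidata/2016"
RECORD_TAG = f"{{{LEI_NS}}}LEIRecord"  # expanded name of <lei:LEIRecord>


def local_name(tag):
    """Strip the '{namespace-uri}' part from an expanded tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_lei_records(source):
    """Yield one flat dict per LEIRecord while reading the input incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != RECORD_TAG:
            continue
        row = {}
        for node in elem.iter():
            text = (node.text or "").strip()
            # Keep only leaf elements that carry text; a later element with the
            # same local name overwrites an earlier one (simplification).
            if len(node) == 0 and text:
                row[local_name(node.tag)] = text
        yield row
        elem.clear()  # drop the finished record's children and text


# Small in-memory stand-in for the real file (hypothetical test input).
sample = io.BytesIO(
    b'<lei:LEIData xmlns:lei="http://www.gleif.org/data/schema/leidata/2016">'
    b"<lei:LEIRecords><lei:LEIRecord>"
    b"<lei:LEI>029200137F2K8AH5C573</lei:LEI>"
    b"</lei:LEIRecord></lei:LEIRecords></lei:LEIData>"
)
rows = list(iter_lei_records(sample))
print(rows)
```

With the real file, the generator could presumably be given the path directly, and the resulting list of dicts passed to pd.DataFrame as in the code above.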