I need to process the content of HL7 v2.5 (OUL_R22) messages (scale: 10⁶ single messages and more) using python. To do so, I'm parsing the HL7 messages. At first I was using the python package HL7apy to convert to JSON (see Stack: HL7 to JSON conversition). The output looks good, but some errors/bugs occurred while processing and it's really slow. So I tried out the java library HAPI to convert to XML (see Stack: Converting HL7 v2 to JSON). The XML files can be read as dict using the package xmltodict. Compared to HL7apy the conversion is 50 times faster. But the structure of the output is inconsistent/heterogeneous. HAPI somehow wraps segments into new groups like OUL_R22.SPECIMEN / .ORDER / .RESULT. The question is:
Can HAPI produce a flat output or an array which has a length equal to the unique occurrences of segments of the input? Or can you just add a "keep original structure" somewhere?
To make things clearer: I need to process the content of the OBX-segment.
The input would look like this:
MSH|...
PID|...
PV1|...
SPM|...
OBR|...
ORC|...
NTE|...
NTE|...
TQ1|...
OBX|...
OBX|...
...
The structure of the output looks like this (as XML of course):
OUL_R22
MSH
OUL_R22.PATIENT
OUL_R22.VISIT
OUL_R22.SPECIMEN
OUL_R22.ORDER
OUL_R22.TIMING_QTY
...
OUL_R22.RESULT
OBX
OUL_R22.RESULT
OBX
Sometimes it's like this:
OUL_R22
...
OBX
Or like this:
OUL_R22
...
OUL_R22.SPECIMEN
OBX
This is really inconsistent.
What I want is something like this:
OUL_R22
MSH
PID
...
OBX
Or like this:
[
{
"MSH": [],
"PID": [],
...
"OBX": [],
...