Parsing HL7 v2 and converting to JSON/XML

Question

I need to process the content of HL7 v2.5 (OUL_R22) messages (scale: 10⁶ single messages and more) using Python, so I'm parsing the HL7 messages. At first I used the Python package HL7apy to convert them to JSON (see Stack: HL7 to JSON conversion). The output looks good, but some errors/bugs occurred during processing and it is really slow. So I tried the Java library HAPI to convert to XML (see Stack: Converting HL7 v2 to JSON). The XML files can be read as a dict using the package xmltodict. Compared to HL7apy, the conversion is 50 times faster, but the structure of the output is inconsistent: HAPI wraps segments into new groups such as OUL_R22.SPECIMEN / .ORDER / .RESULT. The question is:

Can HAPI produce a flat output, or an array whose length equals the number of distinct segment types in the input? Or is there a "keep original structure" option somewhere?

To make things clearer: I need to process the content of the OBX-segment.

The input would look like this:

MSH|...
PID|...
PV1|...
SPM|...
OBR|...
ORC|...
NTE|...
NTE|...
TQ1|...
OBX|...
OBX|...
...

The structure of the output looks like this (as XML of course):

OUL_R22
    MSH
    OUL_R22.PATIENT
    OUL_R22.VISIT
    OUL_R22.SPECIMEN
        OUL_R22.ORDER
            OUL_R22.TIMING_QTY
                ...
            OUL_R22.RESULT
                OBX
            OUL_R22.RESULT
                OBX

Sometimes it's like this:

OUL_R22
    ...
    OBX

Or like this:

OUL_R22
    ...
    OUL_R22.SPECIMEN
        OBX

This is really inconsistent.

What I want is something like this:

OUL_R22
    MSH
    PID
    ...
    OBX

Or like this:

[
    {
        "MSH": [],
        "PID": [],
        ...
        "OBX": [],
        ...
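
For reference, the flat shape above can also be built straight from the raw pipe-delimited text, without HAPI. This is only a minimal sketch: it assumes segments are separated by carriage returns (or newlines) and fields by `|`, and the function name `flatten_segments` is hypothetical. It throws away the group nesting, so an NTE loses its link to the OBR or OBX it belongs to.

```python
from collections import defaultdict


def flatten_segments(message: str) -> dict:
    """Group the segments of one HL7 v2 message by segment ID.

    Assumption: '|' is the field separator. Components, repetitions
    and escape sequences are left undecoded. For MSH the list index
    is one lower than the HL7 field number, because MSH-1 is the
    separator character itself.
    """
    flat = defaultdict(list)
    # splitlines() accepts \r, \n and \r\n as segment terminators
    for line in message.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        flat[fields[0]].append(fields)
    return dict(flat)


# Made-up sample message, not real data
sample = "MSH|^~\\&|LAB\rPID|1||12345\rOBX|1|NM|GLU||5.4\rOBX|2|NM|NA||140"
result = flatten_segments(sample)
print(len(result["OBX"]))  # 2
```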

You are trying to do a double conversion from a complex and primitive protocol like HL7 :-| . First of all, I would suggest adding to your question real samples of the HL7 and XML parts you are interested in, as well as the expected output. Second, try to carefully analyze the cases in which HAPI produces unpredictable results. You could also check the 'Parsing' section here: https://hapifhir.github.io/hapi-hl7v2/devbyexample.html. Perhaps you would do better posting the problem on an HL7-specific forum. - LMC, Jan 23 '18 at 14:46
It takes a lot of time to anonymize the data, so I hope that someone who deals with the XML output of HAPI recognizes the issue. I did check the 'Parsing' section, which helped me produce output in the first place. But maybe you're right and I should address my question to an HL7-specific audience. Thanks for the advice! - tharndt, Jan 23 '18 at 15:05
Welcome. HL7 can be very tricky, so it's important to track the failing cases. - LMC, Jan 23 '18 at 15:13

score 1 · Answer 1 · answered Feb 07 '18 at 21:31

Those groups are a fundamental part of the HL7 syntax - I'm perpetually baffled by Mirth and other interface engines that ignore them in their syntax for working with HL7v2.

OBR/OBX combinations have lots of groups because there are many nested relationships between the two. For example, NTEs following an OBR apply to the OBR as a whole, but NTEs following each OBX apply to that OBX individually.

Here's what this looks like in the HL7 spec.

So referencing something like OBSERVATION[0].NTE[0] gets you exactly what you want. As far as I can tell, Mirth forces you to write code to solve this problem as well as this one.
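
To illustrate why the grouping matters, here is a rough Python sketch of the NTE rule described above: each NTE is attached to the OBR or OBX it follows. It is a simplification, not HAPI's parser. The function name `group_order` and the dictionary keys are made up for this example, and every segment other than OBR, OBX and NTE is skipped.

```python
def group_order(segments):
    """Attach each NTE to the OBR or OBX it follows.

    `segments` is an ordered list of raw segment strings.
    Simplified: real OUL_R22 groups also contain ORC, TQ1, SPM and
    other segments, which are ignored here.
    """
    orders = []
    current = None  # the OBR or OBX that owns the next NTE
    for seg in segments:
        seg_id = seg.split("|", 1)[0]
        if seg_id == "OBR":
            current = {"OBR": seg, "NTE": [], "OBSERVATION": []}
            orders.append(current)
        elif seg_id == "OBX" and orders:
            current = {"OBX": seg, "NTE": []}
            orders[-1]["OBSERVATION"].append(current)
        elif seg_id == "NTE" and current is not None:
            current["NTE"].append(seg)
    return orders


# Made-up segments, not real data
segments = [
    "OBR|1",
    "NTE|1|order note",
    "OBX|1|NM|GLU",
    "NTE|1|glucose note",
    "OBX|2|NM|NA",
]
orders = group_order(segments)
print(orders[0]["OBSERVATION"][0]["NTE"][0])  # NTE|1|glucose note
```

With a flat list of NTE segments, the difference between the order-level note and the glucose note would be lost.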

At the end of the day, solve the problem how you see fit, but deviating from the standard may come back to bite you.
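
If flat access to the OBX segments is still needed on the Python side, one option that leaves HAPI's grouped XML untouched is to walk the parsed structure and pick up every OBX wherever it sits. A minimal sketch, assuming the XML was already loaded into nested dicts and lists (as xmltodict does); the helper name `collect` and the sample content are illustrative only.

```python
def collect(node, tag="OBX"):
    """Yield every value stored under `tag`, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == tag:
                # Repeated elements arrive as a list, single ones as one value
                if isinstance(value, list):
                    yield from value
                else:
                    yield value
            else:
                yield from collect(value, tag)
    elif isinstance(node, list):
        for item in node:
            yield from collect(item, tag)


# Made-up structure mirroring the nested layout from the question
nested = {
    "OUL_R22": {
        "MSH": {},
        "OUL_R22.SPECIMEN": {
            "OUL_R22.ORDER": {
                "OUL_R22.RESULT": [
                    {"OBX": {"OBX.1": "1"}},
                    {"OBX": {"OBX.1": "2"}},
                ]
            }
        },
    }
}
obx_list = list(collect(nested))
print(len(obx_list))  # 2
```

The same call works whether the OBX elements sit directly under OUL_R22, under OUL_R22.SPECIMEN, or inside OUL_R22.RESULT, which covers the three layouts shown in the question.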

score 0 · Answer 2 · answered Feb 07 '18 at 10:07


Solution so far:

Setting up a Mirth Connect Server to receive and transform the HL7 v2.x messages.


tharndt

