I have a CSV file loaded into a DataFrame with the following structure:

My dataframe: [image of the DataFrame, not reproduced here]

I want to convert the data to the following JSON format using Python. I looked at a couple of links, but I got lost in the nested part. The links I checked:

How to convert pandas dataframe to uniquely structured nested json

convert dataframe to nested json

{
  "PHI": 2,
  "firstname": "john",
  "medicalHistory": {
    "allergies": "egg",
    "event": {
      "inPatient": {
        "hospitalized": {
          "visit": "7-20-20",
          "noofdays": "5",
          "test": {
            "modality": "xray"
          },
          "vitalSign": {
            "temperature": "32",
            "heartRate": "80"
          },
          "patientcondition": {
            "headache": "1",
            "cough": "0"
          }
        },
        "icu": {
          "visit": "",
          "noofdays": ""
        }
      },
      "outpatient": {
        "visit": "5-20-20",
        "vitalSign": {
          "temperature": "32",
          "heartRate": "80"
        },
        "patientcondition": {
          "headache": "1",
          "cough": "1"
        },
        "test": {
          "modality": "blood"
        }
      }
    }
  }
}

If anyone can help me with the nested array, that will be really helpful.

Trenton McKinney
Rapa

1 Answer

You need one or more helper functions to unpack the table data like this. Write the main helper function to accept two arguments: the df and a schema. The schema is used to unpack each row of the df into a nested structure. The schema below shows how to do this for a subset of the logic you describe. Although it is not exactly what you specified in your example, it should be enough of a hint for you to complete the rest of the task on your own.

from operator import itemgetter

# Group rows by patient; each group becomes one key in the result.
groupby_idx = ['PHI', 'firstName']
groups = df.groupby(groupby_idx)
schema = {
    "event": {
        "eventType": itemgetter('event'), 
        "visit": itemgetter('visit'),
        "noOfDays": itemgetter('noofdays'),
        "test": {
            "modality": itemgetter('test')
        },
        "vitalSign": {
            "temperature": itemgetter('temperature'),
            "heartRate": itemgetter('heartRate')
        },
        "patientCondition": {
            "headache": itemgetter('headache'),
            "cough": itemgetter('cough')
        }
    }
}

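def unpack(obj, schema):
    """Build a nested dict for one row by walking the schema recursively."""
    tmp = {}
    for k, v in schema.items():
        if isinstance(v, dict):
            # Nested section: recurse with the same row.
            tmp[k] = unpack(obj, v)
        elif callable(v):
            # Leaf: the getter pulls the value out of the row.
            tmp[k] = v(obj)
    return tmp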

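def apply_unpack(groups, schema):
    """Map each group key to the list of unpacked rows in that group."""
    results = {}
    for gidx, group_df in groups:
        events = []
        for ridx, obj in group_df.iterrows():
            d = unpack(obj, schema)
            events.append(d)
        # gidx is a tuple when grouping by more than one column.
        results[gidx] = events
    return results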

unpacked = apply_unpack(groups, schema)
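
To see what the schema does for a single row, here is a minimal sketch that runs without pandas. It relies on the fact that itemgetter works on any object that supports row[key], so a plain dict can stand in for a row yielded by iterrows(). The sample values and the trimmed-down schema are made up for illustration and are not taken from the asker's data.

```python
from operator import itemgetter

# Hypothetical row: a plain dict stands in for one DataFrame row.
row = {
    "event": "inPatient",
    "visit": "7-20-20",
    "temperature": "32",
    "heartRate": "80",
}

# Trimmed-down schema: dicts describe nesting, getters describe leaves.
schema = {
    "event": {
        "eventType": itemgetter("event"),
        "visit": itemgetter("visit"),
        "vitalSign": {
            "temperature": itemgetter("temperature"),
            "heartRate": itemgetter("heartRate"),
        },
    }
}


def unpack(obj, schema):
    """Build a nested dict for one row by walking the schema recursively."""
    tmp = {}
    for k, v in schema.items():
        if isinstance(v, dict):
            tmp[k] = unpack(obj, v)   # nested section
        elif callable(v):
            tmp[k] = v(obj)           # leaf value looked up in the row
    return tmp


nested = unpack(row, schema)
print(nested)
```

Because the nesting lives entirely in the schema, getting closer to the target format in the question should mostly be a matter of reshaping the schema dict rather than changing unpack.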

skullgoblet1089
  • Thank you so much for your answer, @skullgoblet1089. How can I save it to JSON? I tried `with open('data.txt', 'w') as outfile: json.dump(unpacked, outfile)`, but it gives me the error `TypeError: key (1, 'jane') is not a string`. – Rapa Jul 27 '20 at 14:24
  • Hi @Rapa. You see that error because I made the key of `results` returned by `apply_unpack` a tuple, which the default `json` serializer cannot write as an object key. If you change the key to a string value, it will work. Simple example: `results[str(gidx)] = events`. It is up to you what the key should be. – skullgoblet1089 Jul 27 '20 at 15:31
  • Hi @skullgoblet1089, I fixed the issue. In your solution, when I generate the data and save it to JSON, everything comes out on a single line and under one PID (if you think of it from a database perspective). How can I make it multiline (e.g., Jane Doe has a separate PID from John Doe)? – Rapa Aug 03 '20 at 14:57
  • Is PID the same as PHI? I think you can simply change `groupby_idx` to contain only `PHI`. – skullgoblet1089 Aug 03 '20 at 20:28
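
The two issues raised in the comments (tuple keys and single-line output) can be sketched together as follows. This is one possible approach, not necessarily what either commenter had in mind: instead of stringifying the tuple, it spreads the group key into named fields so that each patient becomes a separate record, and it passes `indent` to `json.dumps` for multi-line output. The `unpacked` dict here is hypothetical sample data shaped like the result of `apply_unpack`, and the `events` field name is an assumption.

```python
import json

# Hypothetical result shaped like the output of apply_unpack:
# tuple keys (PHI, firstName) mapping to a list of events.
unpacked = {
    (1, "jane"): [{"event": {"eventType": "outpatient", "visit": "6-01-20"}}],
    (2, "john"): [{"event": {"eventType": "inPatient", "visit": "7-20-20"}}],
}

# One record per patient. int() guards against NumPy integers, which
# pandas may return for group keys and which json cannot serialize.
records = [
    {"PHI": int(phi), "firstName": name, "events": events}
    for (phi, name), events in unpacked.items()
]

# indent=2 pretty-prints the output across multiple lines.
text = json.dumps(records, indent=2)
print(text)
```

To write to a file instead, `json.dump(records, outfile, indent=2)` takes the same `indent` argument.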