4

I want to create my dataframe which looks like this:

    employeeId  firstName   lastName    emailAddress    isDependent employeeIdTypeCode  entityCode  sourceCode  roleCode
0   E123456 Andrew  Hoover  hoovera@xyz.com False   001 AE  AHR EMPLR
0   102939485   Andrew  Hoover  hoovera@xyz.com False   002 AE  AHR EMPLR
2   E123458 Celeste Riddick riddickc@xyz.com    True    001 AE  AHR EMPLR
2   354852739   Celeste Riddick riddickc@xyz.com    True    002 AE  AHR EMPLR
1   E123457 Curt    Austin  austinc1@xyz.com    True    001 AE  AHR EMPLR
1   675849302   Curt    Austin  austinc1@xyz.com    True    002 AE  AHR EMPLR
3   E123459 Hazel   Tooley  tooleyh@xyz.com False   001 AE  AHR EMPLR
3   937463528   Hazel   Tooley  tooleyh@xyz.com False   002 AE  AHR EMPLR

And for each row, I want to convert it into a nested JSON format. So I want my JSON to look something like this for each individual, since I want to iterate over the dataframe and post it to an api.

{  
   "individualInfo":  
      {  
         "individualIdentifier":[  
            {  
               "identityTypeCode":"001",
               "identifierValue":"E123456",
               "profileInfo":{  
                  "firstName":"Andrew",
                  "lastName":"Hoover",
                  "emailAddress":"hoovera@xyz.com"
               }
            },
            {  
               "identityTypeCode":"002",
               "identifierValue":"102939485",
               "profileInfo":{  
                   "firstName":"Andrew",
                  "lastName":"Hoover",
                  "emailAddress":"hoovera@xyz.com"
               }
            }
         ],
         "entityCode":"AE",
         "sourceCode":"AHR",
         "roleCode":"EMPLR"
         "isDependent":False
      }
} 

The important thing here is that I want my JSON to be generated agnostic of the Id columns coming on the dataframe. So, if there is, for example, another ID coming on the dataframe, then I want that ID to have another dictionary object with the same profile info. So each profile can have any number of Id with it.

Code I could do:

j = (result.groupby(['identifierValue','identityTypeCode'], as_index=False).apply(lambda x: x[['firstName','lastName','emailAddress']].to_dict('r')).reset_index().rename(columns={0:'ProfileInfo'}).to_json(orient='records'))

Would it be possible to achieve something like this dynmically in pandas? Thank you so much for the help!

Few of other questions that I could find for nesting:

Convert Pandas Dataframe to nested JSON

pandas groupby to nested json

None of these questions are helping me out since I want each index of my dataframe to be converted into an individual JSON payload, as each individual is going to an api service I have for the purpose of posting the data to the database.

Manas Jani
  • 699
  • 2
  • 11
  • 33
  • Assume you saw this? https://stackoverflow.com/questions/40470954/convert-pandas-dataframe-to-nested-json – Evan Dec 11 '18 at 19:44
  • Yes, I did check it and also used that code. But my data for the grouper is not 1-dimensional. Also, the json could change if the number of `(identifierValue, identityTypeCode)` changes. So I want it dynamic that way! – Manas Jani Dec 11 '18 at 20:48
  • Sadly no - I think one (I) could write some cute recursive functions to do this, but `read_json` and `to_json` have not worked the way I need in my own applications. There are lot of design choices to make along the way, and each branch in that tree takes a while to explore. – Evan Dec 12 '18 at 01:59
  • I updated my dataframe to a better group by clause, but I am still stuck at what I should do. Can someone please help me out with it? – Manas Jani Dec 13 '18 at 14:16
  • I can try again in a bit; meanwhile, maybe repost your question with lots of links to other JSON posts, and explain why yours is different? – Evan Dec 13 '18 at 14:23
  • Updated! I am trying it since the past few days but I am not able to get it to the right format at all. – Manas Jani Dec 13 '18 at 14:29
  • Any luck @Evan ? – Manas Jani Dec 13 '18 at 20:57
  • It seems like you're trying to merge duplicated profiles from (at least) two different systems. That assumption makes me struggle to understand a couple of things. 1) How do you know exactly whom to put in a single JSON payload? Is it `firstName`, 'lastName' and/or `email`? 2) Are `entityCode`, `sourceCode`, `roleCode` and `isDependent` always the same for an individual across the systems? – ayorgo Dec 15 '18 at 21:18
  • Yes, those can be the same! The only thing I need to take care of is to generate as many JSONs in the `entityIdentifier` list as there are `IDs` in the dataframe. – Manas Jani Dec 16 '18 at 00:44
  • https://codereview.stackexchange.com/questions/150531/adding-labels-and-fields-to-a-nested-json take a look at this maybe it'll be helpful – Jeredriq Demas Dec 21 '18 at 13:03

3 Answers3

5

It sounds like the most sensible way to pull this off is:

info_dict = df.set_index(['identifierValue', 'identifierValue']).to_dict('index')

Then every time you get to profileInfo in your JSON, you can reference the info_dict above with the appropriate ('identifierValue', 'identifierValue')` key pair

I'm confused about what your desired formatting is, but this is a start.

Brian
  • 1,572
  • 9
  • 18
  • I also tried it that way but I am not able to group it that way, since my data does not have it in the 1-dimensional format. – Manas Jani Dec 11 '18 at 20:49
0

Not really a Pandas solution but kinds works:

Starts from your result dataframe

from collections import defaultdict
import json

result = 'your data frame'

dicted = defaultdict(dict)
for r in result.values.tolist():
    identifierValue, firstName, lastName, emailAddress,isDependent,\
    identityTypeCode, entityCode, sourceCode,roleCode = r
    tupled_criteria = (firstName,lastName,emailAddress)
    if dicted[tupled_criteria].get("individualInfo"):
        pass
    else:
        dicted[tupled_criteria]["individualInfo"] = {}

    dicted[tupled_criteria]["individualInfo"]['entityCode'] = entityCode
    dicted[tupled_criteria]["individualInfo"]['soruceCode'] = sourceCode
    dicted[tupled_criteria]["individualInfo"]['roleCode'] = roleCode
    dicted[tupled_criteria]["individualInfo"]['isDependent'] = isDependent
    if dicted[tupled_criteria]["individualInfo"].get("individualIdentifier"):
        pass
    else:
        dicted[tupled_criteria]["individualInfo"]["individualIdentifier"] = []
    dicted[tupled_criteria]["individualInfo"]["individualIdentifier"]\
        .append({"identityTypeCode":identityTypeCode,
                   "identifierValue":identifierValue,
                    "profileInfo":{  
                      "firstName":firstName,
                      "lastName":lastName,
                      "emailAddress":emailAddress}})

for k,v in dicted.items():
    print(k,'\n',json.dumps(v),'\n\n')
fcsr
  • 921
  • 10
  • 17
0

Perhaps you can iterate over a group by, then do another iteration for each row within that group. Thus, creating a nested dictionary structure:

This explains one way going through with it:

import pandas as pd
df = pd.DataFrame({"entityCode":[1,1,3,3],"sourceCode":[4,4,6,6],'identityTypeCode':[7,8,9,10]})
results = []
for i, sub_df in df.groupby(["entityCode","sourceCode"]):
    entityCode, sourceCode = i
    d = {}
    d["individualInfo"] = {"entityCode":entityCode, "sourceCode":sourceCode}
    sub_result = []
    for _, row in sub_df[["identityTypeCode"]].drop_duplicates().iterrows():
        sub_result.append(row.to_dict())
    d["individualIdentifier"] = sub_result
    results.append(d)
results

which returns something like this:

[{'individualInfo': {'entityCode': 1, 'sourceCode': 4},
  'individualIdentifier': [{'identityTypeCode': 7}, {'identityTypeCode': 8}]},
 {'individualInfo': {'entityCode': 3, 'sourceCode': 6},
  'individualIdentifier': [{'identityTypeCode': 9}, {'identityTypeCode': 10}]}]

afterwards, you can convert the dictionary to json.

erncyp
  • 1,649
  • 21
  • 23