0

I have a pandas dataframe which I would like to convert to JSON format for my source system to utilize, which requires a very specific JSON format.

I cant seem to get to the exact format like shown in the expected output section, using simple dictionary loops.

Is there anyway I can convert csv/pd.Dataframe to nested JSON? Any python package specifically built for this?

Input Dataframe:

 #Create Input Dataframe

data = {
        'col6':['A','A','A','B','B','B'],
        'col7':[1,  1,  2,  1,  2,  2],
        'col8':['A','A','A','B','B','B'],
        'col10':['A','A','A','B','B','B'],
        'col14':[1,1,1,1,1,2],
        'col15':[1,2,1,1,1,1],
        'col16':[9,10,26,9,12,4],
        'col18':[1,1,2,1,2,3],
        'col1':['xxxx','xxxx','xxxx','xxxx','xxxx','xxxx'],
        'col2':[2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13,2.02011E+13],
        'col3':['xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012','xxxx20201107023012'],
        'col4':['yyyy','yyyy','yyyy','yyyy','yyyy','yyyy'],
        'col5':[0,0,0,0,0,0],
        'col9':['A','A','A','B','B','B'],
        'col11':[0,0,0,0,0,0],
        'col12':[0,0,0,0,0,0],
        'col13':[0,0,0,0,0,0],
        'col17':[51,63,47,59,53,56]
        

        }
pd.DataFrame(data)

Expected Output:

{
    "header1": {
                "col1": "xxxx"
                "col2": "20201107023012"
                "col3": "xxxx20201107023012"
                "col4": "yyyy",
                "col5": "0" 
                        },
    
    "header2": 
    
    {
    "header3": 
            [
                {
                    col6: A,
                    col7: 1,
                    header4: 
                    [
                                         {
                                            col8: "A", 
                                            col9: 1, 
                                            col10: "A",
                                            col11: 0,
                                            col12: 0,
                                            col13: 0, 
                                            
                                            "header5": 
                                                [
                                                        {
                                                            col14: "1", 
                                                            col15: 1,  
                                                            col16: 1, 
                                                            col17: 51,
                                                            col18: 1 
                                                        },
                                                        
                                                        {
                                                            col14: "1", 
                                                            col15: 1,  
                                                            col16: 2, 
                                                            col17: 63,
                                                            col18: 2
                                                        }
                                                ]
                                        },
                                        {
                                            col8: "A", 
                                            col9: 1, 
                                            col10: "A",
                                            col11: 0,
                                            col12: 0,
                                            col13: 0, 
                                            
                                            "header5": 
                                                [
                                                        {
                                                            col14: "1", 
                                                            col15: 1,  
                                                            col16: 1, 
                                                            col17: 51,
                                                            col18: 1 
                                                        },
                                                        
                                                        {
                                                            col14: "1", 
                                                            col15: 1,  
                                                            col16: 2, 
                                                            col17: 63,
                                                            col18: 2
                                                        }
                                                ]
                                        }
                    ]
                }
            ]
    }
}
Bala
  • 111
  • 8

1 Answers1

0

Maybe this will get you started. I'm not aware of a current python module that will do what you want but this is the basis of how I'd start it. Making assumptions based on what you've provided.

As each successive nest is based on some criteria, you'll need to loop through filtered dataframes. Depending on the size of your dataframes using groupby may be a better option than what I have here but the theory is the same. Also, you'll have to create you key value pairs correctly, this just creates the data support what you are builing.

    # assume header 1 is constant so take first row and use .T to transpose to create dictionaries
header1 = dict(df.iloc[0].T[['col1','col2','col3','col4','col5']])
print('header1', header1)
# for header three, looks like you need the unique combinations so create dataframe 
# and then iterate through to get all the header3 dictionaries
header3_dicts = []
dfh3 = df[['col6', 'col7']].drop_duplicates().reset_index(drop=True)
for i in range(dfh3.shape[0]):
    header3_dicts.append(dict(dfh3.iloc[i].T[['col6','col7']]))
    print('header3', header3_dicts)
    # iterate over header3 to get header 4
    for i in range(dfh3.shape[0]):
        #print(dfh3.iat[i,0], dfh3.iat[i,1])
        dfh4 = df.loc[(df['col6']==dfh3.iat[i,0]) & (df['col7']==dfh3.iat[i,1])]
        header4_dicts = []
        for j in range(dfh4.shape[0]):
            header4_dicts.append(dict(df.iloc[j].T[['col8','col9','col10','col11','col12','col13']]))
        print('header4', header4_dicts)
        # next level repeat similar to above
Jonathan Leon
  • 5,440
  • 2
  • 6
  • 14