-2

I have a json response which looks something like:

d={'results':[
    {'key1':'1','key2':'item1'},
    {'key1':'1','key2':{
            'subkey20':[
                    {'subkey201':'val', 
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'}]}},
    {'key1':'1','key2':'item1'},
    {'key1':'1','key2':{
            'subkey20':[
                    {'subkey201':'val', 
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'},
                    {'subkey201':'val',
                    'subkey202':val,
                    'subkey203':'val',
                    'subkey204':'value'}]}},]}

I am currently working on processing this into a pandas dataframe with each key starting from key1 representing a column in a pandas dataframe. So far I have been able to only do:

    df = pd.concat([pd.DataFrame(v) for k,v in d.items()], keys=d)
    print (df)

which does not yield the result i desire. Can i get assistance as to how I should navigate through this and obtain a pandas dataframe with all the keys as columns and values filled in?

I did try using json_normalize but that processes the subkeys into the key2 column as a value, not as individual columns.

Nirbhay Tandon
  • 318
  • 2
  • 13

1 Answers1

1

json_normalize() is the tool. Just use it twice along with an explode()

pd.json_normalize(pd.json_normalize(d["results"]).explode("key2.subkey20").to_dict(orient="records"))

output

key1   key2  key2.subkey20 key2.subkey20.subkey201 key2.subkey20.subkey202 key2.subkey20.subkey203 key2.subkey20.subkey204
   1  item1            NaN                     NaN                     NaN                     NaN                     NaN
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
   1  item1            NaN                     NaN                     NaN                     NaN                     NaN
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
   1    NaN            NaN                     val                     val                     val                   value
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30