0

I am trying to create a nested JSON block and came accross this awesome solution Pandas grouping by multiple columns to get a multi nested Json:

test = [df.groupby('cat_a')\
          .apply(lambda x: x.groupby('cat_b')\
                            .apply(lambda x: [x.groupby('cat_c')
                                               .apply(lambda x: x[['participants_actual','participants_registered']].to_dict('r')
                                                      ).to_dict()]
                                  ).to_dict()
          ).to_dict()]

import json
json_res = list(map(json.dumps, test))

This works well for my usecase. However, as I cannot control the dataframe in all cases, there may be more than the three levels noted here.

I could easily imagine getting the levels as follows:

for c in cols[:-2]:
   .... perform level gropuping

However, as each of the lamba and apply functions feeds into the level above, I am not sure how I could write such a statement in a for loop.

Is there a path for make this statement more dynamic?

NickP
  • 1,354
  • 1
  • 21
  • 51

2 Answers2

1

Did you try making it recursive? Something like below

def myfunc(df):
   if len(df.columns.values) > 1 :
      return [df.groupby[df.columns.values[0]].apply(myfunc)].to_dict()
   else :
      return [df].to_dict()
1

Set the dataframe's index to the columns that you want to nest by, and group by all of these at once. Make a helper function that will set a nested dictionary according to a sequence of keys, and apply this to the keys for each group. group.to_dict("records") provides the list of dictionaries for the remaining columns within each nested dictionary.

def set_nested_dict_by_path(d, path, value):
    for key in path[:-1]:
        d = d.setdefault(key, {})
    d[path[-1]] = value
    
def to_nested(df):
    r = {}
    for keys, group in df.groupby(df.index):
        set_nested_dict_by_path(r, keys, group.to_dict("records"))
    return r

df = pd.DataFrame({"col1": [1, 1, 1, 1, 2, 2, 2], "col2": [3, 3, 4, 4, 5, 5, 6], "col3": [7, 8, 9, 10, 11, 12, 13], "col4": [14, 15, 16, 17, 18, 19, 20]})
print(to_nested(df.set_index(["col1", "col2"])))
    
Stuart
  • 9,597
  • 1
  • 21
  • 30