1

Objective: convert pandas dataframe to an aggregated json-like object.

The "json-like" object contains an aggregate (sum) of the values for each Group and Category as weights.

Current state:

df = pd.DataFrame({'group': ["Group 1", "Group 1", "Group 2", "Group 3", "Group 3", "Group 3"], 
                   'category': ["Category 1.1", "Category 1.2", "Category 2.1", "Category 3.1", "Category 3.2", "Category 3.3"],
                   'value': [2, 4, 5, 1, 4, 5]
                   })

Structure:

>>> df[['group','category','value']]
     group      category  value
0  Group 1  Category 1.1      2
1  Group 1  Category 1.2      4
2  Group 2  Category 2.1      5
3  Group 3  Category 3.1      1
4  Group 3  Category 3.2      4
5  Group 3  Category 3.3      5

Desired output:

{"groups": [
    {"label": "Group 1",
      "weight": 6,
      "groups": [
        {"label": "Category 1.1",
          "weight": 2,
          "groups": [] },
        {"label": "Category 1.2",
          "weight": 4,
          "groups": [] }
      ] },
    {"label": "Group 2",
      "weight": 5,
      "groups": [{
          "label": "Category 2.1",
          "weight": 5,
          "groups": []
        } ] },
    {"label": "Group 3",
      "weight": 10,
      "groups": [{
          "label": "Category 3.1",
          "weight": 1,
          "groups": []
        },
        {"label": "Category 3.2",
          "weight": 4,
          "groups": []
        },
        {"label": "Category 3.3",
          "weight": 5,
          "groups": []
        } ]
    } ]
}

Tried so far:

pd.pivot_table(df, index=['group'],columns=['category'], values=['value'],aggfunc=np.sum, margins=True).stack('category')

Pivot output:

                      value
group   category           
Group 1 All             6.0
        Category 1.1    2.0
        Category 1.2    4.0
Group 2 All             5.0
        Category 2.1    5.0
Group 3 All            10.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0
All     All            21.0
        Category 1.1    2.0
        Category 1.2    4.0
        Category 2.1    5.0
        Category 3.1    1.0
        Category 3.2    4.0
        Category 3.3    5.0

From there I'm stuck. The rollup "All" seems like it should be in another column and I don't want it as a "group". I've tried using to_json() with various iterations of record, values, split as args, but I can't figure out how to render the desired output.

Also tried df.groupby(['group','category']).agg({'value':'sum'}), but I don't get the rollup sum.

Questions similar, but not quite the structure I'm seeking:

Minnow
  • 1,733
  • 2
  • 26
  • 52

1 Answers1

1

I think the below might work for you. Can't say it's pretty tho...

import numpy as np
import pandas as pd
from itertools import chain
import json

df_grouped = df.groupby(['group', 'category'])['value'].sum().reset_index()
df_grouped = df_grouped.rename(columns={'value': 'weight', 'category': 'label'})

output_object = \
    [{'label': k, 
      'weight': df_grouped.loc[v, 'weight'].sum(),
      'groups': [dict({'groups': ()}.items() | x.items()) for x in 
                 chain.from_iterable(df_grouped.iloc[v, :].groupby('label')[['label', 'weight']].\
                  apply(lambda x: x.to_dict(orient='records')).tolist())]}
      for (k, v) in df_grouped.groupby(['group'])[['label', 'weight']].groups.items()]

output_dict = {'groups': output_object}

print(output_dict)

{'groups': [{'groups': [{'groups': (), 'label': 'Category 2.1', 'weight': 5}],
   'label': 'Group 2',
   'weight': 5},
  {'groups': [{'groups': (), 'label': 'Category 1.1', 'weight': 2},
    {'groups': (), 'label': 'Category 1.2', 'weight': 4}],
   'label': 'Group 1',
   'weight': 6},
  {'groups': [{'groups': (), 'label': 'Category 3.1', 'weight': 1},
    {'groups': (), 'label': 'Category 3.2', 'weight': 4},
    {'groups': (), 'label': 'Category 3.3', 'weight': 5}],
   'label': 'Group 3',
   'weight': 10}]}

To actually get it in JSON form, I grabbed the solution from this answer:

def default(o):
    if isinstance(o, np.integer): return int(o)
    raise TypeError

output_json = json.dumps(output_json, default=default)

print(output_json)

'{"groups": [{"groups": [{"groups": [], "weight": 5, "label": "Category 2.1"}], "weight": 5, "label": "Group 2"}, {"groups": [{"groups": [], "weight": 2, "label": "Category 1.1"}, {"groups": [], "weight": 4, "label": "Category 1.2"}], "weight": 6, "label": "Group 1"}, {"groups": [{"groups": [], "weight": 1, "label": "Category 3.1"}, {"groups": [], "weight": 4, "label": "Category 3.2"}, {"groups": [], "weight": 5, "label": "Category 3.3"}], "weight": 10, "label": "Group 3"}]}'
Sociopath
  • 13,068
  • 19
  • 47
  • 75
Ido S
  • 1,304
  • 10
  • 11
  • I appreciate the efforts, but the structure isn't quite the same. e.g. `{"groups": [ {"label": "Group 1", "weight": 6, "groups": [` vs. `{"groups": [{"groups": [{"groups": []...` – Minnow Mar 29 '18 at 12:19