0

I'm looking for a solution to convert a Pandas DataFrame with 3 subcategories to an JSON output without any lists.

This is the data-structure I got:

import pandas as pd

df = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['BB', 'BB', 'BC', 'BB', 'BB', 'BB', 'BC', 'BC'],
    'cat3': ['CC', 'CC', 'CD', 'CD', 'CD', 'CC', 'CD', 'CE'],
    'prod': ['P1', 'P2', 'P3', 'P1', 'P4', 'P1', 'P3','P6'],
    'amount': [132, 51, 12, 421, 55, 11, 123, 312]
})

And this is the desired JSON output:

{
"A": {
    "BB": {
        "CC": {
            "P1": 132,
            "P2": 51
        }
    },
    "BC": {
        "CD": {
            "P3": 12
        }
    }
},
"B": {
    "BB": {
        "CD": {
            "P1": 421,
            "P4": 55
        }
    }
},
"C": {
    "BB": {
        "CC": {
            "P1": 11
        }
    },
    "BC": {
        "CD": {
            "P3": 123
        },
        "CE": {
            "P6": 312
        }
    }
}

I had several attemts with the to_json method and tried to adapt this answer without any luck.

def f(x):
    return dict({k:v for k,v in zip(x.prod, x.amount)})

(
    df.groupby(['cat1','cat2', 'cat3'])
      .apply(f)
      .groupby(level=0)
      .apply(lambda x: x.tolist())
      .to_dict()
)

TypeError: zip argument #1 must support iteration

Sven Rojek
  • 5,476
  • 2
  • 35
  • 57

1 Answers1

1

Credit to DSM but I modified the .ix to .iloc

def recur_dictify(frame):
         if len(frame.columns) == 1:
             if frame.values.size == 1: return frame.values[0][0]
             return frame.values.squeeze()
         grouped = frame.groupby(frame.columns[0])
         d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
         return d
     
recur_dictify(df)
Out[211]: 
{'A': {'BB': {'CC': {'P1': 132, 'P2': 51}}, 'BC': {'CD': {'P3': 12}}},
 'B': {'BB': {'CD': {'P1': 421, 'P4': 55}}},
 'C': {'BB': {'CC': {'P1': 11}}, 'BC': {'CD': {'P3': 123}, 'CE': {'P6': 312}}}}
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Works like a charm. Nice recursive solution. Could you explain this solution a little bit? I got to check the squeeze function and how works g.iloc[:,1:] exactly? – Sven Rojek Sep 07 '20 at 17:59
  • 1
    @SvenRojek slice the position from 2nd row to the end – BENY Sep 07 '20 at 18:20