0

I have a dataframe that I need to convert to a dict and send via API as Json.

This is my df:

Category Topic Steps Stud1 Stud2 Stud3
Cat1 topc1 step1 10 15 30
Cat1 topc2 step2 16 26 42
Cat3 topc3 step3 05 62 50

I want to generate dict something like this:

{Cat1: {
        topc1:{
               step1:
                     [
                       {Stud1:10},
                       {Stud2:15},
                       {Stud3:30}
                     ]
               }
        }
        topc2:{
               step2:
                     [
                       {Stud1:10},
                       {Stud2:15},
                       {Stud3:30}
                     ]
               }
        }
}

I know I can achieve this with for loop and iterrows() but that would be slow result.

I have tried using to_dict() function from pandas with group_by() on Category, Topic, Step but that doesn't give required output. I have tried many ways like stack() and I even merging the rows of same name but it yields no result. Can anyone tell me best efficient way to achieve this.

Timus
  • 10,974
  • 5
  • 14
  • 28
Sahil Mohile
  • 99
  • 1
  • 9
  • Have you checked [this](https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary)? – Ersel Er Aug 03 '22 at 07:51
  • @ErselEr Yes, I tried all orients from `to_dict()`. But the result I want is too complex for `to_dict()` predefined methods. have to resolve with custom methods only. – Sahil Mohile Aug 03 '22 at 07:56
  • Could there be duplicate `Topic`-`Steps` for a single `Category`? For example, could the `Topic` value in the 2. row be equal to `topic1` and the `Step`-value in the 2. row be equal to `step1` at the same time? – Timus Aug 03 '22 at 08:04
  • @Timus Yes, In such case relation should be `topic: step1, step2` – Sahil Mohile Aug 03 '22 at 08:05

2 Answers2

1

I'm not completely sure I understand the question fully. But you could try this:

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for record in sdf[["Stud1", "Stud2", "Stud3"]].to_dict(orient="records")
        for key, value in record.items()
    ]

Result for your example df:

{'Cat1': {'topc1': {'step1': [{'Stud1': 10}, {'Stud2': 15}, {'Stud3': 30}]},
          'topc2': {'step2': [{'Stud1': 16}, {'Stud2': 26}, {'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5}, {'Stud2': 62}, {'Stud3': 50}]}}}

This, more direct, variation is bit faster, but not much:

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for values in zip(sdf["Stud1"], sdf["Stud2"], sdf["Stud3"])
        for key, value in zip(("Stud1", "Stud2", "Stud3"), values)
    ]
Timus
  • 10,974
  • 5
  • 14
  • 28
1
def function1(ss:pd.Series):
    return str({k:v for d in ss.tolist() for k,v in d.items()})

df.groupby(['Category','Topic','Steps']).apply(lambda dd:dd[['Stud1','Stud2','Stud3']].to_dict('r'))\
    .reset_index(level=2).apply(lambda ss:{ss.iloc[0]:ss.iloc[1]},axis=1)\
    .reset_index(level=1).apply(lambda ss:{ss.iloc[0]:ss.iloc[1]},axis=1)\
    .groupby(level=0).apply(function1).map(eval).to_dict()

out:

{'Cat1': {'topc1': {'step1': [{'Stud1': 10, 'Stud2': 15, 'Stud3': 30}]},
  'topc2': {'step2': [{'Stud1': 16, 'Stud2': 26, 'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5, 'Stud2': 62, 'Stud3': 50}]}}}
G.G
  • 639
  • 1
  • 5