Converting Dataframe to Dict and Sending to API

Question

I have a dataframe that I need to convert to a dict and send via API as Json.

This is my df:

Category	Topic	Steps	Stud1	Stud2	Stud3
Cat1	topc1	step1	10	15	30
Cat1	topc2	step2	16	26	42
Cat3	topc3	step3	05	62	50

I want to generate dict something like this:

{Cat1: {
        topc1:{
               step1:
                     [
                       {Stud1:10},
                       {Stud2:15},
                       {Stud3:30}
                     ]
               }
        }
        topc2:{
               step2:
                     [
                       {Stud1:10},
                       {Stud2:15},
                       {Stud3:30}
                     ]
               }
        }
}

I know I can achieve this with for loop and iterrows() but that would be slow result.

I have tried using to_dict() function from pandas with group_by() on Category, Topic, Step but that doesn't give required output. I have tried many ways like stack() and I even merging the rows of same name but it yields no result. Can anyone tell me best efficient way to achieve this.

Have you checked [this](https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary)? — Ersel Er, Aug 03 '22 at 07:51
@ErselEr Yes, I tried all orients from `to_dict()`. But the result I want is too complex for `to_dict()` predefined methods. have to resolve with custom methods only. — Sahil Mohile, Aug 03 '22 at 07:56
Could there be duplicate `Topic`-`Steps` for a single `Category`? For example, could the `Topic` value in the 2. row be equal to `topic1` and the `Step`-value in the 2. row be equal to `step1` at the same time? — Timus, Aug 03 '22 at 08:04
@Timus Yes, In such case relation should be `topic: step1, step2` — Sahil Mohile, Aug 03 '22 at 08:05

Timus · Accepted Answer · 2022-08-03T09:20:02.013

I'm not completely sure I understand the question fully. But you could try this:

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for record in sdf[["Stud1", "Stud2", "Stud3"]].to_dict(orient="records")
        for key, value in record.items()
    ]

Result for your example df:

{'Cat1': {'topc1': {'step1': [{'Stud1': 10}, {'Stud2': 15}, {'Stud3': 30}]},
          'topc2': {'step2': [{'Stud1': 16}, {'Stud2': 26}, {'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5}, {'Stud2': 62}, {'Stud3': 50}]}}}

This, more direct, variation is bit faster, but not much:

result = {}
for (key1, key2, key3), sdf in df.groupby(["Category", "Topic", "Steps"]):
    inner = result.setdefault(key1, {}).setdefault(key2, {})
    inner[key3] = [
        {key: value}
        for values in zip(sdf["Stud1"], sdf["Stud2"], sdf["Stud3"])
        for key, value in zip(("Stud1", "Stud2", "Stud3"), values)
    ]

Thank you. This method works. Just a question though is there any way we can speed up process. As doing this loop on my dataset is quite slow. — Sahil Mohile, Aug 03 '22 at 08:33
@SahilMohile I've experimented a bit but couldn't come up with a major performance improvement -- sorry :( — Timus, Aug 03 '22 at 09:16

score 1 · Answer 2 · answered Mar 02 '23 at 05:35

def function1(ss:pd.Series):
    return str({k:v for d in ss.tolist() for k,v in d.items()})

df.groupby(['Category','Topic','Steps']).apply(lambda dd:dd[['Stud1','Stud2','Stud3']].to_dict('r'))\
    .reset_index(level=2).apply(lambda ss:{ss.iloc[0]:ss.iloc[1]},axis=1)\
    .reset_index(level=1).apply(lambda ss:{ss.iloc[0]:ss.iloc[1]},axis=1)\
    .groupby(level=0).apply(function1).map(eval).to_dict()

out：

{'Cat1': {'topc1': {'step1': [{'Stud1': 10, 'Stud2': 15, 'Stud3': 30}]},
  'topc2': {'step2': [{'Stud1': 16, 'Stud2': 26, 'Stud3': 42}]}},
 'Cat3': {'topc3': {'step3': [{'Stud1': 5, 'Stud2': 62, 'Stud3': 50}]}}}

Converting Dataframe to Dict and Sending to API

2 Answers2