How to break/pop a nested Dictionary inside a list, inside a pandas dataframe?

Question

I have a dataframe which has a dictionary inside a nested list and i want to split the column 'C' :

A B     C     
1 a    [ {"id":2,"Col":{"x":3,"y":4}}]
2 b    [ {"id":5,"Col":{"x":6,"y":7}}]

expected output :

A B C_id Col_x Col_y
1 a  2    3     4 
2 b  5    6     7

do you have 1 nested dictionary inside the list always? – anky Mar 26 '20 at 08:20 — anky, Mar 26 '20 at 08:20

score 2 · Answer 1 · answered Mar 26 '20 at 08:03

2

You can try .apply method

df['C_id'] = df['C'].apply(lambda x: x[0]['id'])
df['C_x'] = df['C'].apply(lambda x: x[0]['Col']['x'])
df['C_y'] = df['C'].apply(lambda x: x[0]['Col']['y'])

answered Mar 26 '20 at 08:03

Aroosh Rana

112
11

how do i do this in a loop?, cause my column C has a lot of entries actualy – Shabari nath k Mar 26 '20 at 08:06
How's the repetition, is it like Id or like Col? – Aroosh Rana Mar 26 '20 at 08:10
More helpful, if you could provide a snapshot of your dataset! – Aroosh Rana Mar 26 '20 at 08:31

score 2 · Accepted Answer · edited Apr 03 '20 at 16:09

From the comments, json_normalize might help you.

After extracting id and col columns with:

df[["Col", "id"]] = df["C"].apply(lambda x: pd.Series(x[0]))

You can explode the dictionary in Col with json_normalize and use concat to merge with existing dataframe:

df = pd.concat([df, json_normalize(df.Col)], axis=1)

Also, use drop to remove old columns.

Full code:

# Import modules
import pandas as pd
from pandas.io.json import json_normalize
# from flatten_json import flatten

# Create dataframe
df = pd.DataFrame([[1, "a", [ {"id":2,"Col":{"x":3,"y":4}}]],
                   [2, "b", [ {"id":5,"Col":{"x":6,"y":7}}]]],
                   columns=["A", "B", "C"])

# Add col and id column + remove old "C" column
df = pd.concat([df, df["C"].apply(lambda x: pd.Series(x[0]))], axis=1) \
        .drop("C", axis=1)
print(df)
#    A  B               Col  id
# 0  1  a  {'x': 3, 'y': 4}   2
# 1  2  b  {'x': 6, 'y': 7}   5

# Show json_normalize behavior
print(json_normalize(df.Col))
#    x  y
# 0  3  4
# 1  6  7

# Explode dict in "col" column + remove "Col" colun
df = pd.concat([df, json_normalize(df.Col)], axis=1) \
        .drop(["Col"], axis=1)
print(df)
#    A  B  id  x  y
# 0  1  a   2  3  4
# 1  2  b   5  6  7

My first column 'A' in the dataframe is also labeled 'id' actually..would there be a key error? — Shabari nath k, Mar 26 '20 at 08:32
No you will have 2 `id` columns. However, I strongly advice to rename the first column to another name. This [topic](https://stackoverflow.com/questions/19758364/rename-specific-columns-in-pandas) explains how. — Alexandre B., Mar 26 '20 at 08:36
I don't think so. As it is another question and has no link with the current topic, I advise you to open a new discussion for more details. — Alexandre B., Mar 26 '20 at 09:12

score 1 · Answer 3 · edited Mar 26 '20 at 08:07

1

Code

import pandas as pd

A = [1, 2]
B = ['a', 'b']
C = [{"id":2,"Col":{"x":3,"y":4}}, {"id":5,"Col":{"x":6,"y":7}}]
df = pd.DataFrame({"A": A, "B": B, "C_id": [element["id"] for element in C],
               "Col_x": [element["Col"]["x"] for element in C],
               "Col_y": [element["Col"]["y"] for element in C]})

Ouput:

edited Mar 26 '20 at 08:07

Alexandre B.

5,387
2
17
40

answered Mar 26 '20 at 08:05

Saimon

407
2
11

How to break/pop a nested Dictionary inside a list, inside a pandas dataframe?

3 Answers3