2

I have a dataframe which has a dictionary inside a nested list and i want to split the column 'C' :

A B     C     
1 a    [ {"id":2,"Col":{"x":3,"y":4}}]
2 b    [ {"id":5,"Col":{"x":6,"y":7}}]

expected output :

A B C_id Col_x Col_y
1 a  2    3     4 
2 b  5    6     7
Shabari nath k
  • 920
  • 1
  • 10
  • 23

3 Answers3

2

You can try .apply method

df['C_id'] = df['C'].apply(lambda x: x[0]['id'])
df['C_x'] = df['C'].apply(lambda x: x[0]['Col']['x'])
df['C_y'] = df['C'].apply(lambda x: x[0]['Col']['y'])
Aroosh Rana
  • 112
  • 11
2

From the comments, json_normalize might help you.

After extracting id and col columns with:

df[["Col", "id"]] = df["C"].apply(lambda x: pd.Series(x[0]))

You can explode the dictionary in Col with json_normalize and use concat to merge with existing dataframe:

df = pd.concat([df, json_normalize(df.Col)], axis=1)

Also, use drop to remove old columns.

Full code:

# Import modules
import pandas as pd
from pandas.io.json import json_normalize
# from flatten_json import flatten

# Create dataframe
df = pd.DataFrame([[1, "a", [ {"id":2,"Col":{"x":3,"y":4}}]],
                   [2, "b", [ {"id":5,"Col":{"x":6,"y":7}}]]],
                   columns=["A", "B", "C"])

# Add col and id column + remove old "C" column
df = pd.concat([df, df["C"].apply(lambda x: pd.Series(x[0]))], axis=1) \
        .drop("C", axis=1)
print(df)
#    A  B               Col  id
# 0  1  a  {'x': 3, 'y': 4}   2
# 1  2  b  {'x': 6, 'y': 7}   5

# Show json_normalize behavior
print(json_normalize(df.Col))
#    x  y
# 0  3  4
# 1  6  7

# Explode dict in "col" column + remove "Col" colun
df = pd.concat([df, json_normalize(df.Col)], axis=1) \
        .drop(["Col"], axis=1)
print(df)
#    A  B  id  x  y
# 0  1  a   2  3  4
# 1  2  b   5  6  7
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Alexandre B.
  • 5,387
  • 2
  • 17
  • 40
  • My first column 'A' in the dataframe is also labeled 'id' actually..would there be a key error? – Shabari nath k Mar 26 '20 at 08:32
  • No you will have 2 `id` columns. However, I strongly advice to rename the first column to another name. This [topic](https://stackoverflow.com/questions/19758364/rename-specific-columns-in-pandas) explains how. – Alexandre B. Mar 26 '20 at 08:36
  • is there a difference between df["C"] and df['C']? – Shabari nath k Mar 26 '20 at 09:09
  • I don't think so. As it is another question and has no link with the current topic, I advise you to open a new discussion for more details. – Alexandre B. Mar 26 '20 at 09:12
1

Code

import pandas as pd

A = [1, 2]
B = ['a', 'b']
C = [{"id":2,"Col":{"x":3,"y":4}}, {"id":5,"Col":{"x":6,"y":7}}]
df = pd.DataFrame({"A": A, "B": B, "C_id": [element["id"] for element in C],
               "Col_x": [element["Col"]["x"] for element in C],
               "Col_y": [element["Col"]["y"] for element in C]})

Ouput:

enter image description here

Alexandre B.
  • 5,387
  • 2
  • 17
  • 40
Saimon
  • 407
  • 2
  • 11