Convert dataframe cell containing dictionary to dataframe with new columns pandas

Question

Here is the data frame. Some of its cells contain a dictionary, and I want to convert the dictionary items to columns.

import numpy as np
import pandas as pd

dfx={'name':['Alex','Jin',np.nan,'Peter'],
     'age':[np.nan,10,12,13],
     'other':[{'school':'abc','subject':'xyz'},
     np.nan,
     {'school':'abc','subject':'xyz'},
     np.nan,]
     }

dfx=pd.DataFrame(dfx)

Output

name    age        other
Alex             {'school': 'abc', 'subject': 'xyz'}
Jin     10.0    
        12.0     {'school': 'abc', 'subject': 'xyz'}
Peter   13.0

Here is the desired output

name    age      school    subject
Alex             abc         xyz
Jin     10.0        
        12.0     abc         xyz
Peter   13.0

This won't be very fast, so I don't recommend it if your frame is large, but you could do `dfx.join(dfx['other'].apply(pd.Series)).drop(columns=['other', 0])` - It_is_Chris, Oct 08 '20 at 20:39
I am trying it this way: `pd.DataFrame(list(dfx['other'].dropna()))`, but it resets the index, which makes it impossible to join the result back - Talha Anwar, Oct 08 '20 at 20:40
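
The index problem described in the last comment can be avoided by passing the surviving index to the constructor explicitly. Below is a minimal sketch of that idea, not something posted in the thread; the names `dicts`, `expanded` and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

dfx = pd.DataFrame({
    'name': ['Alex', 'Jin', np.nan, 'Peter'],
    'age': [np.nan, 10, 12, 13],
    'other': [{'school': 'abc', 'subject': 'xyz'}, np.nan,
              {'school': 'abc', 'subject': 'xyz'}, np.nan],
})

# Keep only the rows that actually hold a dictionary; their index labels survive.
dicts = dfx['other'].dropna()

# Build the new columns from the dictionaries, reusing the original row labels
# instead of letting pandas create a fresh 0..n-1 index.
expanded = pd.DataFrame(dicts.tolist(), index=dicts.index)

# join aligns on the index, so rows without a dictionary get NaN.
result = dfx.drop(columns='other').join(expanded)
print(result)
```

Because `join` aligns on index labels, the rows that were dropped by `dropna()` simply come back with missing values in the new columns.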

score 2 · Answer 1 · answered Oct 08 '20 at 20:48

You can use the `.str.get` accessor to index into the dictionaries in the column. It returns NaN wherever the cell holds NaN instead of a dictionary:

clean_df = (dfx
            .assign(
               school=lambda df: df["other"].str.get("school"),
               subject=lambda df: df["other"].str.get("subject"))
            .drop("other", axis=1))

print(clean_df)
    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN
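
If the keys are not known in advance, the same `.str.get` idea can be applied to whatever keys appear in the column. This is a sketch under that assumption rather than part of the answer above; `keys` is a hypothetical helper name, and the key order is simply sorted alphabetically.

```python
import numpy as np
import pandas as pd

dfx = pd.DataFrame({
    'name': ['Alex', 'Jin', np.nan, 'Peter'],
    'age': [np.nan, 10, 12, 13],
    'other': [{'school': 'abc', 'subject': 'xyz'}, np.nan,
              {'school': 'abc', 'subject': 'xyz'}, np.nan],
})

# Collect every key that occurs in any dictionary of the column.
keys = sorted({k for d in dfx['other'].dropna() for k in d})

# One new column per key; .str.get leaves NaN cells as missing.
clean_df = (dfx
            .drop(columns='other')
            .assign(**{k: dfx['other'].str.get(k) for k in keys}))
print(clean_df)
```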

score 2 · Answer 2 · Andy L. · 2020-10-08T21:09:27.267

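Try this: convert the `other` column to a dictionary keyed by row index, build a DataFrame from it, transpose, and assign the resulting columns: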

df_final = dfx[['name','age']].assign(**pd.DataFrame(dfx.other.to_dict()).T)

Out[41]:
    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN

edited Oct 08 '20 at 21:09

answered Oct 08 '20 at 20:49

score 1 · Answer 3 · answered Oct 08 '20 at 20:45

Build a dictionary that maps dfx's index to the values in other, pass it to pd.DataFrame, and transpose the result. That gives you a new dataframe with one row per row of dfx. Join it to the first two columns of dfx.

dfx.iloc[:,:-1].join(pd.DataFrame(dict(zip(dfx.index,dfx.other))).T).fillna('')



    name age school subject
0   Alex        abc     xyz
1    Jin  10               
2         12    abc     xyz
3  Peter  13

score 0 · Answer 4 · answered Oct 08 '20 at 21:05

You can apply `pd.Series` to the column of dictionaries. The NaN cells produce an extra column named 0, which is dropped:

dfx.drop(columns='other').join(dfx['other'].apply(pd.Series).drop(columns=0))

Output:

    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN
