0

Here is a data frame in which some cells contain a dictionary. I want to convert the dictionary items to columns.

import numpy as np
import pandas as pd

dfx={'name':['Alex','Jin',np.nan,'Peter'],
     'age':[np.nan,10,12,13],
     'other':[{'school':'abc','subject':'xyz'},
     np.nan,
     {'school':'abc','subject':'xyz'},
     np.nan,]
     }

dfx=pd.DataFrame(dfx)

Output

name    age        other
Alex             {'school': 'abc', 'subject': 'xyz'}
Jin     10.0    
        12.0     {'school': 'abc', 'subject': 'xyz'}
Peter   13.0    

Here is the desired output:

name    age      school    subject
Alex             abc         xyz
Jin     10.0        
        12.0     abc         xyz
Peter   13.0    
Talha Anwar
  • This won't be very fast, so I don't recommend it if your frame is large, but you could do `dfx.join(dfx['other'].apply(pd.Series)).drop(columns=['other', 0])` – It_is_Chris Oct 08 '20 at 20:39
  • I am trying `pd.DataFrame(list(dfx['other'].dropna()))`, but it resets the index, which makes it impossible to join the result back onto the original frame – Talha Anwar Oct 08 '20 at 20:40
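
One way to keep the row labels in the approach from the last comment is to hand the surviving index back to the constructor. This is only a minimal sketch, assuming the sample data from the question; the names `dicts`, `expanded`, and `result` are illustrative and do not come from the thread.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
dfx = pd.DataFrame({
    'name': ['Alex', 'Jin', np.nan, 'Peter'],
    'age': [np.nan, 10, 12, 13],
    'other': [{'school': 'abc', 'subject': 'xyz'}, np.nan,
              {'school': 'abc', 'subject': 'xyz'}, np.nan],
})

# Keep only the rows that actually hold a dictionary
dicts = dfx['other'].dropna()

# Build the expanded frame, reusing the original row labels
expanded = pd.DataFrame(dicts.tolist(), index=dicts.index)

# join aligns on the index, so rows without a dictionary get NaN
result = dfx.drop(columns='other').join(expanded)
print(result)
```

Because `join` aligns on the index, the rows that were dropped by `dropna()` come back with NaN in the new columns.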

4 Answers

2

You can use .str.get to index into the dictionaries in your column. It returns NaN wherever the cell holds NaN instead of a dictionary:

clean_df = (dfx
            .assign(
               school=lambda df: df["other"].str.get("school"),
               subject=lambda df: df["other"].str.get("subject"))
            .drop("other", axis=1))

print(clean_df)
    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN
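
If the keys are not known in advance, the same .str.get idea can be applied once per key. The sketch below is a variant under stated assumptions, not part of the answer as posted: it gathers the keys by scanning the column, and it assumes every key is a string so that it can be passed to assign as a keyword. The names `keys` and `new_cols` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
dfx = pd.DataFrame({
    'name': ['Alex', 'Jin', np.nan, 'Peter'],
    'age': [np.nan, 10, 12, 13],
    'other': [{'school': 'abc', 'subject': 'xyz'}, np.nan,
              {'school': 'abc', 'subject': 'xyz'}, np.nan],
})

# Collect every key that appears in any dictionary, in first-seen order
keys = []
for cell in dfx['other']:
    if isinstance(cell, dict):
        for key in cell:
            if key not in keys:
                keys.append(key)

# One .str.get call per key; NaN cells stay NaN
new_cols = {key: dfx['other'].str.get(key) for key in keys}

clean_df = dfx.drop(columns='other').assign(**new_cols)
print(clean_df)
```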
Cameron Riddell
2

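Try this. It builds a DataFrame from the other column keyed by row index, transposes it, and assigns the resulting columns next to name and age: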

df_final = dfx[['name','age']].assign(**pd.DataFrame(dfx.other.to_dict()).T)

Out[41]:
    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN
Andy L.
1

Create a dictionary that maps dfx's index to the values in other. Pass that dictionary to pd.DataFrame and transpose the result, which gives you a new dataframe with one row per original row. Then join that dataframe to the first two columns of dfx.

dfx.iloc[:,:-1].join(pd.DataFrame(dict(zip(dfx.index,dfx.other))).T).fillna('')



    name age school subject
0   Alex        abc     xyz
1    Jin  10               
2         12    abc     xyz
3  Peter  13               
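
The steps described above can be spelled out one at a time. This is a sketch assuming the sample data from the question; the intermediate names `by_row`, `wide`, and `expanded` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
dfx = pd.DataFrame({
    'name': ['Alex', 'Jin', np.nan, 'Peter'],
    'age': [np.nan, 10, 12, 13],
    'other': [{'school': 'abc', 'subject': 'xyz'}, np.nan,
              {'school': 'abc', 'subject': 'xyz'}, np.nan],
})

# Step 1: map each row label to its cell in 'other' (a dict or NaN)
by_row = dict(zip(dfx.index, dfx['other']))

# Step 2: each row label becomes a column, dictionary keys become the index
wide = pd.DataFrame(by_row)

# Step 3: transpose so the rows line up with dfx again
expanded = wide.T

# Step 4: join onto every column except the last one ('other')
result = dfx.iloc[:, :-1].join(expanded)
print(result)
```

The .fillna('') call is left out of this sketch so that age stays numeric; add it back if blank cells are preferred for display.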
wwnde
0

You can apply pd.Series to the column of dictionaries. The NaN cells produce an extra column named 0, which is dropped before joining:

dfx.drop(columns='other').join(dfx['other'].apply(pd.Series).drop(columns=0))

Output:

    name   age school subject
0   Alex   NaN    abc     xyz
1    Jin  10.0    NaN     NaN
2    NaN  12.0    abc     xyz
3  Peter  13.0    NaN     NaN
Mykola Zotko