-2

I have a dataframe that contains a column which holds dictionary like shown below: enter image description here

I want to expand the trans_score column. I want each key, value of the dictionary values as 2 columns. Something like this(example for first row):

enter image description here

How to do this in pandas?

SeaBean
  • 22,547
  • 3
  • 13
  • 25
Doof
  • 123
  • 11
  • it would be better to provide sample data as text `df.head(10).to_dict("list")`. looks like a simple case of `df.join(df["trans_score"].apply(pd.Series))` will do what you want – Rob Raymond Aug 03 '21 at 08:27
  • Welcome to stackoverflow, please read [tour] and [mre] and in this case also: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (1) – Andreas Aug 03 '21 at 08:39

1 Answers1

3

As you want to extract the dict keys and dict values both as new column values (rather than dict keys as column indexes and dict values as column values), we need to extract them separately, as follows:

df_ts = pd.DataFrame({'City': df['trans_score'].apply(lambda x: list(x.keys())).tolist(), 
                      'Score': df['trans_score'].apply(lambda x: list(x.values())).tolist()})

Then, to expand each row into 2 rows and attach the email information, we can use:

df_out = df[['emailid_f']].join(df_ts).apply(pd.Series.explode)

Optionally, if you want to rename the column emailid_f to emailid and reset the index, you can use:

df_out = df_out.rename(columns={'emailid_f': 'emailid'}).reset_index(drop=True)

Demo

data = {'emailid_f': {0: 'email1', 1: 'email2'},
 'trans_score': {0: {'key11': 'val11', 'key12': 'val12'},
  1: {'key21': 'val21', 'key22': 'val22'}}}
df = pd.DataFrame(data)

print(df)

  emailid_f                           trans_score
0    email1  {'key11': 'val11', 'key12': 'val12'}
1    email2  {'key21': 'val21', 'key22': 'val22'}

df_ts = pd.DataFrame({'City': df['trans_score'].apply(lambda x: x.keys()).tolist(), 
                      'Score': df['trans_score'].apply(lambda x: x.values()).tolist()})

print(df_ts)

             City           Score
0  (key11, key12)  (val11, val12)
1  (key21, key22)  (val21, val22)

df_out = df[['emailid_f']].join(df_ts).apply(pd.Series.explode)

print(df_out)

  emailid_f   City  Score
0    email1  key11  val11
0    email1  key12  val12
1    email2  key21  val21
1    email2  key22  val22

df_out = df_out.rename(columns={'emailid_f': 'emailid'}).reset_index(drop=True)

print(df_out)

  emailid   City  Score
0  email1  key11  val11
1  email1  key12  val12
2  email2  key21  val21
3  email2  key22  val22
SeaBean
  • 22,547
  • 3
  • 13
  • 25