Expand a dictionary column into rows with both dict keys and dict values as new column values in pandas

Question

I have a dataframe with a column that holds a dictionary in each row, as shown below:

I want to expand the trans_score column so that each key and value of the dictionary becomes a row, with the keys in one column and the values in another. Something like this (example for the first row):

How can I do this in pandas?

It would be better to provide sample data as text, e.g. `df.head(10).to_dict("list")`. Looks like a simple case of `df.join(df["trans_score"].apply(pd.Series))` will do what you want. – Rob Raymond, Aug 03 '21 at 08:27
Welcome to Stack Overflow. Please read [tour] and [mre], and in this case also: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Andreas, Aug 03 '21 at 08:39

SeaBean · Accepted Answer · 2021-08-03T09:44:26.563

As you want to extract the dict keys and dict values both as new column values (rather than dict keys as column indexes and dict values as column values), we need to extract them separately, as follows:

df_ts = pd.DataFrame({'City': df['trans_score'].apply(lambda x: list(x.keys())).tolist(), 
                      'Score': df['trans_score'].apply(lambda x: list(x.values())).tolist()})

Then, to expand each row into 2 rows and attach the email information, we can use:

df_out = df[['emailid_f']].join(df_ts).apply(pd.Series.explode)

Optionally, if you want to rename the column emailid_f to emailid and reset the index, you can use:

df_out = df_out.rename(columns={'emailid_f': 'emailid'}).reset_index(drop=True)
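
As a possible alternative to `apply(pd.Series.explode)`, pandas 1.3.0 and later let `DataFrame.explode` take a list of columns, which explodes them together as long as the lists in each row have matching lengths. The following is a minimal sketch under that version assumption, using the same sample data as the demo below; the list comprehensions are only a stylistic substitute for the `apply` calls above.

```python
import pandas as pd

# Same sample data as in the demo below.
df = pd.DataFrame({
    'emailid_f': ['email1', 'email2'],
    'trans_score': [{'key11': 'val11', 'key12': 'val12'},
                    {'key21': 'val21', 'key22': 'val22'}],
})

# One list of keys and one list of values per original row.
df_ts = pd.DataFrame({
    'City': [list(d.keys()) for d in df['trans_score']],
    'Score': [list(d.values()) for d in df['trans_score']],
})

# Multi-column explode (assumes pandas >= 1.3.0): 'City' and 'Score'
# are expanded in lockstep, so each key stays next to its value.
df_out = (df[['emailid_f']]
          .join(df_ts)
          .explode(['City', 'Score'])
          .reset_index(drop=True))

print(df_out)
```

On older pandas versions, the `apply(pd.Series.explode)` form shown above remains the one to use.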

Demo

import pandas as pd

data = {'emailid_f': {0: 'email1', 1: 'email2'},
 'trans_score': {0: {'key11': 'val11', 'key12': 'val12'},
  1: {'key21': 'val21', 'key22': 'val22'}}}
df = pd.DataFrame(data)

print(df)

  emailid_f                           trans_score
0    email1  {'key11': 'val11', 'key12': 'val12'}
1    email2  {'key21': 'val21', 'key22': 'val22'}

df_ts = pd.DataFrame({'City': df['trans_score'].apply(lambda x: list(x.keys())).tolist(), 
                      'Score': df['trans_score'].apply(lambda x: list(x.values())).tolist()})

print(df_ts)

             City           Score
0  [key11, key12]  [val11, val12]
1  [key21, key22]  [val21, val22]

df_out = df[['emailid_f']].join(df_ts).apply(pd.Series.explode)

print(df_out)

  emailid_f   City  Score
0    email1  key11  val11
0    email1  key12  val12
1    email2  key21  val21
1    email2  key22  val22

df_out = df_out.rename(columns={'emailid_f': 'emailid'}).reset_index(drop=True)

print(df_out)

  emailid   City  Score
0  email1  key11  val11
1  email1  key12  val12
2  email2  key21  val21
3  email2  key22  val22
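
For contrast, here is a sketch of what the `apply(pd.Series)` approach suggested in the question comments produces on the same sample data. The dict keys become column labels, one column per distinct key, with NaN where a row lacks that key. That is the wide layout this answer deliberately avoids.

```python
import pandas as pd

df = pd.DataFrame({
    'emailid_f': ['email1', 'email2'],
    'trans_score': [{'key11': 'val11', 'key12': 'val12'},
                    {'key21': 'val21', 'key22': 'val22'}],
})

# Each dict is turned into a Series, so its keys end up as column labels.
wide = df[['emailid_f']].join(df['trans_score'].apply(pd.Series))

print(wide)
```

With many distinct keys this frame gets very wide and sparse, which is why the keys and values are extracted into two columns instead.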

edited Aug 03 '21 at 09:44

answered Aug 03 '21 at 09:04

SeaBean


@SeaBean I get the following error: 'dict_values' object does not support indexing – Doof Aug 03 '21 at 09:22
The part where you find df_ts worked fine – Doof Aug 03 '21 at 09:23
@Doof Ok, let me look at it. – SeaBean Aug 03 '21 at 09:23
@Doof In that case, could you try replacing the code that defines `df_ts` with the following: `df_ts = pd.DataFrame({'City': df['trans_score'].apply(lambda x: list(x.keys())).tolist(), 'Score': df['trans_score'].apply(lambda x: list(x.values())).tolist()})` – SeaBean Aug 03 '21 at 09:26
@Doof Edited my solution above. Please try again. – SeaBean Aug 03 '21 at 09:33
The data is quite large, so it is taking time. Will update once it runs. – Doof Aug 03 '21 at 09:36
Thanks, it is a little slow but it does the job. Cheers – Doof Aug 03 '21 at 09:49
@Doof That's great! Since each row has to be expanded into multiple rows, it takes time. Glad that it is still within acceptable time. – SeaBean Aug 03 '21 at 09:52
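
If the `explode` step is too slow on a large frame, one option that may help is to build the long-format rows directly in plain Python and construct the DataFrame once. This is only a sketch and has not been benchmarked here; it assumes the column names `emailid_f` and `trans_score` from the demo and that every value in `trans_score` is a dict.

```python
import pandas as pd

df = pd.DataFrame({
    'emailid_f': ['email1', 'email2'],
    'trans_score': [{'key11': 'val11', 'key12': 'val12'},
                    {'key21': 'val21', 'key22': 'val22'}],
})

# One (email, key, value) tuple per dictionary item.
rows = [
    (email, key, value)
    for email, scores in zip(df['emailid_f'], df['trans_score'])
    for key, value in scores.items()
]

df_out = pd.DataFrame(rows, columns=['emailid', 'City', 'Score'])

print(df_out)
```

This skips the intermediate list-valued columns entirely, so it is worth timing against the `explode` version on the real data before switching.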
