1

I have pandas daraframe where I have multiple column which has list of dictionary. Ex:

col1  col2   combine_col1                 combine_col2
val1  val2   [{'x1':'v1','x2':'v2'}]      [{'x3':'v3','x4':'v4','x5':'v5'}]
val11 val22  [{'x1':'v11','x2':'v22'}]    [{'x3':'v33','x4':'v44','x5':'v55'}]

Inside the combined column the number of keys and their name I don't know.It can be anything.

I want to explode those specific columns which has list of dict and get all the keys as new columns. For this example the output dataframe column names should be col1, col2, x1, x2,x3,x4,x5. Currently I am hardcoding the combine cols and then doing explode operation. But I want that to happen automatically.

Dcook
  • 899
  • 7
  • 32
  • https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas – alvas Mar 13 '23 at 05:50

1 Answers1

2

If there are only one element lists like in sample data use json_normalize with str[0] for select them:

import ast

cols = ['combine_col1','combine_col2']

#if necessary
#df[cols] = df[cols].applymap(ast.literal_eval)

df1 = (df.drop(cols, axis=1)
         .join(pd.concat([pd.json_normalize(df[x].str[0]) for x in cols], axis=1)))
print (df1)
    col1   col2   x1   x2   x3   x4   x5
0   val1   val2   v1   v2   v3   v4   v5
1  val11  val22  v11  v22  v33  v44  v55

EDIT: Solution with append prefix:

df1 = (df.drop(cols, axis=1)
         .join(pd.concat([pd.json_normalize(df[x].str[0]).add_prefix(f'{x}.') 
                          for x in cols], axis=1)))
print (df1)
    col1   col2 combine_col1.x1 combine_col1.x2 combine_col2.x3  \
0   val1   val2              v1              v2              v3   
1  val11  val22             v11             v22             v33   

  combine_col2.x4 combine_col2.x5  
0              v4              v5  
1             v44             v55  

If possible multiple values per lists use Series.explode:

df1 = (df.drop(cols, axis=1)
         .join(pd.concat([pd.json_normalize(df[x].explode()) for x in cols], axis=1)))

df1 = (df.drop(cols, axis=1)
         .join(pd.concat([pd.json_normalize(df[x].explode()).add_prefix(f'{x}.') 
                          for x in cols], axis=1)))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for the explanation. If additionally I want to add the actual column name as prefix.Example: x1 will be combine_col1.x1 then can I do in one should without doing column rename multiple times? – Dcook Mar 13 '23 at 06:37
  • @Dcook - in first or second solution? – jezrael Mar 13 '23 at 06:46
  • @Dcook - Added to solution. – jezrael Mar 13 '23 at 06:48
  • second solution – Dcook Mar 13 '23 at 07:01
  • Sometime this gives error: `AttributeError: 'float' object has no attribute 'values'`. How can I avoid this error if there any decimal or int value?I am using this but still getting error`df1 = (df.drop(cols, axis=1) .join(pd.concat( [pd.json_normalize(df[x].explode(), errors='ignore').applymap( lambda x: [x] if isinstance(x, (int, float)) else x).add_prefix(f'{x}.') for x in cols], axis=1)))` – Dcook Apr 10 '23 at 12:26