
I am trying to use the pd.get_dummies() function to convert categorical features to numerical ones, but one of my columns contains lists. This is the genre column:

0     ['Action', 'Adventure', 'Comedy', 'Drama', 'Sc...
1     ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space']
2     ['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...
3     ['Action', 'Magic', 'Police', 'Supernatural', ...
4     ['Adventure', 'Fantasy', 'Shounen', 'Supernatu...
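
The question does not say how this column was loaded, so one thing worth ruling out (an assumption, not something stated above) is that the cells are strings that only look like lists, which is what you typically get after a round trip through CSV. A minimal check and fix using the standard library's ast.literal_eval, on a hypothetical two-row frame:

```python
import ast

import pandas as pd

# Hypothetical data: each cell is a *string* that merely looks like a list,
# as can happen after saving to and reloading from CSV.
df = pd.DataFrame({"genre": ["['Action', 'Drama']", "['Action', 'Sci-Fi', 'Space']"]})
print(type(df.loc[0, "genre"]))  # <class 'str'>

# ast.literal_eval safely parses each string back into a real Python list.
df["genre"] = df["genre"].apply(ast.literal_eval)
print(type(df.loc[0, "genre"]))  # <class 'list'>
```

Once the cells are real Python lists, list-based approaches such as explode apply.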

I have tried all the answers on Stack Overflow that address this issue, but nothing works.

I want the output to be:

0    'Action', 'Adventure', 'Comedy', 'Drama', 'Sc...
1    'Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'
2    'Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...
3    'Action', 'Magic', 'Police', 'Supernatural', ...
4    'Adventure', 'Fantasy', 'Shounen', 'Supernatu...

That way I can use get_dummies to create the dummy columns. Please help!
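
The joined form described in the question can be produced directly with Series.str.join and then passed to the string version of get_dummies, Series.str.get_dummies. A minimal sketch, assuming no genre name contains the "|" character used as the delimiter (the three rows are shortened sample data, not the full column):

```python
import pandas as pd

# Shortened sample data in the same shape as the question's genre column.
df = pd.DataFrame({"genre": [
    ["Action", "Adventure", "Comedy"],
    ["Action", "Drama", "Mystery"],
    ["Adventure", "Fantasy"],
]})

# Step 1: join each list into one delimited string, e.g. "Action|Adventure|Comedy".
# Series.str.join joins the contents of list cells element-wise.
joined = df["genre"].str.join("|")

# Step 2: split on the same delimiter and build one 0/1 column per genre
# (columns come out in sorted order).
dummies = joined.str.get_dummies(sep="|")
print(dummies)
```

The delimiter only has to be a character that never appears inside a genre name; "|" is also str.get_dummies' default separator.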

desertnaut
Mohneesh S

1 Answer


You can use explode (available in pandas 0.25 and later) to do that, as below:

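import pandas as pd

d = {"genre": [['Action', 'Adventure', 'Comedy', 'Drama'],
               ['Action', 'Drama', 'Mystery', 'Sci-Fi', 'Space'],
               ['Action', 'Sci-Fi', 'Adventure', 'Comedy'],
               ['Action', 'Magic', 'Police', 'Supernatural'],
               ['Adventure', 'Fantasy', 'Shounen', 'Supernatu']]}

df = pd.DataFrame(d)

# explode() puts each list element on its own row (repeating the original index),
# pivot() spreads the genres back out into one column per genre,
# and get_dummies() turns those columns into 0/1 indicators.
# Note: the dummy columns are prefixed with the pivoted column name, e.g. "Action_Action".
pd.get_dummies(df.explode("genre").pivot(columns="genre", values="genre"))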
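
A variant of the same explode idea that skips the pivot step, so the dummy columns are named after the genres themselves instead of prefixed names like "Action_Action". This is a sketch on shortened sample data, not the answerer's code:

```python
import pandas as pd

# Shortened sample data in the same shape as the question's genre column.
df = pd.DataFrame({"genre": [
    ["Action", "Adventure", "Comedy"],
    ["Action", "Drama", "Mystery"],
    ["Adventure", "Fantasy"],
]})

# explode() gives one row per (original row, genre) pair; the original
# row label is repeated, e.g. index 0 appears three times.
exploded = df["genre"].explode()

# get_dummies on a Series names the columns after the values themselves.
one_hot = pd.get_dummies(exploded)

# Collapse back to one row per original index: max() keeps a 1/True wherever
# that genre appeared in the row's list; astype(int) gives 0/1 on any pandas version.
dummies = one_hot.groupby(level=0).max().astype(int)
print(dummies)
```

To attach the indicators to the original frame, df.join(dummies) aligns them on the shared index.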
Dev Khadka