import numpy as np
import pandas as pd
df = pd.DataFrame(["c", "b", "a p", np.nan, "ap"])
df[0].str.get_dummies(' ')

The above code prints something like this.

   a  p  b  c  ap
0  0  0  0  1   0
1  0  0  1  0   0
2  1  1  0  0   0
3  0  0  0  0   0
4  0  0  0  0   1

The required output is the following:

   a  p  b  c
0  0  0  0  1
1  0  0  1  0
2  1  1  0  0
3  0  0  0  0
4  1  1  0  0

I am sure it's a bit tricky. Any help is appreciated.

Kathiravan Natarajan
The answer linked at the top of the question was helpful. Namely:
# Create a dataframe of dummy vars
col0_dummy_df = df['0'].str.get_dummies(sep=',')
# Concatenate dummy variable dataframe onto main dataframe.
pd.concat([df, col0_dummy_df], axis=1)
– JustinTRoss May 15 '20 at 18:16

2 Answers


If I understand correctly, you can use str.get_dummies:

df[0].str.get_dummies(sep=' ')
Out[745]: 
   air  bus  car  plane
0    0    0    1      0
1    0    1    0      0
2    1    0    0      1

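Or split each row manually and sum the dummies per original row (sum(level=0) was removed in pandas 2.0, so group by the index level instead):

pd.get_dummies(pd.DataFrame(df[0].str.split().tolist()).stack()).groupby(level=0).sum()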
Out[754]: 
   air  bus  car  plane
0    0    0    1      0
1    0    1    0      0
2    1    0    0      1
BENY

You can use str.get_dummies:

df[0].str.get_dummies(' ')


   air  bus  car  plane
0    0    0    1      0
1    0    1    0      0
2    1    0    0      1
Vaishali
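
Note that str.get_dummies only splits on the separator it is given, so with the question's data "ap" stays a single token and gets its own column. What follows is a minimal sketch of one possible way to reach the required output, assuming that every non-space character should become its own indicator column. The names chars and dummies are illustrative and do not come from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(["c", "b", "a p", np.nan, "ap"])

# Collect every distinct non-space character that occurs in the column.
chars = sorted(set("".join(df[0].dropna()).replace(" ", "")))

# One indicator column per character; NaN rows are treated as "not present".
dummies = pd.DataFrame(
    {ch: df[0].str.contains(ch, regex=False, na=False).astype(int) for ch in chars}
)
print(dummies)
```

The columns come out in sorted order (a, b, c, p). If the order shown in the question matters, selecting dummies[["a", "p", "b", "c"]] should rearrange them.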