Input Df:

ID Values
1  1;6;7
2  1;6;7
3  5;7
4  1;5;9;10;2;3

Expected df

ID  1  2  3  4  5  6  7  8  9  10
1   1  0  0  0  0  1  1  0  0  0
2   1  0  0  0  0  1  1  0  0  0
3   0  0  0  0  1  0  1  0  0  0
4   1  1  1  0  1  0  0  0  1  1

Problem Statement:

I have a column Values that contains semicolon-separated values. I want to turn these values into column names and fill each column with 1 or 0, depending on whether the value appears in that row.

Example: ID 1 has 1;6;7, so ID 1 has 1 in columns 1, 6 and 7, and 0 in the rest.

I couldn't find any solution that achieves this.

Rahul Agarwal
  • Possible duplicate of https://stackoverflow.com/questions/45312377/how-to-one-hot-encode-from-a-pandas-column-containing-a-list – giser_yugang Apr 05 '19 at 10:02

1 Answer

Use Series.str.get_dummies with argument sep=';'.
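
As a minimal sketch of this first step, the snippet below rebuilds the sample data from the question and calls Series.str.get_dummies on it. Using ID as the index is an assumption (the question does not say whether ID is a column or the index), and the name dummies is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; using ID as the index is an assumption.
df = pd.DataFrame(
    {"Values": ["1;6;7", "1;6;7", "5;7", "1;5;9;10;2;3"]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

# Split each string on ';' and build one 0/1 indicator column per distinct token.
dummies = df["Values"].str.get_dummies(sep=";")

# The labels are strings, and only values that occur in the data get a column
# (so there is no '4' or '8' column at this point).
print(dummies.columns.tolist())
print(dummies)
```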

The column names will be strings, so it's necessary to map them to int using DataFrame.rename, then use DataFrame.reindex with numpy.arange to add the missing columns, filled with 0:

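import numpy as np

(df.Values.str.get_dummies(sep=';')              # one 0/1 column per distinct value
 .rename(columns=lambda x: int(x))               # string labels -> int labels
 .reindex(np.arange(11), axis=1, fill_value=0))  # columns 0..10, missing ones filled with 0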

[out]

    0   1   2   3   4   5   6   7   8   9   10
1   0   1   0   0   0   0   1   1   0   0   0
2   0   1   0   0   0   0   1   1   0   0   0
3   0   0   0   0   0   1   0   1   0   0   0
4   0   1   1   1   0   1   0   0   0   1   1
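
The output above also contains a column 0, because np.arange(11) covers 0 through 10, whereas the expected frame in the question only has columns 1 through 10. A self-contained sketch of a variant that starts the range at 1 could look like this; the sample frame is rebuilt from the question, treating ID as the index is an assumption, and result is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; using ID as the index is an assumption.
df = pd.DataFrame(
    {"Values": ["1;6;7", "1;6;7", "5;7", "1;5;9;10;2;3"]},
    index=pd.Index([1, 2, 3, 4], name="ID"),
)

result = (
    df["Values"].str.get_dummies(sep=";")  # one 0/1 column per distinct token
    .rename(columns=int)                   # string labels -> integer labels
    # Keep columns 1..10 in numeric order; values never seen (4, 8) become all-0 columns.
    .reindex(columns=np.arange(1, 11), fill_value=0)
)

print(result)
```

If ID should be an ordinary column rather than the index, calling result.reset_index() afterwards would move it back.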
Chris Adams