How can I convert a dataframe with multiple possible values in a single column into more binary features?

Question

Consider the following pandas DataFrame:

In [1]: import pandas as pd
In [2]: d = {'ID': [1, 1, 1, 2, 3, 4, 4], 'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
In [3]: test_df = pd.DataFrame(data=d)
In [4]: test_df
Out[4]: 
   ID PROPERTY
0   1        A
1   1        B
2   1        C
3   2        A
4   3        D
5   4        A
6   4        B

How can I convert this into the following pandas DataFrame?

   ID  A  B  C  D
0   1  1  1  1  0
1   2  1  0  0  0
2   3  0  0  0  1
3   4  1  1  0  0

This would be for a variable number of possible features, not just 4 as shown in this case. Also, note how each ID now only needs to appear once in the ID column.

Since I will be working with a lot of data, I am trying to implement this efficiently. Avoiding a for-loop would be best here, if possible. Thank you for the help!

score 2 · Accepted Answer · answered Mar 21 '19 at 13:22

Use pd.crosstab

pd.crosstab(test_df.ID, test_df.PROPERTY)


    A   B   C   D
ID              
1   1   1   1   0
2   1   0   0   0
3   0   0   0   1
4   1   1   0   0
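Note that pd.crosstab returns counts and keeps ID as the index, whereas the question asks for 0/1 values with ID as an ordinary column. A minimal sketch of one way to close that gap, assuming the question's test_df; the names counts, binary, and result are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample data from the question
d = {'ID': [1, 1, 1, 2, 3, 4, 4],
     'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
test_df = pd.DataFrame(data=d)

# crosstab counts how often each (ID, PROPERTY) pair occurs
counts = pd.crosstab(test_df['ID'], test_df['PROPERTY'])

# Cap counts at 1 so a repeated pair still yields a 0/1 indicator
binary = counts.clip(upper=1)

# Move ID from the index back into a regular column
result = binary.reset_index()

# Drop the leftover 'PROPERTY' label on the columns axis
result.columns.name = None

print(result)
```

With the sample data every pair occurs at most once, so the clip step changes nothing here; it only matters if the same (ID, PROPERTY) pair can appear more than once in the real data.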

rafaelc

Simple and clean, thanks! – Evan Mar 21 '19 at 13:38
