
Consider the following pandas DataFrame:

In [1]: import pandas as pd
In [2]: d = {'ID': [1, 1, 1, 2, 3, 4, 4], 'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
In [3]: test_df = pd.DataFrame(data=d)
In [4]: test_df
Out[4]: 
   ID PROPERTY
0   1        A
1   1        B
2   1        C
3   2        A
4   3        D
5   4        A
6   4        B

How can I convert this into the following pandas DataFrame?

   ID  A  B  C  D
0   1  1  1  1  0
1   2  1  0  0  0
2   3  0  0  0  1
3   4  1  1  0  0

The number of distinct PROPERTY values is variable, not just the four shown in this case. Also, note that each ID should appear only once in the ID column.

Since I will be working with a lot of data, I am trying to implement this efficiently. Avoiding a for-loop would be best here, if possible. Thank you for the help!

Evan

1 Answer


Use pd.crosstab, which counts how often each PROPERTY value occurs for each ID:

pd.crosstab(test_df.ID, test_df.PROPERTY)


    A   B   C   D
ID              
1   1   1   1   0
2   1   0   0   0
3   0   0   0   1
4   1   1   0   0
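
To get the exact layout asked for in the question (ID as a regular column, one row per ID), something like the following may work. This is only a sketch built on the question's sample data. The names counts, flags and result are made up for illustration, and the clip step is an assumption that matters only if the same (ID, PROPERTY) pair can occur more than once.

```python
import pandas as pd

d = {'ID': [1, 1, 1, 2, 3, 4, 4],
     'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
test_df = pd.DataFrame(data=d)

# crosstab counts the occurrences of each (ID, PROPERTY) pair;
# IDs become the index and PROPERTY values become the columns.
counts = pd.crosstab(test_df['ID'], test_df['PROPERTY'])

# Assumption: if a pair can repeat, cap the counts at 1 to get 0/1 flags.
flags = counts.clip(upper=1)

# Move ID from the index back into a column and drop the
# leftover 'PROPERTY' label on the columns axis.
result = flags.reset_index()
result.columns.name = None

print(result)
```

No explicit Python loop is needed, and the columns are derived from whatever values appear in PROPERTY, so the number of features does not have to be known in advance.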
rafaelc