Consider the following pandas DataFrame:
In [1]: import pandas as pd
In [2]: d = {'ID': [1, 1, 1, 2, 3, 4, 4], 'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
In [3]: test_df = pd.DataFrame(data=d)
In [4]: test_df
Out[4]:
   ID PROPERTY
0   1        A
1   1        B
2   1        C
3   2        A
4   3        D
5   4        A
6   4        B
How can I convert this into the following pandas DataFrame?
   ID  A  B  C  D
0   1  1  1  1  0
1   2  1  0  0  0
2   3  0  0  0  1
3   4  1  1  0  0
The number of distinct PROPERTY values is variable, not just the 4 shown in this case. Also, note that each ID should appear only once in the ID column of the result.
Since I will be working with a lot of data, I would like to do this efficiently, ideally without a for-loop. Thank you for the help!
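
For reference, here is a minimal sketch of one loop-free way this reshaping could be done, using pd.crosstab. It is only one possible approach, not necessarily the fastest on large data. The names counts, indicators and result are hypothetical, introduced just for illustration, and the clip step assumes that a repeated (ID, PROPERTY) pair should still map to 1 rather than a count.

```python
import pandas as pd

d = {'ID': [1, 1, 1, 2, 3, 4, 4], 'PROPERTY': ['A', 'B', 'C', 'A', 'D', 'A', 'B']}
test_df = pd.DataFrame(data=d)

# Count how often each (ID, PROPERTY) pair occurs; missing pairs become 0.
counts = pd.crosstab(test_df['ID'], test_df['PROPERTY'])

# Assumption: duplicates should not inflate the flag, so cap every count at 1.
indicators = counts.clip(upper=1)

# Turn ID back into a regular column and drop the leftover 'PROPERTY' axis label.
result = indicators.reset_index()
result.columns.name = None

print(result)
```

Because crosstab builds one column per distinct PROPERTY value it finds, the same code should work unchanged for any number of properties. If the data is guaranteed to have no repeated pairs, the clip step can be dropped.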