I have a dataframe
d = pd.DataFrame({'d': ['d1', 'd1', 'd1', 'd1', 'd2', 'd2', 'd2','d2'], 's':[1, 2, 1, 1, 2, 2, 1, 2], 'v': [100, 5000, 200, 900, 100, 2000, 300, 300]})
d s v
0 d1 1 100
1 d1 2 5000
2 d1 1 200
3 d1 1 900
4 d2 2 100
5 d2 2 2000
6 d2 1 300
7 d2 2 300
And I would like to keep only the rows that have the maximum value of v for each unique combination of columns d and s. So the result would look like
d = pd.DataFrame({'d': ['d1', 'd1', 'd2', 'd2'], 's':[1, 2, 1, 2], 'v': [900, 5000, 300, 2000]})
d s v
0 d1 1 900
1 d1 2 5000
2 d2 1 300
3 d2 2 2000
How can I do this efficiently? In case of a tie, I would prefer to keep the first entry or use another column t (not shown in the example) as a tie breaker if that is easy to implement.
EDIT: For the tie breaker, I would have another column t. The dataframe below (slightly changed from the inital one) now has a tie at d2 for s=1, v=300
d = pd.DataFrame({'d': ['d1', 'd1', 'd1', 'd1', 'd2', 'd2', 'd2','d2'], 't':[1, 3, 5, 6, 2, 3, 8, 4], 's':[1, 2, 1, 1, 2, 2, 1, 1], 'v': [100, 5000, 200, 900, 100, 2000, 300, 300]})
d s t v
0 d1 1 1 100
1 d1 2 3 5000
2 d1 1 5 200
3 d1 1 6 900
4 d2 2 2 100
5 d2 2 3 2000
6 d2 1 8 300
7 d2 1 4 300
and the result should look like
d = pd.DataFrame({'d': ['d1', 'd1', 'd2', 'd2'], 't': [6, 3, 4, 3], 's':[1, 2, 1, 2], 'v': [900, 5000, 300, 2000]})
d s t v
0 d1 1 6 900
1 d1 2 3 5000
2 d2 1 4 300
3 d2 2 3 2000
So, we select line 7 over line 6 since t is smaller