I would like to group my dataframe by the first three columns sequentially, then retain the rows with the maximum value of 'i1' for each group. If there are duplicate rows, I would like to retain just one row.
For example:
in:
df = pd.DataFrame({'id': ['aa', 'aa','aa','aa','b','b','aa','b','d','d'], \
'run#': [1, 1, 1, 2, 1,1,3,1,3,3],\
'z': [2, 2, 3, 2, 2, 2, 2, 2, 2, 2], \
'i1': [100, 200, 100, 100, 200,201,201, 500,500,500],\
'i2': [12,12,12,12,12,12,12,12,12,12]})
out:
id run# z i1 i2
0 aa 1 2 100 12
1 aa 1 2 200 12
2 aa 1 3 100 12
3 aa 2 2 100 12
4 b 1 2 200 12
5 b 1 2 201 12
6 aa 3 2 201 12
7 b 1 2 500 12
8 d 3 2 500 12
9 d 3 2 500 12
My desired output is:
id run# z i1 i2
0 aa 1 2 200 12
1 aa 1 3 100 12
2 aa 2 2 100 12
3 aa 3 2 201 12
4 b 1 2 500 12
5 d 3 2 500 12