1

I would like to group my dataframe by the first three columns sequentially, then retain the rows with the maximum value of 'i1' for each group. If there are duplicate rows, I would like to retain just one row.

For example:


in:

df = pd.DataFrame({'id': ['aa', 'aa','aa','aa','b','b','aa','b','d','d'], \
     'run#': [1, 1, 1, 2, 1,1,3,1,3,3],\
     'z': [2, 2, 3, 2, 2, 2, 2, 2, 2, 2], \
     'i1': [100, 200, 100, 100, 200,201,201, 500,500,500],\
 'i2': [12,12,12,12,12,12,12,12,12,12]})

out:
    id run# z   i1      i2
0   aa  1   2   100     12
1   aa  1   2   200     12
2   aa  1   3   100     12
3   aa  2   2   100     12
4   b   1   2   200     12
5   b   1   2   201     12
6   aa  3   2   201     12
7   b   1   2   500     12
8   d   3   2   500     12
9   d   3   2   500     12

My desired output is:

    id run# z   i1      i2
0   aa  1   2   200     12
1   aa  1   3   100     12
2   aa  2   2   100     12
3   aa  3   2   201     12
4   b   1   2   500     12
5   d   3   2   500     12

nod11
  • 11
  • 2
  • `df.sort_values('i1').drop_duplicates(df.columns[0:3], keep='last')`. Then sort the index if you care about the ordering. – ALollz Jul 23 '19 at 15:37

0 Answers0