5

I have the following table:

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     1      |   10   | 100   |
     1      |   50   |  50   |
     2      |   10   | 100   |
     2      |   90   |  10   |
     2      |   90   |  10   |

I would like a pandas dataframe operation to only return the first and 5th row. This corresponds to the rows where crit_A is max for a given item.

Item number | crit_A | crit_B|
------------|--------|-------|
     1      |  100   |  20   |
     2      |   90   |  10   |

Note: When crit_A has multiple equal values for a given item, I just need one item returned.

The following is not what I am looking for:

res_82_df.groupby(['Item number']).max()

This does not work because it would group by Item number but return the maximum value for all columns. Also note: I could look for an arbitrary threshold and perform a query statement. But this approach is also not robust because I would always need to look at the data and make a value judgement.

How do I accomplish this efficiently?

Note: My question indeed is a duplicate of the question linked above. The answer here though is very unique and much more concise, and does what I ask it to.

Thornhale
  • 2,118
  • 1
  • 23
  • 40

1 Answers1

10

I'd do it this way:

In [107]: df.loc[df.groupby('Item number')['crit_A'].idxmax()]
Out[107]:
   Item number  crit_A  crit_B
0            1     100      20
4            2      90      10
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419