Get the row corresponding to the max in pandas GroupBy

Question

Simple DataFrame:

import pandas as pd

df = pd.DataFrame({'A': [1,1,2,2], 'B': [0,1,2,3], 'C': ['a','b','c','d']})
df
   A  B  C
0  1  0  a
1  1  1  b
2  2  2  c
3  2  3  d

For every group in column A, I want the value of column C from the row where column B is maximal. For example, in group 1 of column A the maximum of column B is 1, so I want the value "b" from column C:

   A  C
0  1  b
1  2  d

Column B should not be assumed to be sorted. Performance is the top priority, then elegance.

score 9 · Accepted Answer · answered Jan 23 '19 at 20:06

Check with sort_values + drop_duplicates:

df.sort_values('B').drop_duplicates(['A'],keep='last')
Out[127]: 
   A  B  C
1  1  1  b
3  2  3  d
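
The output above still carries column B and the original index labels, while the question asks for only A and C. A minimal sketch of trimming it down, assuming the same df as in the question (the names top_rows and result are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [0, 1, 2, 3], 'C': ['a', 'b', 'c', 'd']})

# Sort ascending by B so the row with the largest B comes last within each A,
# then keep only that last row per value of A.
top_rows = df.sort_values('B').drop_duplicates(['A'], keep='last')

# Narrow to the two requested columns and renumber the index from 0.
result = top_rows[['A', 'C']].reset_index(drop=True)
print(result)
```

Note that the rows come out in order of B rather than A, so a final sort_values('A') may be wanted if the group order matters.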

BENY


That is impressive, I have to say. – Giora Simchoni Jan 23 '19 at 20:08
Accepting this answer as according to `timeit` it's faster than @coldspeed's by 0.0002 seconds [`np.mean(timeit.repeat("df.sort_values('B').drop_duplicates(['A'],keep='last')", number = 1, repeat = 100, globals = globals()))`] – Giora Simchoni Jan 23 '19 at 20:27
@GioraSimchoni Thank you for the fair consideration and timings! – cs95 Jan 23 '19 at 20:30
this is brilliant! – Aman Singh Jun 24 '21 at 14:00
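
The timing quoted in the comments was taken on the four-row frame, where a 0.0002 second gap says little. A rough sketch of comparing two of the approaches on a larger frame might look like this; the frame big, its size, and the helper names by_sort and by_idxmax are all made up for illustration, and the numbers will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame: 100,000 rows, about 1,000 groups, distinct B values.
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    'A': rng.integers(0, 1_000, size=n),
    'B': rng.permutation(n),
    'C': rng.integers(0, 26, size=n),
})

def by_sort(frame):
    # Accepted answer: sort by B, keep the last row per A.
    return frame.sort_values('B').drop_duplicates(['A'], keep='last')

def by_idxmax(frame):
    # idxmax variant: look up the label of the max B per A.
    return frame.loc[frame.groupby('A')['B'].idxmax()]

# Take the best of several repeats to reduce noise from other processes.
for func in (by_sort, by_idxmax):
    best = min(timeit.repeat(lambda: func(big), number=5, repeat=3))
    print(func.__name__, best)
```

No result is claimed here; which approach wins is something to measure on data shaped like your own.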

score 6 · Answer 2 · answered Jan 23 '19 at 20:01

df.groupby('A').apply(lambda x: x.loc[x['B'].idxmax(), 'C'])
# A
# 1    b
# 2    d
# dtype: object

Use idxmax to find the index label where B is maximal within each group, then select column C at that label (using a lambda function).
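
To see what the lambda does for a single group, the two steps can be run by hand. This is only a sketch on the question's df; the names group, label, and value are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [0, 1, 2, 3], 'C': ['a', 'b', 'c', 'd']})

# Take one group by hand to see what the lambda receives.
group = df[df['A'] == 1]

# idxmax returns the index label (not the position) of the largest B.
label = group['B'].idxmax()

# .loc then looks up column C at that label.
value = group.loc[label, 'C']
print(label, value)
```

groupby('A').apply(...) repeats this lookup once per group and collects the values into a Series indexed by A.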

Jondiedoop


score 5 · Answer 3 · answered Jan 23 '19 at 20:16

Here's a little fun with groupby and nlargest:

(df.set_index('C')
   .groupby('A')['B']
   .nlargest(1)
   .index
   .to_frame()
   .reset_index(drop=True))

   A  C
0  1  b
1  2  d

Or, sort_values, groupby, and last:

df.sort_values('B').groupby('A')['C'].last().reset_index()

   A  C
0  1  b
1  2  d
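
Since the question says B should not be assumed sorted, here is a small check of the sort_values / groupby / last variant on rows that are deliberately out of order. The frame shuffled is hypothetical (the question's data, reordered):

```python
import pandas as pd

# Hypothetical frame: same data as the question, rows deliberately out of order.
shuffled = pd.DataFrame({'A': [2, 1, 2, 1], 'B': [3, 0, 2, 1], 'C': ['d', 'a', 'c', 'b']})

# After sorting by B, the last row seen in each group holds that group's max B.
result = shuffled.sort_values('B').groupby('A')['C'].last().reset_index()
print(result)
```

One caveat worth keeping in mind: GroupBy.last skips missing values by default, so if C can contain NaN the value returned may come from a row other than the one with the largest B.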

score 2 · Answer 4 · answered Jan 23 '19 at 20:24

Similar solution to @Jondiedoop, but avoids the apply:

u = df.groupby('A')['B'].idxmax()

df.loc[u, ['A', 'C']].reset_index(drop=True)

   A  C
0  1  b
1  2  d
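
A short sketch of how this behaves when a group has ties at the maximum, assuming a made-up frame tied and unique index labels (with duplicate labels, df.loc[u] could return more than one row per group):

```python
import pandas as pd

# Hypothetical data: group 1 has two rows tied at the maximum B.
tied = pd.DataFrame({'A': [1, 1, 2], 'B': [5, 5, 3], 'C': ['a', 'b', 'c']})

# idxmax reports the first label at which the maximum occurs in each group.
u = tied.groupby('A')['B'].idxmax()

result = tied.loc[u, ['A', 'C']].reset_index(drop=True)
print(result)
```

So exactly one row per group is returned, and on a tie it is the first one in the frame's current order.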

user3483203

