Selecting max within partition for pandas dataframe

Question

I have a pandas dataframe. My goal is to select only those rows where column C has the largest value within each group of column B. For example, when B is "one" the maximum value of C is 311, so I would like the row where C = 311 and B = "one".

import pandas as pd
import numpy as np

df2 = pd.DataFrame({
    'A' : pd.Categorical(["test1","test2","test3","test4"]),
    'B' : pd.Categorical(["one","one","two","two"]),
    'C' : np.array([311,42,31,41]),
    'D' : np.array([9,8,7,6])
    })

df2.groupby('C').max()

Output should be:

test1 one 311 9
test4 two 41  6

The question linked above was asked this morning (US Pacific time) — Paul H, Dec 16 '14 at 02:48
Applying `Paul H` solution to your problem yields: `df2.groupby('B').apply(lambda k : k[k['C'] == k['C'].max()])` — Akavall, Dec 16 '14 at 03:04

score 3 · Accepted Answer · answered Dec 16 '14 at 03:13

You can use idxmax(), which returns, for each group, the index label of the row holding the maximum value. Passing those labels to .loc then selects the full rows:

maxes = df2.groupby('B')['C'].idxmax()
df2.loc[maxes]

Output:

Out[11]: 
       A    B    C  D
0  test1  one  311  9
3  test4  two   41  6
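One caveat worth illustrating: idxmax() returns only the first matching label per group, so tied maxima are dropped. The sketch below compares it with a transform("max") mask, which keeps every tied row. The frame `df`, the extra "test5" row, and the use of plain string columns instead of pd.Categorical are assumptions made for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data: the question's frame plus one extra row ("test5")
# that ties with "test4" for the maximum of C in group "two".
df = pd.DataFrame({
    "A": ["test1", "test2", "test3", "test4", "test5"],
    "B": ["one", "one", "two", "two", "two"],
    "C": [311, 42, 31, 41, 41],
    "D": [9, 8, 7, 6, 5],
})

# idxmax() yields one index label per group: the first row holding the max.
first_max = df.loc[df.groupby("B")["C"].idxmax()]

# transform("max") broadcasts each group's max back onto every row,
# so the boolean mask keeps all rows that tie for the maximum.
group_max = df.groupby("B")["C"].transform("max")
all_max = df[df["C"] == group_max]

print(first_max)
print(all_max)
```

If only one row per group is wanted, the idxmax() form is the simpler choice. The mask form is useful when ties should all be kept.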


Marius

