I have a pandas DataFrame. My goal is to select only those rows where column C has the largest value within each group of column B. For example, when B is "one" the maximum value of C is 311, so I would like the row where B = "one" and C = 311.

import pandas as pd
import numpy as np

df2 = pd.DataFrame({
    'A' : pd.Categorical(["test1","test2","test3","test4"]),
    'B' : pd.Categorical(["one","one","two","two"]),
    'C' : np.array([311,42,31,41]),
    'D' : np.array([9,8,7,6])
    })

df2.groupby('C').max()

Output should be:

test1 one 311 9
test4 two 41  6
Paul H
Lukas Halim

1 Answer

You can use idxmax(), which returns the index label of the row with the maximum value in each group; passing those labels to .loc selects the matching rows:

maxes = df2.groupby('B')['C'].idxmax()
df2.loc[maxes]

Output:

Out[11]: 
       A    B    C  D
0  test1  one  311  9
3  test4  two   41  6
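
For reference, here is a self-contained sketch of the same approach, rebuilding the frame from the question so it can be run as-is. The names `result`, `is_group_max` and `result_with_ties` are only illustrative, and `observed=True` is an assumption added here because B is categorical (it limits the grouping to categories that actually occur). Note that idxmax() keeps only the first row when a group has tied maxima; the second variant, which compares C against a per-group maximum from transform('max'), is one way to keep all tied rows if that matters.

```python
import numpy as np
import pandas as pd

# Same data as in the question (without the duplicated 'A' key).
df2 = pd.DataFrame({
    'A': pd.Categorical(["test1", "test2", "test3", "test4"]),
    'B': pd.Categorical(["one", "one", "two", "two"]),
    'C': np.array([311, 42, 31, 41]),
    'D': np.array([9, 8, 7, 6]),
})

# Index label of the row holding the largest C within each group of B.
# observed=True is an assumption: it groups only on categories present in B.
maxes = df2.groupby('B', observed=True)['C'].idxmax()
result = df2.loc[maxes]

# Variant that keeps every row tied for the group maximum:
# broadcast each group's max back onto the rows, then filter.
is_group_max = df2['C'] == df2.groupby('B', observed=True)['C'].transform('max')
result_with_ties = df2[is_group_max]

print(result)
```

With the sample data there are no ties, so both variants select the rows with index 0 and 3.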
Marius