8

I am trying to get, for each group of a groupby, the row that has the maximum value in another column. I tried to follow the solutions given in "Python : Getting the Row which has the max value in groups using groupby", but they don't give me what I want. When I apply

annotations.groupby(['bookid','conceptid'], sort=False)['weight'].max()

I get

bookid    conceptid
12345678  3942     0.137271
          10673    0.172345
          1002     0.125136
34567819  44407    1.370921
          5111     0.104729
          6160     0.114766
          200      0.151629
          3504     0.152793

But I'd like to get only the row with the highest weight for each bookid, e.g.,

bookid    conceptid
12345678  10673    0.172345
34567819  44407    1.370921

I'd appreciate any help

ssierral
    Just a thought, would this give you what you wanted: `annotations.groupby(['bookid'], sort=False)['weight'].max()` – EdChum Nov 07 '14 at 08:59

3 Answers

11

If you need the bookid and conceptid for the maximum weight, try this

annotations.loc[annotations.groupby(['bookid'], sort=False)['weight'].idxmax()][['bookid', 'conceptid', 'weight']]

Note: The original version of this answer used .ix, which has been deprecated since pandas v0.20 and was removed in v1.0. .loc is its replacement.
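As a minimal, self-contained sketch of the idxmax approach written with .loc: the DataFrame below is rebuilt from the values shown in the question, and the names `best_rows` and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the output shown in the question.
annotations = pd.DataFrame({
    "bookid": [12345678] * 3 + [34567819] * 5,
    "conceptid": [3942, 10673, 1002, 44407, 5111, 6160, 200, 3504],
    "weight": [0.137271, 0.172345, 0.125136, 1.370921,
               0.104729, 0.114766, 0.151629, 0.152793],
})

# For each bookid, idxmax returns the index label of the row
# holding the largest weight in that group.
best_rows = annotations.groupby("bookid", sort=False)["weight"].idxmax()

# .loc selects those rows by label, so conceptid comes along too.
result = annotations.loc[best_rows, ["bookid", "conceptid", "weight"]]
print(result)
```

With this data, `result` holds one row per bookid: conceptid 10673 for 12345678 and conceptid 44407 for 34567819. If several rows in a group tie for the maximum, idxmax keeps only the first one.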

jpp
user1827356
2

Based on your example of the output you want, I think you are grouping by too many columns. I think you want only:

annotations.groupby(['bookid'], sort=False)['weight'].max()
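To see what this returns, here is a small sketch using a DataFrame rebuilt from the values in the question (the variable name `max_weight` is just for illustration). Note that the result is a Series of weights indexed by bookid, so the matching conceptid is not part of it.

```python
import pandas as pd

# Sample data rebuilt from the output shown in the question.
annotations = pd.DataFrame({
    "bookid": [12345678] * 3 + [34567819] * 5,
    "conceptid": [3942, 10673, 1002, 44407, 5111, 6160, 200, 3504],
    "weight": [0.137271, 0.172345, 0.125136, 1.370921,
               0.104729, 0.114766, 0.151629, 0.152793],
})

# Grouping by bookid alone gives one maximum weight per book.
max_weight = annotations.groupby(["bookid"], sort=False)["weight"].max()
print(max_weight)

# The result is indexed by bookid only; conceptid is not carried along.
print(max_weight.index.name)
```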
JD Long
2

After grouping, we can pass aggregation functions to the grouped object as a dictionary via the agg function.

annotations.groupby('bookid').agg({'weight': ['max']})
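A short sketch of how the result is shaped, using a DataFrame rebuilt from the question's values (the names `nested` and `flat` are illustrative). Passing a list of functions in the dictionary produces two-level column labels. Named aggregation, available since pandas 0.25, takes (column, function) tuples on a DataFrame groupby and gives a flat column instead.

```python
import pandas as pd

# Sample data rebuilt from the output shown in the question.
annotations = pd.DataFrame({
    "bookid": [12345678] * 3 + [34567819] * 5,
    "conceptid": [3942, 10673, 1002, 44407, 5111, 6160, 200, 3504],
    "weight": [0.137271, 0.172345, 0.125136, 1.370921,
               0.104729, 0.114766, 0.151629, 0.152793],
})

# Dictionary form with a list of functions: columns become a
# MultiIndex of (column, function) pairs.
nested = annotations.groupby("bookid").agg({"weight": ["max"]})
print(nested.columns.tolist())

# Named aggregation: output column name = (input column, function).
flat = annotations.groupby("bookid").agg(weight=("weight", "max"))
print(flat)
```

Either way, only the maximum weight per bookid is returned; the conceptid of that row is not included.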
aravinda_gn
  • I think in newer version it should be `annotations.groupby('bookid').agg(weight = 'max')` – Imran Jun 24 '22 at 15:15