I have a dataframe like

item      date       hour     value
  a         4         12       123
  a         6         11        54
  b         1          7       146
  c         8          1        97
  a         9          5        10
  c         4          5       114
  b         1          7       200
...       ...        ...       ...

and I want to keep the top 10 rows per item by value (discarding the rest is fine), regardless of any other column. The rows are not sorted.

Following my input example (which doesn't have enough rows to get 10 for every item), the expected output for the top 1 would be something like this:

item      date       hour     value
  a         4         12       123
  c         4          5       114
  b         1          7       200
...       ...        ...       ...

I've seen this answer but I'm not sure how to tell pandas to take value for the calculation.

yatu
Javier

1 Answer

You can sort_values by both ['item', 'value'] in descending order and then take groupby.head:

df.sort_values(['item', 'value'], ascending=False).groupby('item').head(10)
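As a quick check, here is a minimal sketch that runs the same call on the seven sample rows from the question, using the top 1 instead of the top 10 (the variable names n and top are just for illustration):

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "item":  ["a", "a", "b", "c", "a", "c", "b"],
    "date":  [4, 6, 1, 8, 9, 4, 1],
    "hour":  [12, 11, 7, 1, 5, 5, 7],
    "value": [123, 54, 146, 97, 10, 114, 200],
})

n = 1  # the question asks for 10; 1 matches its small expected output

# Sort descending so the largest values come first within each item,
# then keep the first n rows of every group. All columns are preserved.
top = df.sort_values(["item", "value"], ascending=False).groupby("item").head(n)
print(top)
```

The rows come back in the sorted order (c, b, a here), with their original index labels, so add .sort_index() at the end if the original row order matters.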

Or with nlargest:

df.groupby('item').value.nlargest(10).reset_index()
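Note that the nlargest version works on the value column only, so after reset_index you get item, the original row label, and value, but not date or hour. If you need the full rows, one possible sketch (assuming the same sample data as in the question, and n = 1 for brevity) is to use the original row labels to select from df:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "item":  ["a", "a", "b", "c", "a", "c", "b"],
    "date":  [4, 6, 1, 8, 9, 4, 1],
    "hour":  [12, 11, 7, 1, 5, 5, 7],
    "value": [123, 54, 146, 97, 10, 114, 200],
})

n = 1  # the question asks for 10; 1 matches its small expected output

# nlargest returns a Series indexed by (item, original row label);
# the last index level holds the labels of the rows to keep.
idx = df.groupby("item")["value"].nlargest(n).index.get_level_values(-1)

# Select the full rows, so date and hour are kept as well
top = df.loc[idx]
print(top)
```

This relies on the index of df being unique; with duplicate labels, df.loc would return extra rows.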
yatu