
I have a DataFrame that I want to group by ID, e.g.:

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1, 2, 1, 1, 3, 1, 5]})
print(df)

Which generates:

  item_id  user_id
0       a        1
1       a        2
2       b        1
3       b        1
4       b        3
5       c        1
6       d        5

[7 rows x 2 columns]

I can easily group by the id:

grouped = df.groupby("item_id")
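Iterating over a GroupBy object yields (key, sub-DataFrame) pairs, one per group. With the default sort=True the groups come out in sorted key order. A minimal sketch using the example data above:

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'user_id': [1, 2, 1, 1, 3, 1, 5]})
grouped = df.groupby("item_id")

# Each iteration step yields the group key and the rows belonging to that group
sizes = [(key, len(group)) for key, group in grouped]
print(sizes)  # [('a', 2), ('b', 3), ('c', 1), ('d', 1)]
```

This is why indexing each element with [1] picks out the group's DataFrame.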

But how can I return only the first N groups of the groupby object? E.g., I want only the groups for the first 3 unique item_ids.

feetwet
Christian Sauer

2 Answers


Here is one way using list(grouped).

result = [g[1] for g in list(grouped)[:3]]

# 1st
result[0]

  item_id  user_id
0       a        1
1       a        2

# 2nd
result[1]

  item_id  user_id
2       b        1
3       b        1
4       b        3
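A variant of the same idea, sketched here as an alternative rather than as part of the original answer: itertools.islice stops after the first n (key, group) pairs instead of first building a list of every group. The variable n is illustrative. Pandas still computes the grouping for the whole frame, but the remaining sub-DataFrames should not be materialized.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'user_id': [1, 2, 1, 1, 3, 1, 5]})
grouped = df.groupby("item_id")

n = 3  # number of groups to keep (illustrative)
# islice consumes only the first n (key, group) pairs from the GroupBy iterator
result = [group for _, group in islice(grouped, n)]

print(len(result))  # 3
print(result[2])    # the single row for item_id 'c'
```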
Jianxun Li

One method is to use Counter to find the 3 most frequent item_ids, filter the DataFrame to those items, and then perform the groupby on the filtered DataFrame. Note that this selects the most common groups, not the first 3 in key order.

from collections import Counter

c = Counter(df.item_id)
most_common = [item for item, _ in c.most_common(3)]

>>> df[df.item_id.isin(most_common)].groupby('item_id').sum()
         user_id
item_id         
a              3
b              5
c              1
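If "first 3" should instead mean the first three item_ids in group (sorted key) order, GroupBy.ngroup() can serve as the filter: it labels every row with its group's number 0, 1, 2, and so on. A minimal sketch, assuming a pandas version that provides GroupBy.ngroup(); n is an illustrative variable:

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
                   'user_id': [1, 2, 1, 1, 3, 1, 5]})
grouped = df.groupby("item_id")

n = 3  # number of groups to keep (illustrative)
# ngroup() returns a Series aligned with df's index holding each row's group number
first_n = df[grouped.ngroup() < n]

print(first_n.groupby('item_id').sum())
```

On this data the result coincides with the Counter output (a, b, c), but the two approaches select different groups in general.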
Alexander