Pandas Groupby take only the first N Groups

Question

I have a DataFrame that I want to group by an ID column, e.g.:

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1,2,1,1,3,1,5]})
print(df)

Which generates:

  item_id  user_id
0       a        1
1       a        2
2       b        1
3       b        1
4       b        3
5       c        1
6       d        5

[7 rows x 2 columns]

I can easily group by the id:

grouped = df.groupby("item_id")

But how can I return only the first N groups? For example, I want only the groups for the first 3 unique item_ids.

Wouldn't it be easier to just filter the df first, e.g. `df[df['item_id'].isin(df['item_id'].unique()[:3])].groupby('item_id')`? – EdChum, Jul 27 '15 at 14:37
Iterate over the first 3 groups: `for n,(k,gg) in enumerate(list(g)[:3])`, where g is an instance of `groupby`. – BSalita, May 10 '21 at 18:18

score 23 · Accepted Answer · answered Jul 27 '15 at 14:34

Here is one way using list(grouped).

# list(grouped) yields (key, sub-DataFrame) tuples; keep the sub-DataFrames of the first 3 groups
result = [g[1] for g in list(grouped)[:3]]

# 1st
result[0]

  item_id  user_id
0       a        1
1       a        2

# 2nd
result[1]

  item_id  user_id
2       b        1
3       b        1
4       b        3
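
If only the first few groups are needed, a lazier variant is possible. The following is a minimal sketch, not part of the original answer, assuming that iterating a groupby object yields (key, sub-DataFrame) pairs one at a time; itertools.islice then stops after three pairs, so the remaining sub-DataFrames are never built. The names first_groups and first_keys are illustrative only.

```python
from itertools import islice

import pandas as pd

df = pd.DataFrame({
    'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'user_id': [1, 2, 1, 1, 3, 1, 5],
})
grouped = df.groupby('item_id')

# islice stops after three (key, sub-DataFrame) pairs instead of listing every group
first_groups = [group for _, group in islice(grouped, 3)]

# The matching group keys, taken the same way
first_keys = [key for key, _ in islice(grouped, 3)]
```

Here first_groups[0] corresponds to result[0] above. Note that groupby sorts the keys by default, so "first" means first in sorted key order unless sort=False is passed.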

Jianxun Li

24,004
10
58
76

Thank you, that is a good idea. Due to some constraints I used a random query in the end. – Christian Sauer Jul 27 '15 at 17:35
This instantiates all of the groups in a list when you only need the first 3, so it's extremely inefficient for large `DataFrame`s. – Denziloe Dec 22 '21 at 22:17

score 4 · Answer 2 · answered Jul 27 '15 at 14:37

One method is to use Counter to find the 3 most frequent items (which are not necessarily the first 3 in order of appearance), filter the DataFrame to those items, and then perform a groupby operation on the filtered DataFrame.

from collections import Counter

c = Counter(df.item_id)
most_common = [item for item, _ in c.most_common(3)]

>>> df[df.item_id.isin(most_common)].groupby('item_id').sum()
         user_id
item_id         
a              3
b              5
c              1
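
If the first three item_ids in order of appearance are wanted rather than the three most frequent, the filter can be built from unique() instead, as EdChum suggests in the comments on the question. This is a minimal sketch under that assumption; n, first_ids, filtered and sums are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'user_id': [1, 2, 1, 1, 3, 1, 5],
})

n = 3  # number of groups to keep (assumed value from the question)

# unique() returns values in order of first appearance, so slicing keeps the first n ids
first_ids = df['item_id'].unique()[:n]

# Filter the rows before grouping, so only n groups are ever formed
filtered = df[df['item_id'].isin(first_ids)]
sums = filtered.groupby('item_id').sum()
```

For this sample data both approaches select a, b and c, but they can differ whenever a later item occurs more often than an earlier one.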

Alexander
