Combine top n-th rows of a group into a single row of list with Pandas

Question

Fairly new to Python and Pandas here.

I am trying to combine the top nth rows based on the values in a separate column into a single row with Pandas.

Using a hypothetical example, let's say I have the following table that is already sorted desc by the 'amount' column:

store_id	item	amount
00001	shirt	5
00001	sock	3
00001	pants	1
00002	sock	4
00002	pants	2
00002	shirt	1

I would like to generate a table that groups by the store_id, with each row being a list of the top n items based on the value of the 'amount' column. So if I wanted to see the top 2 items by store_id, the table would look like this:

store_id	item
00001	['shirt', 'sock']
00002	['sock', 'pants']

I tried following along with the suggestion here: How to combine multiple rows into a single row with pandas , however I keep running into a "'GroupedData' object is not subscriptable" error.

Would greatly appreciate any suggestions on how to solve this. Thank you in advance.

Is it that you just want to get n top elements from each group or n largest elements from each group? If it is former then you use `df.groupby(...).first(2)` — Ch3steR, Jul 28 '21 at 18:47

score 2 · Accepted Answer · answered Jul 28 '21 at 19:18

2

As your table is already sorted desc by the amount column, you can get the top n-th rows for each group by GroupBy.head(n). To further group item column of these top n-th rows into list, you can further use GroupBy.agg(), as follows:

n = 2      # define n

(df.groupby('store_id').head(n)
   .groupby('store_id')['item'].agg(list)
).reset_index()

Result:

   store_id           item
0         1  [shirt, sock]
1         2  [sock, pants]

answered Jul 28 '21 at 19:18

SeaBean

22,547
3
13
25

1

This worked. Thank you very much for the help. – MS25 Jul 28 '21 at 21:20
@moses Pleased to help! :-) – SeaBean Jul 28 '21 at 21:34

score 1 · Answer 2 · answered Jul 28 '21 at 18:43

1

Try this:

(df.sort_values('amount',ascending=False)
 .groupby('store_id')
 .agg({'item':lambda x: x.iloc[0:2].tolist()}))

answered Jul 28 '21 at 18:43

rhug123

7,893
1
9
24

score 1 · Answer 3 · answered Jul 28 '21 at 18:44

1

Try:

output = df.groupby("store_id") \
           .apply(lambda x: x.nlargest(2, "amount")["item"].tolist()) \
           .rename(index="item")

>>> output
store_id
00001    [shirt, sock]
00002    [sock, pants]
Name: item, dtype: object

answered Jul 28 '21 at 18:44

not_speshal

22,093
2
15
30

Combine top n-th rows of a group into a single row of list with Pandas

3 Answers3