1

Fairly new to Python and Pandas here.

I am trying to combine the top nth rows based on the values in a separate column into a single row with Pandas.

Using a hypothetical example, let's say I have the following table that is already sorted desc by the 'amount' column:

store_id item amount
00001 shirt 5
00001 sock 3
00001 pants 1
00002 sock 4
00002 pants 2
00002 shirt 1

I would like to generate a table that groups by the store_id, with each row being a list of the top n items based on the value of the 'amount' column. So if I wanted to see the top 2 items by store_id, the table would look like this:

store_id item
00001 ['shirt', 'sock']
00002 ['sock', 'pants']

I tried following along with the suggestion here: How to combine multiple rows into a single row with pandas , however I keep running into a "'GroupedData' object is not subscriptable" error.

Would greatly appreciate any suggestions on how to solve this. Thank you in advance.

SeaBean
  • 22,547
  • 3
  • 13
  • 25
MS25
  • 81
  • 5
  • Is it that you just want to get n top elements from each group or n largest elements from each group? If it is former then you use `df.groupby(...).first(2)` – Ch3steR Jul 28 '21 at 18:47

3 Answers3

2

As your table is already sorted desc by the amount column, you can get the top n-th rows for each group by GroupBy.head(n). To further group item column of these top n-th rows into list, you can further use GroupBy.agg(), as follows:

n = 2      # define n

(df.groupby('store_id').head(n)
   .groupby('store_id')['item'].agg(list)
).reset_index()

Result:

   store_id           item
0         1  [shirt, sock]
1         2  [sock, pants]
SeaBean
  • 22,547
  • 3
  • 13
  • 25
1

Try this:

(df.sort_values('amount',ascending=False)
 .groupby('store_id')
 .agg({'item':lambda x: x.iloc[0:2].tolist()}))
rhug123
  • 7,893
  • 1
  • 9
  • 24
1

Try:

output = df.groupby("store_id") \
           .apply(lambda x: x.nlargest(2, "amount")["item"].tolist()) \
           .rename(index="item")

>>> output
store_id
00001    [shirt, sock]
00002    [sock, pants]
Name: item, dtype: object
not_speshal
  • 22,093
  • 2
  • 15
  • 30