Pandas groupby followed by nlargest

Asked Mar 19 '18 at 20:31

Active Mar 19 '18 at 20:32

Viewed 1,754 times

I'm trying to group by the 'Date' column, sort each group by the 'OI' column, and show the top 2 rows of each group:

    top2 = df.groupby(['Date'], sort=False).apply(pd.DataFrame.nlargest, 2, 'OI')

The first problem is that `apply` is very slow because I have a lot of groups. What would be a faster alternative? Also, how would I get just the second row of each group when sorted by 'OI', instead of the largest 2? Thanks in advance.

edited Mar 19 '18 at 20:32

martineau


asked Mar 19 '18 at 20:31

Z.G


There's also `df.groupby('Date')['OI'].nlargest(2)`. – pault Mar 19 '18 at 20:49

Possible duplicate of [Pandas good approach to get top-n records within each group](https://stackoverflow.com/questions/20069009/pandas-good-approach-to-get-top-n-records-within-each-group) – pault Mar 19 '18 at 20:49
Thanks, that helps. Now, can anyone tell me why the following doesn't work: `df.groupby('Date')['OI'].nth(1)`? It doesn't always return the row with the largest 'OI'. – Z.G Mar 20 '18 at 12:21
[`nth`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.nth.html) doesn't sort – pault Mar 20 '18 at 14:24
Thank you! Sorting on 'OI' first did the trick: `df.sort_values('OI', ascending=False).groupby('Date').nth(0).dropna()` – Z.G Mar 20 '18 at 14:36
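
For reference, the sort-then-group idea from the comments can be sketched without `apply` roughly as below. This is only a minimal sketch under assumptions: the sample data is hypothetical (only the column names 'Date' and 'OI' come from the question), and `head` plus `cumcount` are used here instead of `nth` as one possible way to pick rows by position within each sorted group. It will likely be faster than `apply` with many groups, since it avoids calling a Python function once per group, but that should be measured on the real data.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Date": ["2018-03-16", "2018-03-16", "2018-03-16",
             "2018-03-19", "2018-03-19", "2018-03-19"],
    "OI": [10, 30, 20, 5, 15, 25],
})

# Sort once by 'OI' descending; groupby preserves this row order within each group.
ranked = df.sort_values("OI", ascending=False)
by_date = ranked.groupby("Date", sort=False)

# Top 2 rows per date: the first two rows of each already-sorted group.
top2 = by_date.head(2)

# Only the second-largest row per date: position 1 (0-based) within each group.
second = ranked[by_date.cumcount() == 1]

print(top2)
print(second)
```

Both results keep the original row index of `df`, so they can be joined back to the full frame if needed.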


0 Answers