I'm trying to group by the 'Date' column, sort each group by the 'OI' column, and show the top 2 rows:

    top2 = df.groupby(['Date'],sort=False).apply(pd.DataFrame.nlargest,2,'OI')

The first problem is that `apply` is very slow because I have a lot of groups. What would be a faster alternative? Also, how would I get just the second row when sorted by 'OI', instead of the largest 2? Thanks in advance.

martineau
Z.G
  • There's also `df.groupby('Date')['OI'].nlargest(2)`. – pault Mar 19 '18 at 20:49
  • Possible duplicate of [Pandas good approach to get top-n records within each group](https://stackoverflow.com/questions/20069009/pandas-good-approach-to-get-top-n-records-within-each-group) – pault Mar 19 '18 at 20:49
  • Thanks, that helps. Now, can anyone tell me why the following doesn't work: `df.groupby('Date')['OI'].nth(1)`. It doesn't always return the row with largest 'OI' – Z.G Mar 20 '18 at 12:21
  • [`nth`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.nth.html) doesn't sort – pault Mar 20 '18 at 14:24
  • Thank you! Sorting on 'OI' first did the trick: `df.sort_values('OI', ascending=False).groupby('Date').nth(0).dropna()` – Z.G Mar 20 '18 at 14:36
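
For anyone landing here, the sort-first approach from the comments can be sketched roughly as below. This is only a minimal illustration, not a benchmarked answer: the sample data is made up, and only the column names 'Date' and 'OI' come from the question. The idea is to sort once on 'OI' and then use `head` and `nth`, which pick rows by position within each group and so avoid calling a Python function per group via `apply`.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Date": ["2018-03-19", "2018-03-19", "2018-03-19", "2018-03-20", "2018-03-20"],
    "OI": [10, 30, 20, 5, 7],
})

# Sort once by 'OI' (largest first), then group without re-sorting the keys.
ranked = df.sort_values("OI", ascending=False)
grouped = ranked.groupby("Date", sort=False)

# Top 2 rows per date: head(2) keeps the first two rows of each sorted group.
top2 = grouped.head(2)

# Second-largest row per date: nth(1) picks the row at position 1 in each group.
second = grouped.nth(1)

print(top2)
print(second)
```

Note that `nth(0)` would give the largest row per date and `nth(1)` the second largest, since positions are zero-based. The shape of the `nth` result (original index versus 'Date' as the index) may differ between pandas versions.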

0 Answers