1

I have a dataframe that looks like this

timestamp                  group   score
2020-06-03 09:01:27+00:00  A       13
2020-06-03 09:05:18+00:00  B       14
                           C       15
2020-06-03 09:05:39+00:00  A       12
2020-06-03 09:05:45+00:00  B       16
2020-06-03 09:07:09+00:00  C       17
2020-06-03 09:08:43+00:00  A       26

I would like to take a rolling window of size three and find the maximum score. The part I'm struggling with is how to keep the timestamp and group information. The expected output would be:

timestamp                  group   score
2020-06-03 09:05:18+00:00  C       15
                           C       15
2020-06-03 09:05:45+00:00  B       16
2020-06-03 09:07:09+00:00  C       17
2020-06-03 09:08:43+00:00  A       26

So importantly, the first row in the output is duplicated. Thanks!

The Ref
  • 684
  • 2
  • 7
  • 20

1 Answers1

2

Idea is get indices for maximal values of score by this solution and then pass to DataFrame.loc:

N = 3
s = df['score']
maxidx = (s.values[np.arange(s.size-N+1)[:,None] + np.arange(N)]).argmax(1)
df = df.loc[s.index[maxidx+np.arange(maxidx.size)]]
print (df)
                                 score
timestamp                 group       
2020-06-03 09:05:18+00:00 C         15
                          C         15
2020-06-03 09:05:45+00:00 B         16
2020-06-03 09:07:09+00:00 C         17
2020-06-03 09:08:43+00:00 A         26

Or:

N = 3   
s = df['score']
#https://stackoverflow.com/a/40101614
idx = s.index[s.rolling(N).apply(np.argmax)[(N-1):].astype(int)+np.arange(len(s)-(N-1))]
df = df.loc[idx]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252