10

consider the pd.Series s

import pandas as pd
import numpy as np

np.random.seed([3,1415])
s = pd.Series(np.random.randint(0, 10, 10), list('abcdefghij'))
s

a    0
b    2
c    7
d    3
e    8
f    7
g    0
h    6
i    8
j    6
dtype: int64

I want to get the index for the max value for the rolling window of 3

s.rolling(3).max()

a    NaN
b    NaN
c    7.0
d    7.0
e    8.0
f    8.0
g    8.0
h    7.0
i    8.0
j    8.0
dtype: float64

What I want is

a    None
b    None
c       c
d       c
e       e
f       e
g       e
h       f
i       i
j       i
dtype: object

What I've done

s.rolling(3).apply(np.argmax)

a    NaN
b    NaN
c    2.0
d    1.0
e    2.0
f    1.0
g    0.0
h    0.0
i    2.0
j    1.0
dtype: float64

which is obviously not what I want

piRSquared
  • 285,575
  • 57
  • 475
  • 624

6 Answers6

15

There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats.

Here is one approach:

>>> s.index[s.rolling(3).apply(np.argmax)[2:].astype(int)+np.arange(len(s)-2)]
Index([u'c', u'c', u'e', u'e', u'e', u'f', u'i', u'i'], dtype='object')

The idea is to take the argmax values and align them with the series by adding a value indicating how far along in the series we are. (That is, for the first argmax value we add zero, because it is giving us the index into a subsequence starting at index 0 in the original series; for the second argmax value we add one, because it is giving us the index into a subsequence starting at index 1 in the original series; etc.)

This gives the correct results, but doesn't include the two "None" values at the beginning; you'd have to add those back manually if you wanted them.

There is an open pandas issue to add rolling idxmax.

BrenBarn
  • 242,874
  • 37
  • 412
  • 384
  • nice thats clever – thomas.mac Dec 14 '17 at 22:21
  • Is it that uncommon to calculate a rolling idxmax? I would've expected it to be efficiently implemented by now.. I can't even find a decent implementation in Julia, which doesn't need external C help. Is there something I'm missing here which would make this really hard/unneeded? – Harel Rozental Feb 15 '21 at 15:15
4

I used a generator

def idxmax(s, w):
    i = 0
    while i + w <= len(s):
        yield(s.iloc[i:i+w].idxmax())
        i += 1

pd.Series(idxmax(s, 3), s.index[2:])

c    c
d    c
e    e
f    e
g    e
h    f
i    i
j    i
dtype: object
piRSquared
  • 285,575
  • 57
  • 475
  • 624
3

Here's an approach using broadcasting -

maxidx = (s.values[np.arange(s.size-3+1)[:,None] + np.arange(3)]).argmax(1)
out = s.index[maxidx+np.arange(maxidx.size)]

This generates all the indices corresponding to the rolling windows, indexes into the extracted array version with those and thus gets the max indices for each window. For a more efficient indexing, we can use NumPy strides, like so -

arr = s.values
n = arr.strides[0]
maxidx = np.lib.stride_tricks.as_strided(arr, \
                   shape=(s.size-3+1,3), strides=(n,n)).argmax(1)
Divakar
  • 218,885
  • 19
  • 262
  • 358
3

I think this is the easiest way, just use lambda as bellow:

rolling_max_index=df.rolling(period).apply(lambda x: x.idxmax())
Phlaskoo
  • 31
  • 2
  • Good shot. This works perfectly. – Lumber Jack May 31 '22 at 08:30
  • The problem with this is that it gives you the index of the highest value compared to every window (not the whole dataframe). So for example, if you're using a window size of 10, no matter how big your dataframe is, it'll give you numbers between 0 to 9 which was a problem in my case. – Masih Bahmani Feb 03 '23 at 19:18
2

Just chiming in on how I solved a similar problem that I had. I did not want to find the index exactly, I wanted to find how long ago the max value happened. But this could be used to find the index as well.

I'm basically using the shift strategy, but I'm iterating over several shifts with a configurable length. It's probably slow, but it works good enough for me.

import pandas as pd


length = 5

data = [1, 2, 3, 4, 5, 4, 3, 4, 5, 6, 7, 6, 5, 4, 5, 4, 3]
df = pd.DataFrame(data, columns=['number'])
df['helper_max'] = df.rolling(length).max()

for i in range(length, -1, -1):
    # Set the column to what you want. You may grab the index 
    # if you wish, I wanted number of rows since max happened
    df.loc[df['number'].shift(i) == df['helper_max'], 'n_rows_ago_since_max'] = i

print(df)

Output:

    number  helper_max  n_rows_ago_since_max
0        1         NaN                   NaN
1        2         NaN                   NaN
2        3         NaN                   NaN
3        4         NaN                   NaN
4        5         5.0                   0.0
5        4         5.0                   1.0
6        3         5.0                   2.0
7        4         5.0                   3.0
8        5         5.0                   0.0
9        6         6.0                   0.0
10       7         7.0                   0.0
11       6         7.0                   1.0
12       5         7.0                   2.0
13       4         7.0                   3.0
14       5         7.0                   4.0
15       4         6.0                   4.0
16       3         5.0                   2.0
Rkey
  • 690
  • 2
  • 8
  • 25
1

You can also simulate the rolling window by creating a DataFrame and use idxmax as follows:

window_values = pd.DataFrame({0: s, 1: s.shift(), 2: s.shift(2)})
s.index[np.arange(len(s)) - window_values.idxmax(1)]

Index(['a', 'b', 'c', 'c', 'e', 'e', 'e', 'f', 'i', 'i'], dtype='object', name=0)

As you can see, the first two terms are the idxmax as applied to the initial windows of lengths 1 and 2 rather than null values. It's not as efficient as the accepted answer and probably not a good idea for large windows but just another perspective.

JoeCondron
  • 8,546
  • 3
  • 27
  • 28