1

I am trying to find the top and second highest value I can get the highest using

df['B'] = df['a'].rolling(window=3).max()

But how do I get the second highest please?

Such that df['C'] will display as per below

A    B    C
1
6
5    6    5
4    6    5
12   12   5
J Ng
  • 779
  • 7
  • 18

1 Answers1

1

Generic n-highest values in rolling/sliding windows

Here's one using np.lib.stride_tricks.as_strided to create sliding windows that lets us choose any generic N highest value in sliding windows -

# https://stackoverflow.com/a/40085052/ @Divakar
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

# Return N highest nums in rolling windows of length W off array ar
def N_highest(ar, W, N=1): 
    # ar : Input array
    # W : Window length
    # N : Get us the N-highest in sliding windows 
    A2D = strided_app(ar,W,1)
    idx = (np.argpartition(A2D, -N, axis=1) == A2D.shape[1]-N).argmax(1)
    return A2D[np.arange(len(idx)), idx]

Sample runs -

In [634]: a = np.array([1,6,5,4,12]) # input array

In [635]: N_highest(a, W=3, N=1)  # highest in W=3
Out[635]: array([ 6,  6, 12])

In [636]: N_highest(a, W=3, N=2)  # second highest
Out[636]: array([5, 5, 5])

In [637]: N_highest(a, W=3, N=3)  # third highest
Out[637]: array([1, 4, 4])

Another shorter way based on strides, would be with direct sorting, like so -

np.sort(strided_app(ar,W,1), axis=1)[:,-N]]

Solving our case

Hence, to solve our case, we need to concatenate with NaNs alongwith the result from the above mentioned function, like so -

W = 3
df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]

Based on direct sorting, we would have -

df['C'] = np.r_[ [np.nan]*(W-1), np.sort(strided_app(df.A,W,1), axis=1)[:,-2]]

Sample run -

In [578]: df
Out[578]: 
   A
0  1
1  6
2  5
3  4
4  3  # <== Different from given sample, for variety

In [619]: W = 3

In [620]: df['C'] = np.r_[ [np.nan]*(W-1), N_highest(df.A.values, W=W, N=2)]

In [621]: df
Out[621]: 
   A    C
0  1  NaN
1  6  NaN
2  5  5.0
3  4  5.0
4  3  4.0 # <== Second highest from the last group off : [5,4,3]
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358