pandas DataFrame rolling apply np.argmin and manual np.argmin give different results

Question

Edit: reduction to a simpler case

In [1]: np.argmin(pd.Series([-6.0, 7.0, np.nan]))
Out[1]: 0

In [2]: pd.Series([-6.0, 7.0, np.nan]).rolling(3).apply(np.argmin)
Out[2]:
0   NaN
1   NaN
2   NaN
dtype: float64

In [3]: pd.Series([-6.0, 7.0, np.nan]).rolling(3).apply(np.argmin)[2]
Out[3]: nan

Why do these two calculations give different results?

Original case

Trying to improve my solution for rolling idxmin/max, I hit the following issue.


In [1]: index = list(map(chr, range(ord('a'), ord('a') + 10)))

In [2]: df = pd.DataFrame((10 * np.random.randn(10, 3)).astype(int), index=index)

In [3]: df.iloc[3, 0] = np.nan

In [4]: df
Out[4]:
      0   1   2
a   0.0  -2  -7
b  -6.0   7   7
c   7.0 -23 -13
d   NaN   4  -6
e   7.0  19  10
f  -3.0   4  -2
g   9.0 -16  -2
h  13.0  15  -2
i   6.0   8   0
j  -9.0 -10  11

In [5]: df.rolling(3).apply(np.argmin)
Out[5]:
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  1.0  2.0  2.0
d  NaN  1.0  1.0
e  NaN  0.0  0.0
f  NaN  0.0  0.0
g  1.0  2.0  1.0
h  0.0  1.0  0.0
i  2.0  0.0  0.0
j  2.0  2.0  0.0

In [6]: np.argmin(pd.Series([-6.0, 7.0, np.nan]))  # for index 'd', col 0
Out[6]: 0

Shouldn't the manual application of np.argmin (for index 'd', column 0) give the same result as the corresponding rolling application? Why does the rolling application give me NaN instead of 0?

Answer · answered Jan 01 '21 at 20:51

Of course, this is an RTFM case...

From the documentation of DataFrame.rolling:

min_periods (int, default None): Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.
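NaN values do not count as observations for min_periods. A minimal sketch, assuming pandas 1.x or 2.x, makes this visible with the series from the question; len is used only as a stand-in function whose result does not depend on the values:

```python
import numpy as np
import pandas as pd

s = pd.Series([-6.0, 7.0, np.nan])

# Non-NaN observations per trailing window of size 3; min_periods=1 so the
# counts for the short windows at the start are reported too.
counts = s.rolling(3, min_periods=1).count()

# Default min_periods (= window size = 3): no window ever holds 3 non-NaN
# values, so every result is NaN whatever the applied function returns.
default = s.rolling(3).apply(len, raw=True)

# min_periods=2 lets the last two windows through. len() shows that the
# window passed to the function still includes the NaN (length 3).
relaxed = s.rolling(3, min_periods=2).apply(len, raw=True)

print(counts.tolist())   # [1.0, 2.0, 2.0]
print(default.tolist())  # [nan, nan, nan]
print(relaxed.tolist())  # [nan, 2.0, 3.0]
```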

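With a window of 3, min_periods defaults to 3, but the window ending at index 2 contains only 2 non-NaN observations, so rolling returns NaN there regardless of what np.argmin would return. Lowering min_periods gives the expected value: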

In [1]: np.argmin(pd.Series([-6.0, 7.0, np.nan]))
Out[1]: 0

In [2]: pd.Series([-6.0, 7.0, np.nan]).rolling(3, min_periods=0).apply(np.argmin)[2]
Out[2]: 0.0
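Applied to the original DataFrame case, the same fix might look like the sketch below. It hard-codes column 0 from the printout above instead of using random data, and uses min_periods=1 rather than 0 so that a window made entirely of NaN would give NaN instead of being passed to the function. It also contrasts raw=False (the default since pandas 1.0: each window arrives as a Series, so np.argmin dispatches to Series.argmin, which skips NaN) with raw=True, where each window is a plain ndarray, plain np.argmin picks the NaN, and np.nanargmin is the NaN-skipping alternative.

```python
import numpy as np
import pandas as pd

# Column 0 of the example frame above, hard-coded instead of random data.
col0 = pd.Series(
    [0.0, -6.0, 7.0, np.nan, 7.0, -3.0, 9.0, 13.0, 6.0, -9.0],
    index=list("abcdefghij"),
)

# raw=False: each window is a Series, so np.argmin calls Series.argmin,
# which skips NaN. min_periods=1 keeps windows with 1 or 2 valid values.
by_series = col0.rolling(3, min_periods=1).apply(np.argmin, raw=False)

# raw=True: each window is an ndarray. np.nanargmin skips NaN (it would raise
# ValueError on an all-NaN window, which this data does not contain).
by_array = col0.rolling(3, min_periods=1).apply(np.nanargmin, raw=True)

# Plain np.argmin on an ndarray does not skip NaN: for the window ending at
# 'd' it returns 2, the position of the NaN.
plain = np.argmin(col0.iloc[1:4].to_numpy())

print(by_series.tolist())  # [0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 2.0]
print(by_array.tolist())   # same as above
print(plain)               # 2
```

Row 'd' now gives 0, matching the manual np.argmin call in the question. Positions are relative to the start of each window.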
