Edit: reduction to a simpler case

In [1]: np.argmin(pd.Series([-6.0, 7.0, np.NaN]))
Out[1]: 0

In [2]: pd.Series([-6.0, 7.0, np.NaN]).rolling(3).apply(np.argmin)
Out[2]: 
0   NaN
1   NaN
2   NaN
dtype: float64

In [3]: pd.Series([-6.0, 7.0, np.NaN]).rolling(3).apply(np.argmin)[2]
Out[3]: nan

Why do these two calculations give different results?

Original case

Trying to improve my solution for rolling idxmin/max, I hit the following issue.


In [1]: index = map(chr, range(ord('a'), ord('a') + 10))

In [2]: df = pd.DataFrame((10 * np.random.randn(10, 3)).astype(int), index=index)

In [3]: df[0][3:4] = np.NaN

In [4]: df
Out[4]: 
      0   1   2
a   0.0  -2  -7
b  -6.0   7   7
c   7.0 -23 -13
d   NaN   4  -6
e   7.0  19  10
f  -3.0   4  -2
g   9.0 -16  -2
h  13.0  15  -2
i   6.0   8   0
j  -9.0 -10  11

In [5]: df.rolling(3).apply(np.argmin)
Out[5]: 
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  1.0  2.0  2.0
d  NaN  1.0  1.0
e  NaN  0.0  0.0
f  NaN  0.0  0.0
g  1.0  2.0  1.0
h  0.0  1.0  0.0
i  2.0  0.0  0.0
j  2.0  2.0  0.0

In [6]: np.argmin(pd.Series([-6.0, 7.0, np.NaN]))  # for index 'd', col 0
Out[6]: 0

Shouldn't the manual application of np.argmin (for index 'd', column 0) give the same result as the corresponding rolling application? Why does the rolling application give me NaN instead of 0?

nilo

1 Answer

Of course, this is an RTFM case: the answer is in the documentation.

For DataFrame.rolling:

min_periods (int, default None): Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.

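So the window ending at the NaN holds only two valid observations, fewer than the default min_periods of 3 (the window size), and the result is NA. Lowering min_periods gives the expected value: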

In [1]: np.argmin(pd.Series([-6.0, 7.0, np.NaN]))
Out[1]: 0

In [2]: pd.Series([-6.0, 7.0, np.NaN]).rolling(3, min_periods=0).apply(np.argmin)[2]
Out[2]: 0.0
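
To make the effect of min_periods visible, here is a minimal sketch that counts the non-NaN observations in each window and compares the default with a relaxed setting. The names s, valid_counts, strict and relaxed are only illustrative, and it assumes a pandas version where rolling(...).apply accepts raw=False so that each window is passed as a Series.

```python
import numpy as np
import pandas as pd

s = pd.Series([-6.0, 7.0, np.nan])

# Number of non-NaN observations in each window of width 3.
valid_counts = s.notna().astype(float).rolling(3, min_periods=1).sum()

# Default: min_periods equals the window size (3), so every window with
# fewer than 3 valid observations yields NaN.
strict = s.rolling(3).apply(lambda w: w.argmin(), raw=False)

# Relaxed: a single valid observation is enough for a result.
# raw=False passes each window as a Series, whose argmin skips NaN.
relaxed = s.rolling(3, min_periods=1).apply(lambda w: w.argmin(), raw=False)

print(valid_counts.tolist())
print(strict.tolist())
print(relaxed.tolist())
```

No window here ever reaches three valid observations, which is why the default setting returns NaN everywhere, while the relaxed version returns position 0 for the window ending at the NaN, matching the manual np.argmin call.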

nilo