Approach #1
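Here's a loop-based way that iterates only through the positions of the non-null values. Each surviving non-null value nullifies the next W entries, and a non-null value survives only if it lies beyond the window opened by the previous survivor -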
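import numpy as np

def nullnext(s, W):
    # .values is a view into s for a float64 Series, so the writes below
    # modify s in place (not guaranteed under pandas Copy-on-Write)
    a = s.values
    # Start of the window that follows each non-null value
    idx = np.flatnonzero(s.notnull().values)+1
    if len(idx) == 0:  # all-NaN input: nothing to nullify
        return s
    last_idx = idx[0]
    a[last_idx:last_idx+W] = np.nan
    for i in idx[1:]:
        # Only a non-null value beyond the last nullified window survives
        # and opens a new window
        if i > last_idx + W:
            last_idx = i
            a[last_idx:last_idx+W] = np.nan
    return s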
Sample run -
In [336]: s
Out[336]:
0 1.0
1 NaN
2 2.0
3 3.0
4 NaN
5 NaN
6 4.0
7 5.0
8 NaN
Name: NaN, dtype: float64
In [337]: nullnext(s, W=4)
Out[337]:
0 1.0
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 4.0
7 NaN
8 NaN
Name: NaN, dtype: float64
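To make the rule explicit, here is a minimal brute-force reference that works on a plain NumPy array instead of a Series (the name nullnext_reference is hypothetical, not part of the answer above). It walks left to right, and every time a non-NaN value survives, it blanks out the next W entries. On the sample data it gives the same result as the run above, so it can be handy for cross-checking the faster versions.

```python
import numpy as np

def nullnext_reference(a, W):
    """Brute-force reference: each surviving non-NaN value blanks the next W entries."""
    out = np.array(a, dtype=float)  # work on a copy, leave the input untouched
    i = 0
    while i < len(out):
        if not np.isnan(out[i]):
            out[i + 1:i + 1 + W] = np.nan  # slices past the end are clipped
            i += W + 1  # the next W entries are now NaN, so skip them
        else:
            i += 1
    return out

sample = [1.0, np.nan, 2.0, 3.0, np.nan, np.nan, 4.0, 5.0, np.nan]
print(nullnext_reference(sample, W=4))
```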
Approach #2
With a few tweaks, we can port the loop onto numba for performance and then nullify all the selected windows in one go through a strided view of sliding windows. The relevant code would look something like this -
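import numpy as np
import pandas as pd
from numba import njit

# https://stackoverflow.com/a/40085052/ @Divakar
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

@njit
def set_mask(mask, idx, W):
    # Mark the window starts that survive: a start is kept only if it lies
    # beyond the window opened by the previously kept one
    last_idx = idx[0]
    mask[0] = True
    l = len(idx)
    for i in range(1,l):
        if idx[i] > last_idx + W:
            last_idx = idx[i]
            mask[i] = True
    return mask

def nullnext_numba(s, W):
    a = s.values
    idx = np.flatnonzero(s.notnull().values)+1
    if len(idx) == 0:  # all-NaN input: nothing to nullify
        return s.copy()
    mask = np.zeros(len(idx),dtype=bool)
    set_mask(mask, idx, W)
    # Pad with W NaNs so that windows starting near the end still fit
    a_ext = np.concatenate((a, np.full(W,np.nan)))
    # Row k of the strided view is a_ext[k:k+W]; assigning to rows writes through
    strided_app(a_ext, W, 1)[idx[mask]] = np.nan
    return pd.Series(a_ext[:-W], index=s.index, name=s.name)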
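The trick that does the bulk nullifying is that strided_app(a_ext, W, 1) returns a 2D view whose row k is the window a_ext[k:k+W], so fancy-index assignment on whole rows writes straight through to the underlying array. A small NumPy-only sketch of just that step (strided_app copied from above; the array values and row indices are arbitrary illustration data):

```python
import numpy as np

def strided_app(a, L, S):  # window length = L, stride/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

a = np.arange(10, dtype=float)
win = strided_app(a, 3, 1)      # shape (8, 3); row k is a view of a[k:k+3]
print(win[2])                   # [2. 3. 4.]
win[np.array([1, 6])] = np.nan  # blank the windows starting at 1 and 6
print(a)
```

After the assignment, a has NaN at positions 1-3 and 6-8 and keeps its other values. Overlapping rows are harmless here because every write stores the same NaN.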
Further improvement
We could optimize it further for memory efficiency by avoiding the concatenation and doing all the edits in-situ on the input series, which improves performance as well, like so -
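def nullnext_numba_v2(s, W):
    # Writes go through the .values view, so s is modified in place
    # (not guaranteed under pandas Copy-on-Write)
    a = s.values
    idx = np.flatnonzero(s.notnull().values)+1
    if len(idx) == 0:  # all-NaN input: nothing to nullify
        return s
    mask = np.zeros(len(idx),dtype=bool)
    set_mask(mask, idx, W)
    valid_idx = idx[mask]
    # Windows lying entirely inside a go through the strided view
    limit_mask = valid_idx < len(a) - W
    strided_app(a, W, 1)[valid_idx[limit_mask]] = np.nan
    # A window running up to or past the end (there can be at most one)
    # is handled with a plain slice
    leftover_idx = valid_idx[~limit_mask]
    if len(leftover_idx)>0:
        a[leftover_idx[0]:] = np.nan
    return s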
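If numba is not available, the same in-situ logic can still be tried with a plain-Python version of set_mask; it is slower but applies the same selection rule. A minimal sketch on a NumPy array rather than a Series (the names set_mask_py and nullnext_inplace are hypothetical). It also shows why the tail needs separate handling: strided_app(a, W, 1) only has rows for windows that fit entirely inside a, so the strided assignment is skipped when no window fits.

```python
import numpy as np

def strided_app(a, L, S):  # window length = L, stride/stepsize = S
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))

def set_mask_py(mask, idx, W):
    # Same selection rule as the numba-compiled set_mask, in plain Python
    last_idx = idx[0]
    mask[0] = True
    for i in range(1, len(idx)):
        if idx[i] > last_idx + W:
            last_idx = idx[i]
            mask[i] = True
    return mask

def nullnext_inplace(a, W):
    # Edits the float array a in place and returns it
    idx = np.flatnonzero(~np.isnan(a)) + 1  # window start after each non-NaN
    if len(idx) == 0:
        return a
    mask = set_mask_py(np.zeros(len(idx), dtype=bool), idx, W)
    valid_idx = idx[mask]
    limit_mask = valid_idx < len(a) - W  # windows lying fully inside a
    if limit_mask.any():
        strided_app(a, W, 1)[valid_idx[limit_mask]] = np.nan
    leftover_idx = valid_idx[~limit_mask]
    if len(leftover_idx) > 0:
        a[leftover_idx[0]:] = np.nan  # last window reaches the end: blank to the end
    return a

a = np.array([1.0, np.nan, 2.0, 3.0, np.nan, np.nan, 4.0, 5.0, np.nan])
nullnext_inplace(a, W=4)
print(a)
```

On the sample data, the window starting at 1 goes through the strided view and the one starting at 7 is cut off by the tail slice, giving the same result as the sample run.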