ffill with limit in numpy

Question

For performance reasons I'd like to use NumPy to do the same kind of forward fill I can get with pandas like so:

import numpy as np
import pandas as pd
s = pd.Series([1, 2, np.nan, np.nan, np.nan, 7, 8, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
s.ffill(limit=7)

which results in:

array([ 1., 2., 2., 2., 2., 7., 8., 8., 8., 8., 8., 8., 8., 8., nan])

I've got a forward fill without a limit working, but I also need the limit to be respected.

Does this answer your question https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array? It includes a vectorized numpy solution by one of the best numpy question-answerers on this platform. — jkr, Mar 28 '21 at 01:58

Kevin · Answer 1 · 2021-03-28T14:01:32.940

Basically combined solutions from these two sources:

and added assignment for indices that are over the threshold limit:

import numpy as np

arr = np.array([1, 2, np.nan, np.nan, np.nan, 7, 8, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
limit = 7

mask = np.isnan(arr)
# Index of each valid element, 0 at NaN positions.
idx = np.where(~mask, np.arange(mask.size), 0)

# Create an array that is 1 where arr is NaN, and pad each end with an extra 0.
isnan_padded = np.concatenate(([0], mask.view(np.int8), [0]))
absdiff = np.abs(np.diff(isnan_padded))

# pair-wise (start, end) indices for consecutive NaN ranges, end exclusive
ranges = np.flatnonzero(absdiff == 1)

# add to_begin padding to get overflow and ranges equal size
overflow = np.ediff1d(ranges, to_begin=0) - limit
overflow_mask = overflow > 0
# differences at range starts measure the valid stretch before a range, so ignore them
overflow_mask[0::2] = False

# do the actual fill
arr = arr[np.maximum.accumulate(idx)]

# reassign overflowed elements to nan (only the first one per range)
arr[ranges[overflow_mask] - overflow[overflow_mask]] = np.nan

Output:

array([ 1.,  2.,  2.,  2.,  2.,  7.,  8.,  8.,  8.,  8.,  8.,  8.,  8., 8., nan])

Only tested it with the one given example, so there might be some inconsistency.
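To check more inputs than the one above, a slow loop-based reference can help. This is only a minimal sketch; the name ffill_limit_reference is made up for illustration, and it assumes a flat sequence of floats where a gap of more than limit consecutive NaNs is filled only for its first limit elements, which is how I understand the pandas limit to behave.

```python
import math


def ffill_limit_reference(values, limit):
    """Plain-Python forward fill that fills at most `limit` NaNs per gap."""
    out = []
    last = math.nan  # last valid value seen so far (NaN until one appears)
    gap = 0          # number of consecutive NaNs since the last valid value
    for v in values:
        if math.isnan(v):
            gap += 1
            # Fill only while we are within `limit` steps of the last valid value.
            out.append(last if gap <= limit else math.nan)
        else:
            last = v
            gap = 0
            out.append(v)
    return out
```

Comparing its result element by element with a vectorized version (treating NaN as equal to NaN) should expose inconsistencies on other inputs.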

EDIT

I noticed that this only resets the first element past the limit in each range, so it is probably not the solution you are looking for. I tried reshaping ranges into shape (-1, 2) and modifying each range in place along the first axis so that it becomes a slice over the corresponding overflow. That part works, but I had no luck assigning the 2D slices back to arr.

# pair-wise indices for consecutive zero ranges
ranges = np.flatnonzero(absdiff == 1).reshape(-1,2)

overflow = np.diff(ranges) - limit
overflow_mask = overflow > 0
# modify ranges so each overflowing range becomes a slice over its overflow
ranges[None,:,0][overflow_mask.T] = ranges[None,:,1][overflow_mask.T] - overflow[overflow_mask]
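One possible way to avoid assigning slices back at all is to drop the ranges and compare each position with the index of the last valid value before it: anything more than limit steps away gets reset to NaN. The following is a sketch under that assumption, not a tuned implementation; ffill_limit is a hypothetical name and it only handles 1-D float input.

```python
import numpy as np


def ffill_limit(arr, limit):
    """Forward fill a 1-D float array, filling at most `limit` NaNs per gap."""
    arr = np.asarray(arr, dtype=float)
    mask = np.isnan(arr)
    positions = np.arange(arr.size)

    # Index of the most recent valid element at or before each position
    # (0 when no valid element has been seen yet).
    last_valid = np.maximum.accumulate(np.where(mask, 0, positions))

    # Unlimited forward fill; fancy indexing returns a new array.
    filled = arr[last_valid]

    # Positions further than `limit` steps from their source go back to NaN.
    filled[(positions - last_valid) > limit] = np.nan
    return filled


arr = np.array([1, 2, np.nan, np.nan, np.nan, 7, 8] + [np.nan] * 8)
result = ffill_limit(arr, limit=7)
# The first 14 values are filled and the last element stays NaN.
```

Leading NaNs stay NaN because they index back to position 0, which is itself NaN. I have only checked this against the example from the question and a few small cases.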
