Can numpy functions replace this for loop to gain speed?

Question

I have many of these arrays, in which I want to replace each 0 entry with the closest nonzero entry at a lower index (that is, the last nonzero value before it). This can be done easily using a for loop:

import numpy as np

input_array = np.array([ 0.01561,  0.01561,  0.02039,  0.02039,  0.02776,  0.02776,
  0.03997,  0.,          0.03997,  0.06243,   0.,          0.,       0.0624662,
  0.11105,  0.,          0.,          0.,          0.11105,  0.24986,
  0.,          0.,          0.,          0.,          0.,          0.,
  0.24986])

for i in range(len(input_array)):
    if input_array[i] == 0:
        input_array[i] = input_array[i - 1]

Can anyone tell me whether replacing this loop with numpy functions is worth the effort?

See [this](https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array). — Psidom, Jun 16 '17 at 22:35
For what size of array? This example or something much bigger? — hpaulj, Jun 17 '17 at 01:06
You can probably get about an order of magnitude speedup using a more optimized approach. Whether or not that's worth it is hard to say — AGML, Jun 17 '17 at 01:39

score 1 · Accepted Answer · answered Jun 17 '17 at 17:52

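Applying the numpy solution from "Most efficient way to forward-fill NaN values in numpy array" (https://stackoverflow.com/questions/41190852/most-efficient-way-to-forward-fill-nan-values-in-numpy-array):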

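def foo2(arr):
    # keep the index of each nonzero entry; use 0 where the entry is zero
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    # running maximum = index of the most recent nonzero entry
    idx = np.maximum.accumulate(idx)
    return arr[idx]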

def foo1(arr):
    arr = arr.copy()
    for i in range(len(arr)):
        if arr[i]==0:
            arr[i] = arr[i-1]
    return arr
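As a quick sanity check, here is a minimal sketch that runs both functions on the array from the question and confirms they agree. The functions are repeated so the snippet runs on its own. The second array, `lead`, is a made-up example (not from the question) showing the one case where they differ: a zero in the first position. The loop wraps around to `arr[-1]`, while the indexed version leaves the zero in place.

```python
import numpy as np

def foo1(arr):
    # loop version: copy, then overwrite each zero with the previous entry
    arr = arr.copy()
    for i in range(len(arr)):
        if arr[i] == 0:
            arr[i] = arr[i - 1]
    return arr

def foo2(arr):
    # vectorized version: index of the most recent nonzero entry
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    idx = np.maximum.accumulate(idx)
    return arr[idx]

arr = np.array([0.01561, 0.01561, 0.02039, 0.02039, 0.02776, 0.02776,
                0.03997, 0., 0.03997, 0.06243, 0., 0., 0.0624662,
                0.11105, 0., 0., 0., 0.11105, 0.24986,
                0., 0., 0., 0., 0., 0., 0.24986])

# Same result when the first entry is nonzero
assert np.array_equal(foo1(arr), foo2(arr))

# Hypothetical edge case: a zero in the first position
lead = np.array([0., 2., 0., 3.])
print(foo1(lead))  # [3. 2. 2. 3.]  (i - 1 == -1 picks the last element)
print(foo2(lead))  # [0. 2. 2. 3.]  (the leading zero is left unchanged)
```

If your data can start with a zero, decide which of the two behaviours you actually want before swapping one function for the other.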

For your test array, arr, the speed improvement is modest:

In [67]: timeit foo1(arr)
100000 loops, best of 3: 18.1 µs per loop
In [68]: timeit foo2(arr)
The slowest run took 1387.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.4 µs per loop

But with a larger array the difference grows: the loop's time increases with the array size, while the array version barely changes:

In [69]: arr1=np.concatenate((arr,arr,arr,arr,arr,arr,arr))
In [70]: timeit foo1(arr1)
10000 loops, best of 3: 116 µs per loop
In [71]: timeit foo2(arr1)
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.6 µs per loop
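To repeat this kind of comparison outside IPython, something like the sketch below should work with the standard `timeit` module. The `make_data` helper and its random test data are my own assumptions, not the poster's data; the sizes 26 and 182 match the two arrays timed above, and 10,000 is an arbitrary larger case. Actual numbers will depend on your machine, so none are shown.

```python
import timeit
import numpy as np

def foo1(arr):
    arr = arr.copy()
    for i in range(len(arr)):
        if arr[i] == 0:
            arr[i] = arr[i - 1]
    return arr

def foo2(arr):
    idx = np.where(arr == 0, 0, np.arange(len(arr)))
    idx = np.maximum.accumulate(idx)
    return arr[idx]

def make_data(n, seed=0):
    # Hypothetical test data: values in [0.5, 1.5), roughly 40% set to zero
    rng = np.random.default_rng(seed)
    data = rng.random(n) + 0.5
    data[rng.random(n) < 0.4] = 0.0
    data[0] = 1.0  # keep the first entry nonzero so both functions agree
    return data

timings = []
for n in (26, 182, 10_000):
    data = make_data(n)
    assert np.array_equal(foo1(data), foo2(data))
    # best of 3 repeats, 20 calls each, reported per call
    t_loop = min(timeit.repeat(lambda: foo1(data), number=20, repeat=3)) / 20
    t_vec = min(timeit.repeat(lambda: foo2(data), number=20, repeat=3)) / 20
    timings.append((n, t_loop, t_vec))
    print(f"n={n:6d}  loop: {t_loop * 1e6:9.1f} us  vectorized: {t_vec * 1e6:9.1f} us")
```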

The details of the idx construction:

In [72]: idx=np.arange(len(arr))
In [73]: idx[arr==0]=0
In [74]: idx
Out[74]: 
array([ 0,  1,  2,  3,  4,  5,  6,  0,  8,  9,  0,  0, 12, 13,  0,  0,  0, 17, 18,  0,  0,  0,  0,  0,  0, 25])
In [75]: idx=np.maximum.accumulate(idx)
In [76]: idx
Out[76]: 
array([ 0,  1,  2,  3,  4,  5,  6,  6,  8,  9,  9,  9, 12, 13, 13, 13, 13, 17, 18, 18, 18, 18, 18, 18, 18, 25], dtype=int32)
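Since the question mentions many such arrays, the same index trick can probably be applied to all of them at once. The sketch below assumes they have equal length and can be stacked as rows of a 2-D array. That is my assumption, not something stated in the question, and `forward_fill_zeros_2d` is a name I made up. It follows the 2-D approach from the linked forward-fill question, with zeros in place of NaN.

```python
import numpy as np

def forward_fill_zeros_2d(arr):
    """Replace zeros in each row with the nearest nonzero value to their left."""
    rows, cols = arr.shape
    # column index where the entry is nonzero, 0 where it is zero
    idx = np.where(arr != 0, np.arange(cols), 0)
    # running maximum along each row = column of the last nonzero entry so far
    idx = np.maximum.accumulate(idx, axis=1)
    # pair every row number with its per-column source indices
    return arr[np.arange(rows)[:, None], idx]

# Small made-up batch of two rows
batch = np.array([[1., 0., 0., 4., 0.],
                  [2., 3., 0., 0., 5.]])
print(forward_fill_zeros_2d(batch))
# [[1. 1. 1. 4. 4.]
#  [2. 3. 3. 3. 5.]]
```

As with the 1-D version, a zero in the first column of a row is left as zero.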

Thank you very much for the research. This code is for readying data from atomic transitions and it has a very weird format. I have to run more tests as you did but I am sure it will improve the efficiency. — Delosari, Jun 17 '17 at 21:05
