
I'm accustomed to writing vectorized statements and list comprehensions in Python, but I've got a problem that combines a "running" computation, which depends on the previous value in the loop, with an if statement. Schematically it looks like this:

import numpy as np

def my_loop(x, a=0.5, b=0.9):
  out = np.copy(x)
  prev_val = 0
  for i in np.arange(x.shape[0]):

      if x[i] < prev_val:
          new_val = (1-a)*x[i] + a*prev_val
      else:
          new_val = (1-b)*x[i] + b*prev_val

      out[i] = new_val

      prev_val = new_val

  return out

I haven't been able to figure out how one could vectorize this (e.g. via using some kind of accumulator), so I'll ask: Is there a way to make this more Pythonic/faster?

I've seen previous posts about vectorizing when there's an if statement (usually solved via np.where()), but not one where a "running" value depends on its own previous state, so I haven't found a duplicate question yet. (This one isn't about vectorization in the usual sense; the difficulty is the dependence on the previous value, not the indexing.)
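For reference, when there is no running dependence the branch alone does vectorize with np.where() — a minimal sketch, where the `prev` array of fixed "previous" values is hypothetical, just to illustrate the branching:

```python
import numpy as np

# With no recurrence, the if/else collapses to a single np.where():
x = np.array([3.0, -1.0, 2.0])
prev = np.array([0.0, 1.0, 5.0])  # hypothetical fixed "previous" values
a, b = 0.5, 0.9

coef = np.where(x < prev, a, b)   # pick a where the condition holds, b elsewhere
out = (1 - coef) * x + coef * prev
```

The recurrence breaks this because `prev` is not known up front; each element needs the one before it.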

So far, I have tried np.vectorize and numba's @jit, and they do run somewhat faster, but neither gives me the speed I'm hoping for. Is there something I'm missing? (Maybe something with map()?) Thanks.

(Yes, in the case of a=b this becomes easy!)
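To spell out the a == b case: the recurrence becomes a plain exponential moving average, which admits a vectorized closed form via a rescaling-plus-cumsum trick. This is a sketch only — the powers of `a` underflow for long arrays, so it is practical mainly for short inputs (a linear filter such as scipy.signal.lfilter is the robust route):

```python
import numpy as np

def ewma_loop(x, a=0.5):
    # the original loop with a == b
    out = np.empty_like(x)
    prev = 0.0
    for i in range(len(x)):
        prev = (1 - a) * x[i] + a * prev
        out[i] = prev
    return out

def ewma_vectorized(x, a=0.5):
    # out[i] = sum_k a**(i-k) * (1-a) * x[k], computed as a cumsum
    # after dividing by a**i; caution: a**i underflows for large i
    powers = a ** np.arange(len(x))
    return powers * np.cumsum((1 - a) * x / powers)
```

With both branches identical the dependence on the previous value is linear, which is what makes this closed form possible; the a != b case makes the coefficient data-dependent, so no such rescaling applies.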

sh37211
    I doubt you will get much performance gain with ``np.vectorize``. Quoting from the docs - "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.". – Deepak Saini Nov 27 '18 at 04:21

2 Answers


JITing in nopython mode is faster. Quoting from numba docs:

Numba has two compilation modes: nopython mode and object mode. The former produces much faster code, but has limitations that can force Numba to fall back to the latter. To prevent Numba from falling back, and instead raise an error, pass nopython=True.

import numba as nb
import numpy as np

@nb.njit(cache=True)
def my_loop5(x, a=0.5, b=0.9):
  out = np.zeros(x.shape[0], dtype=x.dtype)

  for i in range(x.shape[0]):
      # at i == 0, out[i-1] is out[-1], still 0 from the zeros
      # initialization, which plays the role of prev_val = 0
      if x[i] < out[i-1]:
          out[i] = (1-a) * x[i] + a * out[i-1]
      else:
          out[i] = (1-b) * x[i] + b * out[i-1]
  return out

So that on:

x = np.random.uniform(low=-5.0, high=5.0, size=(1000000,))

The timings are:

my_loop4 : 0.235s

my_loop5 : 0.193s

HTH.

Deepak Saini
  • Interesting, hey thanks for trying this. When I compare autojit with njit, the latter is about the same but a bit slower. 41.1 µs per loop for autojit vs. 41.2 µs per loop for njit. – sh37211 Nov 27 '18 at 04:38
  • @sh37211 Try on larger inputs, as there is compile time included too, or time on a second run after the compiled code is cached. – Deepak Saini Nov 27 '18 at 04:41

I realized that by removing dummy variables, this code could be put into a form where numba's @autojit could work its magic and make it "fast":

import numpy as np
from numba import jit, autojit  # note: autojit was removed in later numba versions; use @jit/@njit

@autojit
def my_loop4(x, a=0.5, b=0.9):
  out = np.zeros(x.shape[0], dtype=x.dtype)
  for i in np.arange(x.shape[0]):
      if x[i] < out[i-1]:
          out[i] = (1-a)*x[i] + a*out[i-1]
      else:
          out[i] = (1-b)*x[i] + b*out[i-1]
  return out

Without the @autojit, this is still painfully slow. But with it on,...problem solved. So, removing the unnecessary variables and adding @autojit is what did the trick.

sh37211
    Don't use `np.arange(x.shape[0])`. This creates an array, but you want a simple iterator like range (Python 3) or xrange (Python 2). – max9111 Nov 27 '18 at 08:26
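To back up that comment: np.arange materializes a full index array up front, while range is a constant-size lazy sequence, which a quick size check makes visible (a sketch; exact byte counts are platform-dependent):

```python
import sys
import numpy as np

n = 1_000_000
idx_array = np.arange(n)   # allocates n integers immediately
idx_range = range(n)       # small lazy object, values produced on demand

print(idx_array.nbytes)          # millions of bytes
print(sys.getsizeof(idx_range))  # a few dozen bytes
```

Inside a plain Python loop the difference is mostly memory, but it is one less temporary array for the JIT to reason about.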