dec = 0.1
data = np.array([100,200,300,400,500])

I have a for loop like this:

y = np.zeros(len(data))
for i in range(len(data)):
    if i == 0:
        y[i] = (1.0 - dec) * data[i]
    else:
        y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])

Output y is:

array([ 90.   , 189.   , 288.9  , 388.89 , 488.889])

Now I want to do the above calculation without a loop. If I break the code up and do

data[0] = (1.0 - dec) * data[0]
data[1:] = (1.0 - dec) * data[1:] + (dec * data[0])

Output data is:

array([ 90, 189, 279, 369, 459])

When you compare the y and data outputs, the first two values are correct, because both are computed from data[0], which makes sense. But after that, the calculation should continue the way the loop does. How can we achieve that? Is there a function that can handle this? This is mainly to optimize my code so that it runs faster for thousands of data points.

The expected output is the same as the y output.

    **Duplicate** of [python - Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one? - Stack Overflow](https://stackoverflow.com/questions/4407984/is-it-possible-to-vectorize-recursive-calculation-of-a-numpy-array-where-each-el) -- unless there's some special way for this question that allows an explicit form – user202729 Sep 02 '21 at 13:57
  • In this particular case you may be able to do it in pure numpy with reasonable accuracy if it's provided that `dec ≤ 0.1` by limiting to exactly 15 iterations; however I doubt it will be faster than a numba solution. – user202729 Sep 02 '21 at 13:59
  • Does this answer your question? [Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?](https://stackoverflow.com/questions/4407984/is-it-possible-to-vectorize-recursive-calculation-of-a-numpy-array-where-each-el) – Adam.Er8 Sep 02 '21 at 14:00
  • There is a special case here in that `y[i] = (1 - dec) * sum(dec ** j * data[i - j] for j in range(i + 1))`. This feels like something you could do with clever matrix multiplication. – Kyle Parsons Sep 02 '21 at 14:09
  • Can someone recreate that on my problem? @Adam.Er8 – Akilesh Sep 02 '21 at 14:18
  • Tm would be my data and what does tau mean in my thing? – Akilesh Sep 02 '21 at 14:22

2 Answers


I know that you said not to use a Python for loop.

Note, though, that np.vectorize is not real vectorization (it will not move your code to C); it is only a convenience.

Since you said you have thousands of data points, you should try numba, which compiles the for loop to machine code.

As of now I cannot see a way to get the same correct output using only numpy ufuncs (np.add, np.dot, etc.), since ufuncs only truly vectorize (real SIMD vectorization) when the compiler can do it, so for now you can try numba:

import numba as nb
import numpy as np

dec = 0.1
data = np.array([100, 200, 300, 400, 500])

# Compile the recursive loop to machine code; dec is read once as a compile-time global
@nb.jit((nb.int64[:],))
def f(data):
    y = np.zeros(data.shape[0])
    for i in range(y.shape[0]):
        if i == 0:
            y[i] = (1.0 - dec) * data[i]
        else:
            y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])
    return y

print(f(data))

It is also possible to parallelize, but I am not sure how to do it correctly in this case. In fact, I doubt that vectorization or parallelism is possible for this problem without more complicated code, because each element depends on the previous one.
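As an aside, the recurrence `y[i] = (1 - dec) * data[i] + dec * y[i - 1]` is a first-order IIR filter, so if scipy is available, `scipy.signal.lfilter` can evaluate it in compiled code without an explicit Python loop. A minimal sketch (this is an alternative technique, not part of the numba answer above):

```python
import numpy as np
from scipy.signal import lfilter

dec = 0.1
data = np.array([100, 200, 300, 400, 500])

# lfilter(b, a, x) computes a[0]*y[n] = b[0]*x[n] - a[1]*y[n-1] - ...
# so b = [1 - dec], a = [1, -dec] gives y[n] = (1 - dec)*x[n] + dec*y[n-1]
y = lfilter([1.0 - dec], [1.0, -dec], data)
print(y)  # [ 90.     189.    288.9   388.89  488.889]
```

This runs in linear time and matches the loop output to floating-point precision.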

eroot163pi
  • What we call 'vectorization' in numpy (including ufuncs) is primarily the use of compiled code - doing the loops in C rather than Python. It does not, as a rule, use hardware-specific 'vectorization' (unless the compiler can do that). – hpaulj Sep 02 '21 at 14:39

We can do this with `scipy.linalg.toeplitz` to build a matrix of shifts of the data, then multiply it by powers of `dec` and sum along columns:

import numpy as np
from scipy.linalg import toeplitz

dec = 0.1
data = np.array([100,200,300,400,500])

decs = np.power(dec, np.arange(len(data)))

r = np.zeros_like(data)
r[0] = data[0]
toep = toeplitz(r, data)

output = (1 - dec) * np.sum(toep * decs.reshape(-1, 1), axis=0)

First decs is a vector of powers of dec:

print(decs) 
#[1.e+00 1.e-01 1.e-02 1.e-03 1.e-04]

Next, we use toeplitz to make a matrix of shifts of data:

print(toep)
#[[100 200 300 400 500]
# [  0 100 200 300 400]
# [  0   0 100 200 300]
# [  0   0   0 100 200]
# [  0   0   0   0 100]]

Finally we reshape decs into a column, multiply it by toep and sum along columns. This result needs to be scaled by 1 - dec.

This all works because we can rewrite our definition of y[i] as a sum instead of a recursion:

y[i] = (1.0 - dec) * data[i] + (dec * y[i - 1])
y[i] = (1.0 - dec) * data[i] + (dec * ((1.0 - dec) * data[i - 1] + (dec * y[i - 2])))
...
y[i] = (1.0 - dec) * (data[i] + dec * data[i - 1] + dec ** 2 * data[i - 2] + ... + dec ** i * data[0])
y[i] = (1.0 - dec) * sum(dec ** j * data[i - j] for j in range(i + 1))

This can be proven by induction.

From there it follows from rewriting those sums as columns of a matrix and translating that matrix to a calculation in numpy/scipy.
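Since the sum above is just a discrete convolution of `data` with the weights `(1 - dec) * dec ** j`, the same result can also be sketched with `np.convolve` instead of building the Toeplitz matrix explicitly (an alternative formulation, same O(n²) cost):

```python
import numpy as np

dec = 0.1
data = np.array([100, 200, 300, 400, 500])

# weights[j] = (1 - dec) * dec ** j is the coefficient of data[i - j] in the sum
weights = (1.0 - dec) * np.power(dec, np.arange(len(data)))

# Full convolution gives sum_j weights[j] * data[i - j]; keep the first len(data) terms
y = np.convolve(data, weights)[:len(data)]
print(y)  # [ 90.     189.    288.9   388.89  488.889]
```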

Kyle Parsons
  • I found another simple way to do this, but I'm not getting the exact output: `data = (1.0 - decay) * data; data[1:] = data[1:] + decay * data[0:-1]`. That gives `array([ 90., 189., 288., 387., 486.])`, which is similar to the y output, but some floating values are missing and I am unable to get the correct output. Why? – Akilesh Sep 03 '21 at 10:18
  • It looks like this solution is even slower than the plain loop, because it takes time quadratic in the length of the array. – user202729 Sep 03 '21 at 10:47