Accessing single NumPy array elements or iterating over a NumPy array element by element is slow (like, really slow). If you ever want to do a manual iteration over a NumPy array: just don't do it!
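If you want to convince yourself of the overhead first, a quick (and unscientific) sanity check along these lines illustrates it - the array size and repeat count here are arbitrary placeholders:

import timeit
import numpy as np

arr = np.random.random(100000)
lst = arr.tolist()

# Every arr[i] access has to box the element into a new Python float,
# while the list access just returns an already existing Python object.
print(timeit.timeit(lambda: [arr[i] for i in range(len(arr))], number=10))
print(timeit.timeit(lambda: [lst[i] for i in range(len(lst))], number=10))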
But you have some options. The easiest is to convert the array to a Python list and iterate over the list (sounds silly, but stay with me - I'll present some benchmarks at the end of the answer 1):
X = X.tolist()
Y = Y.tolist()
for n in range(1, len(X)):
    Y[n] = X[n] + X[n - 1] + Y[n-1]
If you also use direct iteration over the lists, it could be even faster:
X = X.tolist()
Y = Y.tolist()
for idx, (Y_n_m1, X_n, X_n_m1) in enumerate(zip(Y, X[1:], X), 1):
    Y[idx] = X_n + X_n_m1 + Y_n_m1
Then there are more sophisticated options that require additional packages, most notably Cython and Numba. These are designed to work on the array elements directly and to avoid Python overhead wherever possible. For example, with Numba you can simply use your approach inside a jitted (just-in-time compiled) function:
import numba as nb
@nb.njit
def func(X, Y):
    for n in range(1, len(X)):
        Y[n] = X[n] + X[n - 1] + Y[n-1]
Here X and Y can be NumPy arrays, but Numba will work on the buffers directly, out-speeding the other approaches (possibly by orders of magnitude).
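Calling the jitted function looks just like calling a plain Python function; for example (the sizes here are only illustrative, and the very first call includes the compilation time):

import numpy as np

X = np.random.random(1000)
Y = np.zeros(1000)
Y[0] = np.random.random()

func(X, Y)  # first call triggers the JIT compilation
func(X, Y)  # subsequent calls run the compiled machine code

Note that func modifies Y in place, so no return value is needed here.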
Numba is a "heavier" dependency than Cython, but it can be both faster and easier to use. Without conda, however, it can be hard to install Numba... YMMV
However, here is also a Cython version of the code (compiled using IPython magic; it's a bit different if you're not using IPython):
In [1]: %load_ext cython
In [2]: %%cython
...:
...: cimport cython
...:
...: @cython.boundscheck(False)
...: @cython.wraparound(False)
...: cpdef cython_indexing(double[:] X, double[:] Y):
...:     cdef Py_ssize_t n
...:     for n in range(1, len(X)):
...:         Y[n] = X[n] + X[n - 1] + Y[n-1]
...:     return Y
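If you're not using IPython, one option (just a sketch; the file name recurrence.pyx is made up) is to put the cython_indexing function from above into its own .pyx file and build it with a small setup.py:

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("recurrence.pyx"))

After running python setup.py build_ext --inplace you can import cython_indexing from the recurrence module like any other Python function.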
Just to give an example regarding the timings (based on the timing framework from my answer to another question):
import numpy as np
import numba as nb
import scipy.signal
def numpy_indexing(X, Y):
    for n in range(1, len(X)):
        Y[n] = X[n] + X[n - 1] + Y[n-1]
    return Y
def list_indexing(X, Y):
    X = X.tolist()
    Y = Y.tolist()
    for n in range(1, len(X)):
        Y[n] = X[n] + X[n - 1] + Y[n-1]
    return Y
def list_direct(X, Y):
    X = X.tolist()
    Y = Y.tolist()
    for idx, (Y_n_m1, X_n, X_n_m1) in enumerate(zip(Y, X[1:], X), 1):
        Y[idx] = X_n + X_n_m1 + Y_n_m1
    return Y
@nb.njit
def numba_indexing(X, Y):
    for n in range(1, len(X)):
        Y[n] = X[n] + X[n - 1] + Y[n-1]
    return Y
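# The recurrence Y[n] = X[n] + X[n-1] + Y[n-1] means Y is just the cumulative
# sum of the pairwise sums X[n] + X[n-1], starting from the initial value Y[0]: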
def numpy_cumsum(X, Y):
    Y[1:] = X[1:] + X[:-1]
    np.cumsum(Y, out=Y)
    return Y
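# The same recurrence can also be expressed as an IIR filter: with b = [1, 1]
# and a = [1, -1], lfilter computes y[n] = x[n] + x[n-1] + y[n-1] (with zero
# initial state), so adding the constant Y[0] - X[0] corrects the start value.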
def scipy_lfilter(X, Y):
    a = [1, -1]
    b = [1, 1]
    return Y[0] - X[0] + scipy.signal.lfilter(b, a, X)
# Make sure the approaches give the same result
X = np.random.random(10000)
Y = np.zeros(10000)
Y[0] = np.random.random()
np.testing.assert_array_equal(numba_indexing(X, Y), numpy_indexing(X, Y))
np.testing.assert_array_equal(numba_indexing(X, Y), numpy_cumsum(X, Y))
np.testing.assert_almost_equal(numba_indexing(X, Y), scipy_lfilter(X, Y))
np.testing.assert_array_equal(numba_indexing(X, Y), cython_indexing(X, Y))
# Timing setup
timings = {numpy_indexing: [],
           list_indexing: [],
           list_direct: [],
           numba_indexing: [],
           numpy_cumsum: [],
           scipy_lfilter: [],
           cython_indexing: []}
sizes = [2**i for i in range(1, 20, 2)]
# Timing
for size in sizes:
    X = np.random.random(size=size)
    Y = np.zeros(size)
    Y[0] = np.random.random()
    for func in timings:
        res = %timeit -o func(X, Y)
        timings[func].append(res)
# Plotting absolute times
%matplotlib notebook
import matplotlib.pyplot as plt
fig = plt.figure(1)
ax = plt.subplot(111)
for func in timings:
    ax.plot(sizes,
            [time.best for time in timings[func]],
            label=str(func.__name__))
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()
# Plotting relative times
fig = plt.figure(1)
ax = plt.subplot(111)
baseline = numba_indexing # choose one function as baseline
for func in timings:
    ax.plot(sizes,
            [time.best / ref.best for time, ref in zip(timings[func], timings[baseline])],
            label=str(func.__name__))
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time relative to "{}"'.format(baseline.__name__))
ax.grid(which='both')
ax.legend()
plt.tight_layout()
With the following results:
Absolute runtimes

Relative runtimes (compared to the numba function)

So, just by converting the arrays to lists you will be roughly 3 times faster! By iterating directly over the lists you get another (smaller) speedup of only 20% in this benchmark, but now we're almost 4 times faster than the original solution. With Numba you can speed it up by a factor of more than 100 compared to the list operations! And Cython is only a bit slower than Numba (~40-50%), probably because I haven't squeezed out every possible optimization you could do with Cython (usually it's not more than 10-20% slower). However, for large arrays the difference gets smaller.
1 I went into more detail in another answer. That Q+A was about converting to a set, but because set uses (hidden) "manual iteration" it also applies here.
I included the timings for the NumPy cumsum and SciPy lfilter approaches. These were roughly 20 times slower for small arrays and 4 times slower for large arrays compared to the Numba function. However, if I interpret the question correctly, you were looking for general approaches, not only ones that apply to this specific example. Not every self-referencing loop can be implemented with the cum* functions from NumPy or with SciPy's filters. But even then it seems like they can't compete with Cython and/or Numba.