
In my code I need to calculate, many times, the values of a vector whose entries are the means of different patches of another array. Here is an example of my code showing how I do it, but I found it runs too slowly...

import numpy as np
vector_a = np.zeros(10)
array_a = np.random.random((100,100))
for i in range(len(vector_a)):
    vector_a[i] = np.mean(array_a[:, i+20:i+40])

Is there any way to make it more efficient? Any comments or suggestions are very welcome! Many thanks!

Edit: yes, the 20 and 40 are fixed.

  • Can we assume `20` and `40` are fixed inputs? – jpp Oct 31 '18 at 12:00
  • I think the question is fine here. Vectorizing some loop-based logic is a super common SO question. – jdehesa Oct 31 '18 at 12:00
  • Possible duplicate of [vectorize numpy mean across the slices of an array](https://stackoverflow.com/questions/36409596/vectorize-numpy-mean-across-the-slices-of-an-array) – DJK Oct 31 '18 at 12:04
  • Your problem seems very similar to this question: https://stackoverflow.com/q/13728392/4800086 – Swier Oct 31 '18 at 12:13
  • Note I have added a much faster solution to my answer. – jdehesa Oct 31 '18 at 13:24

3 Answers


EDIT:

Actually you can do this much faster. The `rolling_means` function from my original answer below can be improved by operating on column sums, like this:

def rolling_means_faster1(array_a, n, first, size):
    # Sum each relevant column once
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Build overlapping windows over the column sums with stride tricks
    strides_b = (sum_a.strides[0], sum_a.strides[0])
    array_b = np.lib.stride_tricks.as_strided(sum_a, (n, size), strides_b)
    # Average
    v = np.sum(array_b, axis=1)
    v /= (len(array_a) * size)
    return v
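
(Side note: on NumPy 1.20+ the same overlapping windows can be built with `np.lib.stride_tricks.sliding_window_view`, avoiding the manual stride arithmetic. A minimal sketch, with an illustrative name:)

def rolling_means_faster1_swv(array_a, n, first, size):
    # Sum each relevant column once, as above
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Overlapping windows of length `size`; shape (n, size)
    windows = np.lib.stride_tricks.sliding_window_view(sum_a, size)
    return windows.sum(axis=1) / (len(array_a) * size)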

Another way is to work with accumulated sums, adding and removing as necessary for each output element.

def rolling_means_faster2(array_a, n, first, size):
    # Sum each relevant column once
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Add a zero at the beginning so the next operation works fine
    sum_a = np.insert(sum_a, 0, 0)
    # Sum the initial `size` elements, then add and remove partial sums as necessary
    v = np.sum(sum_a[:size]) - np.cumsum(sum_a[:n]) + np.cumsum(sum_a[-n:])
    # Average
    v /= (size * len(array_a))
    return v
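
(As a quick sanity check, the two variants can be compared on the benchmark data defined just below; both should agree up to floating-point noise:)

r1 = rolling_means_faster1(array_a, n, first, size)
r2 = rolling_means_faster2(array_a, n, first, size)
print(np.allclose(r1, r2))
# True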

Benchmarking against the previous solutions:

import numpy as np

np.random.seed(100)
array_a = np.random.random((1000, 1000))
n = 100
first = 100
size = 200

%timeit rolling_means_orig(array_a, n, first, size)
# 12.7 ms ± 55.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rolling_means(array_a, n, first, size)
# 5.49 ms ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rolling_means_faster1(array_a, n, first, size)
# 166 µs ± 874 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit rolling_means_faster2(array_a, n, first, size)
# 182 µs ± 2.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So these last two are very close in performance; which one is faster may depend on the relative sizes of the inputs.


Original answer: this is a possible vectorized solution:

import numpy as np

# Data
np.random.seed(100)
array_a = np.random.random((100, 100))

# Take all the relevant columns
slice_a = array_a[:, 20:40 + 10]
# Make a "rolling window" with stride tricks
strides_b = (slice_a.strides[1], slice_a.strides[0], slice_a.strides[1])
array_b = np.lib.stride_tricks.as_strided(slice_a, (10, 100, 20), strides_b)
# Take mean
result = np.mean(array_b, axis=(1, 2))

# Original method for testing correctness
vector_a = np.zeros(10)
idv1 = np.arange(10) + 20
idv2 = np.arange(10) + 40
for i in range(len(vector_a)):
    vector_a[i] = np.mean(array_a[:,idv1[i]:idv2[i]])
print(np.allclose(vector_a, result))
# True
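
(Side note: on NumPy 1.20+ the manual `as_strided` call can be replaced with the safer `np.lib.stride_tricks.sliding_window_view`, which cannot create out-of-bounds views. An equivalent sketch:)

# All windows of 20 consecutive columns; shape (100, 81, 20)
windows = np.lib.stride_tricks.sliding_window_view(array_a, 20, axis=1)
# Keep the windows starting at columns 20..29 and average each patch
result_swv = windows[:, 20:30].mean(axis=(0, 2))
print(np.allclose(result_swv, result))
# True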

Here is a quick benchmark in IPython (sizes increased to make the differences more apparent):

import numpy as np

def rolling_means(array_a, n, first, size):
    slice_a = array_a[:, first:(first + size + n)]
    strides_b = (slice_a.strides[1], slice_a.strides[0], slice_a.strides[1])
    array_b = np.lib.stride_tricks.as_strided(slice_a, (n, len(array_a), size), strides_b)
    return np.mean(array_b, axis=(1, 2))

def rolling_means_orig(array_a, n, first, size):
    vector_a = np.zeros(n)
    idv1 = np.arange(n) + first
    idv2 = np.arange(n) + (first + size)
    for i in range(len(vector_a)):
        vector_a[i] = np.mean(array_a[:,idv1[i]:idv2[i]])
    return vector_a

np.random.seed(100)
array_a = np.random.random((1000, 1000))
n = 100
first = 100
size = 200

%timeit rolling_means(array_a, n, first, size)
# 5.48 ms ± 26.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rolling_means_orig(array_a, n, first, size)
# 32.8 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
jdehesa
  • @LinchengLi Standard deviation is necessarily more work, since it includes squaring and taking square root. You could do the computation "by hand", using one of the "faster" methods for the mean (to accelerate that part) and then computing the rest by yourself, but this seems slower than just taking my first answer and replacing `np.mean` with `np.std` (I don't know how NumPy implements std but it may be something smarter than computing the mean first and then applying the formula). It's just a lot of computation, I'm not sure if there's much that can be done to make it faster. – jdehesa Oct 31 '18 at 16:43
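
(A hypothetical sketch of the "by hand" rolling standard deviation the comment describes, combining the column-sum trick with the identity std^2 = E[x^2] - E[x]^2; the name is illustrative and this identity is numerically less stable than `np.std`:)

def rolling_stds_fast(array_a, n, first, size):
    # Column sums of x and x^2, each computed once
    cols = array_a[:, first:(first + size + n - 1)]
    sum_a = np.sum(cols, axis=0)
    sumsq_a = np.sum(cols ** 2, axis=0)
    count = len(array_a) * size
    win = np.lib.stride_tricks.sliding_window_view  # NumPy >= 1.20
    mean = win(sum_a, size).sum(axis=1) / count
    mean_sq = win(sumsq_a, size).sum(axis=1) / count
    # Population std (ddof=0), matching np.std's default
    return np.sqrt(mean_sq - mean ** 2)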

This solution works on the assumption that you are trying to compute a rolling average over windows of columns, possibly keeping only a subset of the results. As an example, ignoring rows: given [0, 1, 2, 3, 4] and a window of 2, the averages are [0.5, 1.5, 2.5, 3.5], and you might only want the second and third of these.

Your current solution is inefficient, as it recomputes the mean of each column for every output element in `vector_a`. Given that (a / n) + (b / n) == (a + b) / n, we can get away with computing the mean of each column only once, and then combine the column means as needed to produce the final output.
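
(For context: this answer reuses the `idv1`/`idv2` index arrays discussed above; with the question's fixed offsets they would be, for example:)

import numpy as np
array_a = np.random.random((100, 100))
idv1 = np.arange(10) + 20  # window start columns
idv2 = np.arange(10) + 40  # window end columns (exclusive)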

window_first_start = idv1.min() # or idv1[0]
window_last_end = idv2.max() # or idv2[-1]
window_size = idv2[0] - idv1[0]
assert ((idv2 - idv1) == window_size).all(), "sanity check, not needed if assumption holds true"

# a view of the columns we are interested in, no copying is done here
view = array_a[:,window_first_start:window_last_end]

# calculate the means for each column
col_means = view.mean(axis=0)

# cumsum is used to find the rolling sum of means and so the rolling average
# We use an out variable to make sure we have a 0 in the first element of cum_sum.
# This makes life a little easier in the next step.
cum_sum = np.empty(len(col_means) + 1, dtype=col_means.dtype)
cum_sum[0] = 0
np.cumsum(col_means, out=cum_sum[1:])

result = (cum_sum[window_size:] - cum_sum[:-window_size]) / window_size 

Having tested this against your own code, the above is significantly faster (and the gap grows with the size of the input array), and slightly faster than the solution provided by jdehesa. With a 1000x1000 input array, it is two orders of magnitude faster than your solution and one order of magnitude faster than jdehesa's.
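
(To reproduce such a timing, the snippet above can be wrapped in a function and timed with IPython's `%timeit`; a minimal sketch, with an illustrative name:)

def rolling_means_dunes(array_a, idv1, idv2):
    window_size = idv2[0] - idv1[0]
    view = array_a[:, idv1.min():idv2.max()]
    col_means = view.mean(axis=0)
    cum_sum = np.empty(len(col_means) + 1, dtype=col_means.dtype)
    cum_sum[0] = 0
    np.cumsum(col_means, out=cum_sum[1:])
    return (cum_sum[window_size:] - cum_sum[:-window_size]) / window_size

# In IPython:
# %timeit rolling_means_dunes(array_a, idv1, idv2)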

Dunes

Try this:

import numpy as np

array_a = np.random.random((100, 100))
vector_a = [np.mean(array_a[:, i+20:i+40]) for i in range(10)]