List comprehension with nested for loop, conditional, and accumulator

Question

I am trying to convert this piece of code into a list comprehension:

a = np.random.rand(10) #input vector
n = len(a) # element count of input vector
b = np.random.rand(3) #coefficient vector
nb = len(b) #element count of coefficients
d = nb #decimation factor (could be any integer < len(a))
 
c = []
for i in range(0, n, d):
    psum = 0
    for j in range(nb):
        if i + j < n:
            psum += a[i + j]*b[j]
    c.append(psum)

I've tried following suggestions from:

For example:

from itertools import accumulate
c = [accumulate([a[i + j] * b[j] for j in range(nb) if i + j < n] ) for i in range(0, n, d)]

Later, when trying to get values from c (e.g. c[:index]):

TypeError: 'NoneType' object is not subscriptable

Or:

from functools import partial
def get_val(a, b, i, j, n):
    if i + j < n:
        return(a[i + j] * b[j])
    else:
        return(0)
c = [
         list(map(partial(get_val, i=i, j=j, n=n), a, b)) 
             for i in range(0, n, d) 
             for j in range(nb)
    ]

in get_val, return(a[i + j] * b[j])

IndexError: invalid index to scalar variable.

Or:

psum_pieces = [[a[i + j] * b[j] if i + j < n else 0 for j in range(nb)] for i in range(0, n, d)]
c = [sum(psum) for psum in psum_pieces]

As well as many other iterations of these approaches. Any guidance would be much appreciated.

Why are you mixing list comprehensions with numpy? Do you want a numpy solution? Can you add a numpy tag in that case? — Mad Physicist, Aug 05 '20 at 21:40
Please copy and paste your error messages properly, and format them as code. — Mad Physicist, Aug 05 '20 at 21:41
Why do you want to rewrite it as list comprehension? Is this a homework assignment? — Michael Butscher, Aug 05 '20 at 21:43
@MichaelButscher I don't necessarily want to write it as a list comprehension. This is not a homework assignment. I am porting FORTRAN software to Python and when running a profiler found that this piece of code was incredibly inefficient. I believe I can speed up runtime by predefining the output array since I can determine its size beforehand and then just assigning values directly to indexes in the output array (avoiding using append). I was interested in learning if this was possible to do in a list comprehension and timing to see if it will be faster than other methods. — N. Dryas, Aug 05 '20 at 21:50
A comprehension works on arbitrary iterabls, so generally can't preallocate — Mad Physicist, Aug 05 '20 at 22:03

score 1 · Answer 1 · answered Aug 05 '20 at 21:47

If I've understood correctly what you want is something like

res = [sum(a[i+j]*b[j] for j in range(nb) if i+j < n) for i in range(0,n,d)]

For each i, this will add to the resulting list the sum of the products a[i+j]*b[j] for j that varies from 0 to nb-1 when i+j < n

Mad Physicist · Accepted Answer · 2020-08-05T22:07:41.287

1

You really don't need to be using a list comprehension here. With numpy, you can create a fast pipelined solution that does not run any loops directly in the interpreter.

First convert a into a 2D array shaped (n // d, nb). The missing elements (i.e., where i + j >= n in the loop) can be zero since that will make the corresponding increment to psum zero:

# pre-compute i+j as a 2D array
indices = np.arange(nb) + np.arange(0, n, d)[:, None]
# we only want valid locations
mask = indices < n

t = np.zeros(indices.shape)
t[mask] = a[indices[mask]]

Now you can compute c directly as

(t * b).sum(axis=1)

I suspect that if you benchmark this solution against anything written in vanilla python not compiled with numba, it will be much faster.

edited Aug 05 '20 at 22:07

answered Aug 05 '20 at 22:01

Mad Physicist

107,652
25
181
264

@N.Dryas. Your bottleneck is likely the overhead of running this 3k times, not the function itself. See if you can vectorize that. In the meantime, don't forget to select an answer by clicking on the check mark next to it. – Mad Physicist Aug 05 '20 at 22:41
Thank you @MadPhysicist. This was much faster than using a list comprehension. Original runtime with the nested for loops function called 3072 times on input arrays (a) ranging in size 4600-5200 elements and coefficient array (b) length = 3 was 53.3 seconds. With the list comprehension, runtime was 48.6 seconds. This numpy solution clocked in at 30.1 seconds. Thank you for the additional advice to vectorize the function call. I will try to implement that now. – N. Dryas Aug 05 '20 at 22:50

List comprehension with nested for loop, conditional, and accumulator

2 Answers2