
I'm trying to add the values of a small 2D numpy array ("source") into a larger 2D numpy array ("frame"), starting at a specific position in the frame array ("pos_x", "pos_y"). Right now, I have two for-loops adding the source value to the frame value at each position:

for i in range(x):
    for j in range(y):
        frame[pos_x+i][pos_y+j] += source[i][j]

("x" and "y" being the source-arrays' shape)

However, the arrays are quite large (the frame array is 5000x8000 and the source array is 1000x5000), so this process takes quite a long time (ca. 15 seconds).

Is there any way to speed up this process, either through list comprehension, or mapping, or anything else?

I've tried a list comprehension with multiple statements and assignments, like this:

frame = [[frame[pos_x+i][pos_y+j] + source[i][j] for j in range(y)] for i in range(x)]

(adapted from the threads How can I do assignments in a list comprehension? and Multiple statements in list comprehensions in Python?)

but it takes just as long as the original for-loops.

Another idea was to skip zero values by only doing the addition if source[i][j] != 0. But when I tried that, it took over three times as long (potential sub-question: any idea why?).
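For reference, that attempt looked roughly like this (a sketch; my actual code may have differed slightly):

for i in range(x):
    for j in range(y):
        if source[i][j] != 0:
            frame[pos_x+i][pos_y+j] += source[i][j]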

2 Answers


You can take advantage of numpy vectorization here instead of looping for a huge speedup. It's just a matter of calculating the indices and checking if you're out of bounds.

height, width = source.shape   # height == x and width == y from the question
pos_x2 = pos_x + height        # pos_x offsets axis 0 (rows), as in the question's loop
pos_y2 = pos_y + width         # pos_y offsets axis 1 (columns)
# check for a possible index out of range
fheight, fwidth = frame.shape
if pos_x2 > fheight or pos_y2 > fwidth:
    print('source out of frame bounds')
else:
    # add `source` to the matching slice of `frame`
    frame[pos_x:pos_x2, pos_y:pos_y2] += source

While technically this has the same time complexity, numpy runs it in efficient compiled C code that can take advantage of things like hardware vectorization and doesn't carry the per-element overhead of the Python interpreter.
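If you want to see the difference for yourself, a minimal timing sketch (with smaller placeholder arrays and an arbitrary offset, not your real data) could look like this:

import time
import numpy as np

frame = np.zeros((500, 800))
source = np.random.rand(100, 500)
pos_x, pos_y = 50, 100

# plain Python loops
out = frame.copy()
start = time.perf_counter()
for i in range(source.shape[0]):
    for j in range(source.shape[1]):
        out[pos_x + i][pos_y + j] += source[i][j]
print('loops:', time.perf_counter() - start)

# vectorized slice addition
out2 = frame.copy()
start = time.perf_counter()
out2[pos_x:pos_x + source.shape[0], pos_y:pos_y + source.shape[1]] += source
print('slice:', time.perf_counter() - start)

print('same result:', np.allclose(out, out2))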

Aaron
  • Thank you so much! It works much, much better now! Is there any book or website you know of where I could learn more about optimizing code in ways like this (especially concerning memory allocation, extending with C, etc.)? – YohanneSaurus Sep 27 '18 at 10:11
  • @YohanneSaurus you'll never have to explicitly worry about memory allocation or anything along those lines, as numpy (and other libraries like it) takes care of it for you. Python often gets a bad reputation for being slow because it has a lot of overhead. You can write decently fast code, but sometimes it takes knowing what is actually going on under the hood to understand why some things are slower than others. I don't have a "silver bullet" tutorial or book; I generally don't worry about code that only runs once, but if there's a big loop (or nested loop) I try to find a faster library. – Aaron Sep 27 '18 at 14:50

Slices are mutable in numpy, so you can do things like:

import numpy as np

A = np.zeros((10, 10), int)    # the large "frame" array
B = np.zeros((5, 5), int) + 5  # the small "source" array, filled with 5s

A[2:7, 2:7] += B               # add B in place into a 5x5 slice of A

print(A)
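Applied to the names from the question, that's the same idea (a sketch, assuming source fits inside frame at the given offset):

frame[pos_x:pos_x + source.shape[0], pos_y:pos_y + source.shape[1]] += source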
Sam Mason