Rigid transformation - Python - speedup

Question

I have following question about faster way to compute rigid transformation (yes, I know I can simply use library but need to code this by myself).

I need to compute x' and y' for every x,y in given image. My main bottleneck is dot product for all coordinates (interpolation after this is not a problem). Currently I implemented three options:

list comprehension

result = np.array([[np.dot(matrix, np.array([x, y, 1])) for x in xs] for y in ys])

simple double-for loop

result = np.zeros((Y, X, 3))
for x in xs:
    for y in ys:
        result[y, x, :] = np.dot(matrix, np.array([x, y, 1]))

np.ndenumerate

result = np.zeros((Y, X, 3))
for (y, x), value in np.ndenumerate(image):
    result[y, x, :] = np.dot(matrix, np.array([x, y, 1]))

The fastest way in 512x512 images is list comprehension (about 1.5x faster than np.ndenumerate and 1.1x faster than double for loop).

Is there any way to do this faster?

Eric · Answer 1 · 2016-11-24T19:44:44.303

3

You can use np.indices and np.rollaxis to generate a 3D array, where coords[i, j] == [i, j]. Here the coordinates need switching

Then all you do is append the 1 you ask for, and use @

coords_ext = np.empty((Y, X, 3))
coords_ext[...,[1,0]] = np.rollaxis(np.indices((Y, X)), 0, start=3)
coords_ext[...,2] = 1

# convert to column vectors and back for matmul broadcasting
result = (matrix @ coords_ext[...,None])[...,0]

# or alternatively, work with row vectors and do it in the other order
result = coords_ext @ matrix.T

edited Nov 24 '16 at 19:44

answered Nov 24 '16 at 16:52

Eric

95,302
53
242
374

Had to make few changes to your posted code to match up the output values with the OP's double loop approach as needed for value verification under the runtime test. Hope they look okay. – Divakar Nov 24 '16 at 18:12
@Divakar: How do my fixes look? – Eric Nov 24 '16 at 19:44
Awesome, thanks! Updated the profiling tests. Some improvement there on the timings with these fixes! – Divakar Nov 24 '16 at 19:58

score 3 · Accepted Answer · edited May 23 '17 at 11:52

Borrowing from this post, the idea of creating 1D arrays instead of the 2D or 3D meshes and using them with on-the-fly broadcasted operations for memory efficiency and thus achieve performance benefits, here's an approach -

out = ys[:,None,None]*matrix[:,1] + xs[:,None]*matrix[:,0] + matrix[:,2]

If you are covering all indices with xs and ys for the 512x512 sized images, we would have them created with np.arange, like so -

ys = np.arange(512)
xs = np.arange(512)

Runtime test

Function definitions -

def original_listcomp_app(matrix, X, Y): # Original soln with list-compr. 
    ys = np.arange(Y)
    xs = np.arange(X)
    out = np.array([[np.dot(matrix, np.array([x, y, 1])) for x in xs] \
                                                           for y in ys])
    return out    

def indices_app(matrix, X, Y):        # @Eric's soln  
    coords_ext = np.empty((Y, X, 3))
    coords_ext[...,[1,0]] = np.rollaxis(np.indices((Y, X)), 0, start=3)
    coords_ext[...,2] = 1    
    result = np.matmul(coords_ext,matrix.T)
    return result

def broadcasting_app(matrix, X, Y):  # Broadcasting based
    ys = np.arange(Y)
    xs = np.arange(X)
    out = ys[:,None,None]*matrix[:,1] + xs[:,None]*matrix[:,0] + matrix[:,2]
    return out

Timings and verification -

In [242]: # Inputs
     ...: matrix = np.random.rand(3,3)
     ...: X,Y = 512, 512
     ...: 

In [243]: out0 = original_listcomp_app(matrix, X, Y)
     ...: out1 = indices_app(matrix, X, Y)
     ...: out2 = broadcasting_app(matrix, X, Y)
     ...: 

In [244]: np.allclose(out0, out1)
Out[244]: True

In [245]: np.allclose(out0, out2)
Out[245]: True

In [253]: %timeit original_listcomp_app(matrix, X, Y)
1 loops, best of 3: 1.32 s per loop

In [254]: %timeit indices_app(matrix, X, Y)
100 loops, best of 3: 16.1 ms per loop

In [255]: %timeit broadcasting_app(matrix, X, Y)
100 loops, best of 3: 9.64 ms per loop

Good spot that this could be done with broadcasting. I'd often write `xs[None,:,None]` here for consistency with `ys[:,None,None]`, even if it is unecessary. Glad my solution proved to be at least faster than the list comp - thanks for profiling! — Eric, Nov 24 '16 at 19:46
Thank you, both solutions work sufficiently fast - it's even faster than scikit-image built-in transformation (but still a bit slower than scipy). — Nefarin, Nov 25 '16 at 10:31
@Nefarin Wow, didn't know Scipy has something to do this task. Nice! — Divakar, Nov 25 '16 at 10:33

Rigid transformation - Python - speedup

2 Answers2