Speed up convolution computation code

Question

I have the following code in MATLAB

[Mx,Nx] = size(x);
[My,Ny] = size(y);

padded_x = zeros(Mx+2*(My-1),Nx+2*(Ny-1));
padded_x(My:Mx+My-1,Ny:Ny+Nx-1) = x; 
y = rot90(y,2);

z = zeros(Mx+My-1,Nx+Ny-1);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
        z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));
    end
end

that is part of a 2D convolution implementation. Is there any way that this can become faster? e.g. vectorizing these 2 for loops? I know that there are faster algorithms that compute 2D convolution, but I want to speed this one up. So, I am not looking for an algorithm with different complexity, just something with lower complexity constant. I would also like to keep this in MATLAB and not to use MEX - files etc. Finally, the supplied conv2 function is also not the solution I am looking for.

no, sorry I forgot to mention that! Actually conv2 uses FFT, so I thought it was obvious that it belongs to the "different algorithm" category — Controller, Apr 24 '15 at 11:02
@Controller I was not aware that `conv2` used fft. It is not very easy to find information about that either. However, does it matter if it does? — patrik, Apr 24 '15 at 11:21
@patrik Well, yes it does in this case, because it leads to a different complexity. The concept is to see how these two algorithms (mine and conv2) scale with the size of inputs. I am just trying to find out if there can be a faster implementation of my algorithm in order to better approximate the difference between the two algorithms. — Controller, Apr 24 '15 at 11:32

score 4 · Accepted Answer · edited May 23 '17 at 12:30

4

For each iteration, you can replace the elementwise multiplication and double summations with a fast matrix multiplication.

That is -

z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));

would be replaced by -

M = padded_x(i:i+My-1,j:j+Ny-1);
z(i,j) = M(:).'*y(:);

Thus, the loopy portion of the original code could be replaced by -

z = zeros(Mx+My-1,Nx+Ny-1);
yr = y(:);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
         M = padded_x(i:i+My-1,j:j+Ny-1);
        z(i,j) = M(:).'*yr;
    end
end

Quick tests: With x and y as 200 x 200 each, the runtimes were -

------------------------- With Original Approach
Elapsed time is 10.357977 seconds.
------------------------- With Proposed Approach
Elapsed time is 5.209822 seconds.

edited May 23 '17 at 12:30

Community

1
1

answered Apr 24 '15 at 11:57

Divakar

218,885
19
262
358

Could do z(i,j) =yr * M(:); in the inner loop instead if you do yr=y(:).' upfront to squeeze just a bit more maybe – hiandbaii Apr 24 '15 at 14:11
@hiandbaii Ah good observation there! OP must consider that hack too! – Divakar Apr 24 '15 at 14:12

Speed up convolution computation code

1 Answers1