1

I have the following code in MATLAB

[Mx,Nx] = size(x);
[My,Ny] = size(y);

padded_x = zeros(Mx+2*(My-1),Nx+2*(Ny-1));
padded_x(My:Mx+My-1,Ny:Ny+Nx-1) = x; 
y = rot90(y,2);

z = zeros(Mx+My-1,Nx+Ny-1);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
        z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));
    end
end

that is part of a 2D convolution implementation. Is there any way that this can become faster? e.g. vectorizing these 2 for loops? I know that there are faster algorithms that compute 2D convolution, but I want to speed this one up. So, I am not looking for an algorithm with different complexity, just something with lower complexity constant. I would also like to keep this in MATLAB and not to use MEX - files etc. Finally, the supplied conv2 function is also not the solution I am looking for.

Divakar
  • 218,885
  • 19
  • 262
  • 358
Controller
  • 489
  • 7
  • 27
  • 4
    and you don't want to use the supplied `conv2` function? – gauteh Apr 24 '15 at 10:59
  • no, sorry I forgot to mention that! Actually conv2 uses FFT, so I thought it was obvious that it belongs to the "different algorithm" category – Controller Apr 24 '15 at 11:02
  • @Controller I was not aware that `conv2` used fft. It is not very easy to find information about that either. However, does it matter if it does? – patrik Apr 24 '15 at 11:21
  • @patrik Well, yes it does in this case, because it leads to a different complexity. The concept is to see how these two algorithms (mine and conv2) scale with the size of inputs. I am just trying to find out if there can be a faster implementation of my algorithm in order to better approximate the difference between the two algorithms. – Controller Apr 24 '15 at 11:32
  • What are the typical datasizes for `x` and `y`? – Divakar Apr 24 '15 at 11:47
  • @Divakar they are up to 1000x1000 – Controller Apr 24 '15 at 11:50
  • Both x and y are `1000 x 1000`? – Divakar Apr 24 '15 at 11:51
  • @Divakar yes, both of them – Controller Apr 24 '15 at 11:52

1 Answers1

4

For each iteration, you can replace the elementwise multiplication and double summations with a fast matrix multiplication.

That is -

z(i,j) = sum(sum(padded_x(i:i+My-1,j:j+Ny-1).*y));

would be replaced by -

M = padded_x(i:i+My-1,j:j+Ny-1);
z(i,j) = M(:).'*y(:);

Thus, the loopy portion of the original code could be replaced by -

z = zeros(Mx+My-1,Nx+Ny-1);
yr = y(:);
for i=1:Mx+My-1
    for j=1:Nx+Ny-1
         M = padded_x(i:i+My-1,j:j+Ny-1);
        z(i,j) = M(:).'*yr;
    end
end

Quick tests: With x and y as 200 x 200 each, the runtimes were -

------------------------- With Original Approach
Elapsed time is 10.357977 seconds.
------------------------- With Proposed Approach
Elapsed time is 5.209822 seconds.
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358