Image Processing: Algorithm taking too long in MATLAB

Question

I am working in MATLAB to process two 512x512 images, the domain image and the range image. What I am trying to accomplish is the following:

Divide both domain and range images into 8x8 pixel blocks
For each 8x8 block in the domain image, I have to apply a linear transformations to it and compare each of the 4096 transformed blocks with each of the 4096 range blocks.
Compute error in each case between the transformed block and the range image block and find the minimum error.
Finally I'll have for each 8x8 range block, the id of the 8x8 domain block for which the error was minimum (error between the range block and the transformed domain block)

To achieve this, I have written the following code:

RangeImagecolor = imread('input.png'); %input is 512x512
DomainImagecolor = imread('input.png'); %Range and Domain images are identical

RangeImagetemp = rgb2gray(RangeImagecolor);
DomainImagetemp = rgb2gray(DomainImagecolor);

RangeImage = im2double(RangeImagetemp);

DomainImage = im2double(DomainImagetemp);
%For the (k,l)th 8x8 range image block
for k = 1:64
    for l = 1:64
        minerror = 9999;
        min_i = 0;
        min_j = 0;

        for i = 1:64
            for j = 1:64

                 %here I compute for the (i,j)th domain block, the transformed domain block stored in D_trans

                 error = 0;
                 D_trans = zeros(8,8);
                 R = zeros(8,8); %Contains the pixel values of the (k,l)th range block
                 for m = 1:8
                     for n = 1:8
                         R(m,n) = RangeImage(8*k-8+m,8*l-8+n); 
                         %ApplyTransformation can depend on (k,l) so I can't compute the transformation outside the k,l loop.
                         [m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
                         D_trans(m,n) = DomainImage(m_dash,n_dash);
                         error = error + (R(m,n)-D_trans(m,n))^2; 
                     end 
                 end
                 if(error < minerror)
                     minerror = error; 
                     min_i = i;
                     min_j = j;
                 end
            end
        end 
    end
end

As an example ApplyTransformation, one can use the identity transformation:

 function [x_dash,y_dash] = Iden(x,y)
    x_dash = x;
    y_dash = y;
end

Now the problem I am facing is the high computation time. The order of computation in the above code is 64^5, which is of the order 10^9. This computation should take at the worst minutes or an hour. It takes about 40 minutes to compute just 50 iterations. I don't know why the code is running so slow.

Thanks for reading my question.

Have you tried profiling? It takes 42s per single run of outermost loop for me, nearly half of time is taken by this Iden function. It is highly likely this ApplyTransformation is quite demanding, and you do TONS of iterations. — Zizy Archer, May 29 '18 at 08:28
Yes I have. The ApplyTransformation takes too much time. How can I rectify the situation? — Vizag, May 29 '18 at 08:30
Or am I doing redundant iterations? Is there a way around to reduce the number of iterations I have to do to accomplish the same thing. That'd be great if I can do that. — Vizag, May 29 '18 at 08:31

rahnema1 · Accepted Answer · 2018-06-04T22:49:03.303

6

You can use im2col* to convert the image to column format so each block forms a column of a [64 * 4096] matrix. Then apply transformation to each column and use bsxfun to vectorize computation of error.

DomainImage=rand(512);
RangeImage=rand(512);
DomainImage_col = im2col(DomainImage,[8 8],'distinct');
R =  im2col(RangeImage,[8 8],'distinct');
[x y]=ndgrid(1:8);

function [x_dash, y_dash] = ApplyTransformation(x,y)
    x_dash = x;
    y_dash = y;
end

[x_dash, y_dash] = ApplyTransformation(x,y);
idx = sub2ind([8 8],x_dash, y_dash);

D_trans = DomainImage_col(idx,:);       %transformation is reduced to matrix indexing
Error = 0;
for mn = 1:64
    Error = Error + bsxfun(@minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij]= min(Error,[],2);         % linear index of minimum of each block;
[min_i  min_j]=ind2sub([64 64],min_ij); % convert linear index to subscript

Explanation:

Our goal is to reduce number of loops as much as possible. For it we should avoid matrix indexing and instead we should use vectorization. Nested loops should be converted to one loop. As the first step we can create a more optimized loop as here:

min_ij = zeros(4096,1);

for kl = 1:4096                 %%% => 1:size(D_trans,2)
    minerror = 9999;
    min_ij(kl) = 0;
    for ij = 1:4096             %%% => 1:size(R,2)
        Error = 0;
        for mn = 1:64
            Error = Error + (R(mn,kl) - D_trans(mn,ij)).^2;
        end
        if(Error < minerror)
            minerror = Error; 
            min_ij(kl) = ij;
        end
    end
end

We can re-arrange the loops and we can make the most inner loop as the outer loop and separate computation of the minimum from the computation of the error.

% Computation of the error
Error = zeros(4096,4096);
for mn = 1:64
    for kl = 1:4096
        for ij = 1:4096
            Error(kl,ij) = Error(kl,ij) + (R(mn,kl) - D_trans(mn,ij)).^2;
        end
    end
end

% Computation of the min
min_ij = zeros(4096,1);
for kl = 1:4096
    minerror = 9999;
    min_ij(kl) = 0;
    for ij = 1:4096
        if(Error(kl,ij) < minerror)
            minerror = Error(kl,ij); 
            min_ij(kl) = ij;
        end
    end
end

Now the code is arranged in a way that can best be vectorized:

Error = 0;
for mn = 1:64
    Error = Error + bsxfun(@minus,R(mn,:),D_trans(mn,:).').^2;
end

[minerror ,min_ij] = min(Error, [], 2);   
[min_i  ,min_j] = ind2sub([64 64], min_ij);

*If you don't have the Image Processing Toolbox a more efficient implementation of im2col can be found here.

*The whole computation takes less than a minute.

edited Jun 04 '18 at 22:49

answered May 29 '18 at 09:47

rahnema1

15,264
3
15
27

Thank you for your answer. I'll try your approach and get back to you ASAP. – Vizag May 29 '18 at 09:49
Glad to help..! – rahnema1 May 29 '18 at 15:21
1

I wish they had a super upvote button. This answer should get one! – Vizag May 29 '18 at 17:31
One doubt. I am not able to figure out the functioning of bsxfun, i.e. I am not able to arrive at the conclusion that what I am doing is taking the distortion(sum of squared distance) between all the 8x8 range blocks (4096) and all the 8x8 transformed domain blocks, from the code. – Vizag Jun 04 '18 at 20:44
@Vizag I have added some explanations also computation of min should be done in the second dimension that I have corrected the answer to `min(Error,[],2)`. – rahnema1 Jun 04 '18 at 22:50
Why does the `minerror` matrix have any non-zero values when I run the code? Because the minimum error in case of the identity transformation should be 0. The (k,l)th range block is coincidental with the (k,l)th (identity transformed) domain block. – Vizag Jun 05 '18 at 14:47
You have defined error as `(R-D_trans)^2`. Since `R` and `D_trans` are two different arrays so it is expected to error will be greater than 0. If you define error as `(DomainImage_col - D_trans)^2` in the case of the identity transform the error will be 0. – rahnema1 Jun 05 '18 at 16:50
The error in my question is : `(D(m,n) - R(m,n))^2` where `m,n = 1,2,...,8` are the pixel values of the 8x8 Domain and Range block. So I should be getting a zero, if I define it that way (which is what you mention towards the end). – Vizag Jun 05 '18 at 18:40
Are you agree that D and R are two different things? Why the min error should be 0? Please provide a [mcve](https://stackoverflow.com/help/mcve) so we can reproduce your results. – rahnema1 Jun 05 '18 at 18:59
Because as mentioned in the question, the domain and range images (512x512) are identical. So when i apply the identity transformation on each of the 4096 8x8 domain blocks to obtain `D_trans` (use definition mentioned in the question) nothing will change. And since the domain and range images are identical, the min error in this case must be 0. – Vizag Jun 05 '18 at 19:14
When I set `RangeImage=DomainImage;` I will get all minerrors as 0; – rahnema1 Jun 05 '18 at 19:31
Exactly. all the entries in minerror must be 0. But using the formula you had provided in the answer for computing error I wasn't getting that and that's why i asked. – Vizag Jun 05 '18 at 19:38
I tested it in octave you can [Try it online!](https://tio.run/##lZDBboMwDIbveQpfJoIWVSvqqqoIaZO6w67VboihlKQ0G0mqECp4emag3dppl@USO/78@49t4flJ9v3Gaq7Mq@alTBw3gj7OozAmW25KOb1eETG5SvLCVpCA0hEG9KrA0hWsMhYIVXtlCh8Mekhe0B/tv8i0hS5LjCidEnS@XuET2TdYVNZA2uaC1wcG3XhnqPp8PFbdG1qv99ZpPmC0ZV1IAM@EI9XGY95d8i4m0ghC/iEYEyVarNfNLlK4qMn7bf9gdpP7oRnJX8ui2M/WYQzTufM3M0DV4KRoCinAW8BHp1rAQbJVpiQvzlmHmg8xwRbQBuP5erkYvzUV73HFu7rFZdEnrUxTsy3VBieys6UpmwXh7D2a/p8iJ8dmhlGuPrIEMKCjIEszFn3bRcNQIc3dZArsfkCVbvQQSl4cYFfZ4jMeVXMFQzlHRcQjXBpNlwtYLrLzJBS@g8Kak3T@Vhh/j3hdOHX0/cXgLOi/AA "Octave – Try It Online") . – rahnema1 Jun 05 '18 at 19:41
Have a look at line 1 and 2 where I set `RangeImage=DomainImage;` – rahnema1 Jun 05 '18 at 19:42
Thanks @rahnema1. I got my 0's too. – Vizag Jun 05 '18 at 19:55
1

I am putting your name (handle actually, because I don't know your name) in my thesis acknowledging your help. :) – Vizag Jul 22 '18 at 09:30

Zizy Archer · Answer 2 · 2018-05-29T10:47:00.957

First things first - your code doesn't do anything. But you likely do something with this minimum error stuff and only forgot to paste this here, or still need to code that bit. Never mind for now.

One big issue with your code is that you calculate transformation for 64x64 blocks of resulting image AND source image. 64^5 iterations of a complex operation are bound to be slow. Rather, you should calculate all transformations at once and save them.

allTransMats = cell(64);
for i = 1 : 64 
   for j = 1 : 64
      allTransMats{i,j} = getTransformation(DomainImage, i, j)
   end
end
function D_trans = getTransformation(DomainImage, i,j)
D_trans = zeros(8);
for m = 1 : 8
   for n = 1 : 8
      [m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
      D_trans(m,n) = DomainImage(m_dash,n_dash);
   end
end
end

This serves to get allTransMat and is OUTSIDE the k, l loop. Preferably as a simple function.

Now, you make your big k, l, i, j loop, where you compare all the elements as needed. Comparison could be also done block-wise instead of filling a small 8x8 matrix, yet doing it per element for some reason.

m = 1 : 8;
n = m;
for ...
    R = RangeImage(...); % This will give 8x8 output as n and m are vectors.
    D = allTransMats{i,j};
    difference = sum(sum((R-D).^2));
    if (difference < minDifference) ...
end

Even though this is a simple no transformations case, this speeds up code a lot.

Finally, are you sure you need to compare each block of transformed output with each block in the source? Typically you compare block1(a,b) with block2(a,b) - blocks (or pixels) on the same position.

EDIT: allTransMats requires k and l too. Ouch. There is NO WAY to make this fast for a single iteration, as you require 64^5 calls to ApplyTransformation (or a vectorization of that function, but even then it might not be fast - we would have to see the function to help here). Therefore, I will re-iterate my advice to generate all transformations and then perform lookup: this upper part of the answer with allTransMats generation should be changed to have all 4 loops and generate allTransMats{i,j,k,l};. It WILL be slow, there is no way around that as I mentioned in the upper part of edit. But, it is a cost you pay once, as after saving the allTransMats, all further image analyses will be able to simply load it instead of generating it again.

But ... what do you even do? Transformation that depends on source and destination block indices plus pixel indices (= 6 values total) sounds like a mistake somewhere, or a prime candidate to optimize instead of all the rest.

The thing is that the ApplyTransformation can depend on (k,l), so I'll have to keep it inside the loop. Thanks for your answer. — Vizag, May 29 '18 at 09:15
The thing I am trying to do is to obtain the min_i, min_j variables as mentioned above in my code, — Vizag, May 29 '18 at 09:16

Image Processing: Algorithm taking too long in MATLAB

2 Answers2