0

I encountered a very slow if statement response using cuda\jacket in matlab. (5 sec vs 0.02 sec for the same code that finds local maxima, using a simple for loop and an if condition)

Being new to GPU programming, I went reading and when I saw a previous matlab if statements with CUDA SO discussion, I felt something is missing. You don't need to use cuda to know that it is better to vectorized your code. However, there are cases where you will need to use an if statement anyway. For example, I'd like to find whether a pixel of a 2D image (say m(a,b)) is the the local maximum of its 8 nearest neighbors. In matlab, an easy way to do that is by using 8 logical conditions on an if statement:

if m(a,b)>m(a-1,b-1) & m(a,b)>(a,b-1) & m(a,b)>(a+1,b-1) & ... etc on all nearest neighbors

I'd appreciate if you have an idea how to resolve (or vectorize) this...

Community
  • 1
  • 1
bla
  • 25,846
  • 10
  • 70
  • 101
  • I'm not entirely sure what you want to vectorize here - you already have the local maxima function within matlab that essentially is the vectorized version you're looking for... or did you want to just vectorize the 8 `if` boolean conditions into 1? In that case see my answer... – jmetz Aug 06 '12 at 18:22

2 Answers2

2

The problem with using multiple "if" statement (or any other conditional statement) is that for each the statements, the result is copied from gpu to host and this can be costly.

The simplest way is to vectorize in the following manner.

window = m(a-1:a+1, b-1:b+1);
if all(window(:) <= m(a,b))
% do something
end

This can be further optimized if you can show what the if / else conditions are doing. i.e. please post the if/else code to see if other optimizations are available (i.e look at possible ways to remove if condition entirely).

EDIT

With new information, here is what can be done.

for j = 1:length(y)
 a = x(j);
 b = y(j);
 window = d(a-1:a+1, b-1:b+1);
 condition = all(window(:) <= d(a,b));
 M(a, b) = condition + ~condition * M(a,b);
end

You can use gfor loop to make it even faster.

gfor j = 1:length(y)
 a = x(j);
 b = y(j);
 window = d(a-1:a+1, b-1:b+1);
 condition = all(window(:) <= d(a,b));
 M(a, b) = condition + ~condition * M(a,b);
gend
Pavan Yalamanchili
  • 12,021
  • 2
  • 35
  • 55
  • here is the code I use: 'for j=1:length(y) if (d(x(j),y(j))>d(x(j)-1,y(j))) &&... (d(x(j),y(j))>d(x(j),y(j)-1)) &&... (d(x(j),y(j))>=d(x(j)+1,y(j))) &&... (d(x(j),y(j))>=d(x(j),y(j)+1)) && ... (d(x(j),y(j))>d(x(j)-1,yy(j)-1)) && ... (d(x(j),y(j))>d(x(j)-1,y(j)+1)) && ... (d(x(j),y(j))>=d(x(j)+1,y(j)-1)) && ... (d(x(j),y(j))>=d(x(j)+1,y(j)+1)); M(x(j),y(j))=1; end end' – bla Aug 11 '12 at 04:21
  • Thanks, I've tried gfor, and got matlab to crash... I found that casting back to cpu variables (i.e. single) is the fastest way thus far. Before that code segment I use the gpu to medfilt2 and filter2, and after casting to single i use find and the for loop. On a different note .I wish gfor supported find... (there should be a fast way to find nonzero matrix elements) – bla Aug 13 '12 at 23:58
  • I managed to get gfor to work. Apparently, the line 'window=d(a-1:a+1, b-1:b+1)' caused Matlab to crash. Instead I just explicitly wrote window(:), but I don't get any improvement over the cpu code. I suppose it is because d is a sparse 1024x1024 matrix? – bla Aug 15 '12 at 17:28
  • @nate sparse matrix indexing has limited support. It may be the reason you saw the crash. – Pavan Yalamanchili Aug 26 '12 at 01:36
1

Using built-in functions

The easiest already optimized approach is probably to use the imregionalmax function,

maxinI = imregionalmax(I, CONN); 

where CONN is the desired connectivity (in your case 8).

Note however that imregionalmax is part of the image processing toolbox.

Using the max function

If you're trying to see if just that one pixel is the local maximum of it's neighbors you would probably do something like

if  m(a,b) == max(max(m( (a-1) : (a+1), (b-1) : (b+1))))

Or perhaps rather than taking two max it may be faster in some cases to reshape,

if  m(a,b) == max(reshape (m( (a-1) : (a+1), (b-1) : (b+1)), 9,1)  )

Without the max function

Lastly if you want to avoid the max function altogether that is also possible in a more vectorized form than you have so far, namely

if  all(reshape( m(a,b) >= m( (a-1) : (a+1), (b-1) : (b+1)), 9,1))
jmetz
  • 12,144
  • 3
  • 30
  • 41
  • thanks for the response. Matlab-wise, though the code you suggested is more elegant, I found that writing the full 8 conditions in a single if line is always faster (by factors of 1.8 to 2.6) My problem is that any for loop of size greater than ~ 1e4 will slow the code to a level I can;t work with. So I guess I ask whether a local maxima search can be done without a for loop... – bla Aug 06 '12 at 20:50
  • @wavepacket: I updated my answer to include Matlab's builtin regional maxima function. – jmetz Aug 06 '12 at 20:53
  • Thanks!! I wasn't aware of imregionalmax... unfortunately it is 6 times slower than the for loop with an if condition. I'll look into its machinery and see how it can be utilized with cuda. Again thanks. – bla Aug 06 '12 at 21:58