Vectorization of matlab code

Question

I'm fairly new to vectorization. I have tried it myself but couldn't get it to work. Can somebody help me vectorize this code and give a short explanation of how you do it, so that I can adopt the thinking process too? Thanks.

function [result] = newHitTest (point,Polygon,r,tol,stepSize)
%This function calculates whether a point is allowed.

%First a quick test is done by calculating the distance from point to
%each point of the polygon. If that distance is smaller than range "r"
%(scaled by tol), the point is not allowed. This will slow down the
%algorithm at some points, but will greatly speed it up in others because
%fewer calls to the circleTest routine are needed.
polySize=size(Polygon,1);
testCounter=0;

for i=1:polySize
    d = sqrt(sum((Polygon(i,:)-point).^2));

    if d < tol*r
        testCounter=1;
        break
    end
end

if testCounter == 0
    circleTestResult = circleTest (point,Polygon,r,tol,stepSize);
    testCounter = circleTestResult;
end

result = testCounter;

score 7 · Accepted Answer · edited May 23 '17 at 11:48

Given the information that Polygon is 2 dimensional, point is a row vector and the other variables are scalars, here is the first version of your new function (scroll down to see that there are lots of ways to skin this cat):

function [result] = newHitTest (point,Polygon,r,tol,stepSize)
linDiff = Polygon-repmat(point,size(Polygon,1),1);         % subtract point from every row
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;    % true where a vertex is too close
if any(testLogicals)
    result = 1;   % quick rejection, same outcome as the original loop
else
    result = circleTest (point,Polygon,r,tol,stepSize);
end

The thought process for vectorization in Matlab involves trying to operate on as much data as possible using a single command. Most of the basic builtin Matlab functions operate very efficiently on multi-dimensional data. Using a for loop is the reverse of this, as you are breaking your data down into smaller segments for processing, each of which must be interpreted individually. By resorting to data decomposition using for loops, you potentially lose some of the massive performance benefits associated with the highly optimised code behind the Matlab builtin functions.

The first thing to think about in your example is the conditional break in your main loop. You cannot break from a vectorized process. Instead, calculate all possibilities, make an array of the outcome for each row of your data, then use the any function to check whether any row is within tol*r. If one is, the point is rejected straight away; circleTest only needs to be called when none is.
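
To see that computing every row and then reducing with any gives the same answer as the loop with break, here is a minimal sketch on made-up data. The polygon, point, r and tol values are assumptions chosen for illustration, not values from the question:

```matlab
% Illustrative data only: a unit square and a point near one corner.
Polygon = [0 0; 1 0; 1 1; 0 1];
point   = [0.05 0.05];
r = 0.5;  tol = 0.5;              % threshold is tol*r = 0.25

% Loop version: stops at the first vertex that is too close.
tooCloseLoop = false;
for i = 1:size(Polygon,1)
    if sqrt(sum((Polygon(i,:) - point).^2)) < tol*r
        tooCloseLoop = true;
        break
    end
end

% Vectorized version: one distance per vertex, then reduce with any.
d = sqrt(sum(bsxfun(@minus, Polygon, point).^2, 2));   % 4x1 column of distances
tooCloseVec = any(d < tol*r);

assert(isequal(tooCloseLoop, tooCloseVec))
```

The vectorized form always computes all four distances, whereas the loop stops after the first one here; the two only differ in how much work they do, not in the result.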

NOTE: It is not easy to break out of a vectorized calculation efficiently in Matlab. However, as you are just computing a Euclidean distance in the loop, you'll probably see a performance boost by using the vectorized version and calculating all possibilities. If the computation in your loop were more expensive, the input data were large, and you wanted to break out as soon as you hit a certain condition, then a Matlab extension written in a compiled language (a MEX function, for example) could potentially be much faster than a vectorized version that performs needless calculation. However, this assumes that you know how to write code, in a language that compiles to native code, that matches the performance of the Matlab builtins.

Back on topic ...

The first thing to do is to take the linear difference (linDiff in the code example) between Polygon and your row vector point. To do this with a plain minus, the dimensions of the two variables must be identical. One way to achieve this is to use repmat to replicate point once for each row of Polygon, so that it becomes the same size as Polygon. However, bsxfun is usually a superior alternative to repmat (as described in this recent SO question), because it avoids explicitly building the replicated copy, making the code ...

function [result] = newHitTest (point,Polygon,r,tol,stepSize)
linDiff = bsxfun(@minus, Polygon, point);                  % subtract point from every row
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;    % true where a vertex is too close
if any(testLogicals)
    result = 1;   % quick rejection, same outcome as the original loop
else
    result = circleTest (point,Polygon,r,tol,stepSize);
end
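
For comparison, the repmat and bsxfun forms produce the same matrix. On MATLAB R2016b or later, implicit expansion means that plain Polygon - point works as well; on older releases that line fails with a "Matrix dimensions must agree" error, so treat it as version dependent. The data below are made up for illustration:

```matlab
Polygon = [0 0; 2 0; 2 2; 0 2];   % illustrative 4x2 vertex list
point   = [1 0.5];                % 1x2 row vector

viaRepmat = Polygon - repmat(point, size(Polygon,1), 1);  % explicit copy of point
viaBsxfun = bsxfun(@minus, Polygon, point);               % expansion done by bsxfun
viaExpand = Polygon - point;      % needs implicit expansion (MATLAB R2016b+)

assert(isequal(viaRepmat, viaBsxfun, viaExpand))
```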

I rolled your d value into a column vector d, with one entry per row of Polygon, by summing across the 2nd dimension (note the removal of the row index from Polygon and the addition of ,2 in the sum command). I then went further and evaluated the logical array testLogicals inline with the calculation of the distance measure. You will quickly see that a downside of heavy vectorisation is that it can make the code less readable to those not familiar with Matlab, but the performance gains are usually worth it. Comments are pretty much a necessity.
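
The effect of the ,2 argument is easiest to see on a tiny matrix. In this sketch (values chosen purely for illustration), sum without a dimension argument collapses the rows and returns one value per column, whereas sum(...,2) returns one value per row, which is what a per-vertex distance needs:

```matlab
linDiff = [3 4; 6 8; 0 1];        % illustrative differences, one row per vertex

colSums = sum(linDiff.^2);        % 1x2: sums down each column -> [45 81]
rowSums = sum(linDiff.^2, 2);     % 3x1: sums across each row  -> [25; 100; 1]
d       = sqrt(rowSums);          % per-vertex distances       -> [5; 10; 1]
```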

Now, if you want to go completely crazy, you could argue that the test function is so simple now that it warrants use of an 'anonymous function' or 'lambda' rather than a complete function definition. The test for whether or not it is worth doing the circleTest does not require the stepSize argument either, which is another reason for perhaps using an anonymous function. You can roll your test into an anonymous function and then just use circleTest in your calling script, making the code self-documenting to some extent . . .

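% True when no vertex lies within tol*r, i.e. when the quick test cannot decide
doCircleTest = @(point,Polygon,r,tol) ~any(sqrt( sum( bsxfun(@minus, Polygon, point).^2, 2 )) < tol*r);

if doCircleTest(point,Polygon,r,tol)
    result = circleTest (point,Polygon,r,tol,stepSize);
else
    result = 1; % a vertex is within tol*r, so the point is not allowed (as in the original loop)
end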

Now that everything is vectorised, the use of function handles gives me another idea . . .

If you plan on performing this at multiple points in the code, the repetition of the if statements would get a bit ugly. To stay DRY (don't repeat yourself), it seems sensible to put the test with the conditional function into a single function, just as you did in your original post. However, the utility of that function would be very narrow: it would only test whether the circleTest function should be executed, and then execute it if need be.

Now imagine that after a while, you have some other conditional functions, just like circleTest, with their own equivalent of doCircleTest. It would be nice to reuse the conditional switching code maybe. For this, make a function like your original that takes a default value, the boolean result of the computationally cheap test function, and the function handle of the expensive conditional function with its associated arguments ...

function result = conditionalFun( default, cheapFunResult, expensiveFun, varargin )
if cheapFunResult
    result = expensiveFun(varargin{:});
else
    result = default;
end
end %//of function

You could call this function from your main script with the following . . .

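% doCircleTest must be true when no vertex lies within tol*r; the default 1 marks the point as not allowed
result = conditionalFun(1, doCircleTest(point,Polygon,r,tol), @circleTest, point,Polygon,r,tol,stepSize);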

...and the beauty of it is you can use any test, default value, and expensive function. Perhaps a little overkill for this simple example, but it is where my mind wandered when I brought up the idea of using function handles.
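
As a rough illustration of that generality, here is a sketch that reuses conditionalFun with a completely different cheap test and expensive function. Both handles (isLarge and slowTotal) are hypothetical stand-ins invented for this example, and the script layout assumes MATLAB R2016b or later, where local functions may sit at the end of a script file:

```matlab
% Hypothetical stand-ins, not part of the original code.
isLarge   = @(v) numel(v) > 3;    % cheap test
slowTotal = @(v) sum(sort(v));    % stands in for an expensive function

a = conditionalFun(-1, isLarge([1 2]),     slowTotal, [1 2]);      % cheap test false -> default
b = conditionalFun(-1, isLarge([4 3 2 1]), slowTotal, [4 3 2 1]);  % cheap test true  -> slowTotal runs

function result = conditionalFun(default, cheapFunResult, expensiveFun, varargin)
% Same helper as above, repeated so the sketch is self-contained.
if cheapFunResult
    result = expensiveFun(varargin{:});
else
    result = default;
end
end
```

Because the expensive function is passed as a handle, it is only evaluated inside conditionalFun when the cheap test is true; the cheap test itself is evaluated by the caller before the call.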

Thanks for the explanation. It really helped. But one more question: I'm getting the following error when I run the vectorized code. `Error using - Matrix dimensions must agree.` We are subtracting a scalar from a matrix, so the error should not occur. — Vikram, Oct 18 '12 at 09:18
On what line? Assuming that point, r, and tol are scalars and Polygon is 2 dimensional data, then my code does exactly what yours does I think. — learnvst, Oct 18 '12 at 09:23
What are the dimensions of Polygon and point? Type `size(Polygon)` and `size(point)`. The only thing I can think is that `point` might be a row vector, in which case I can make a simple tweak to the code. — learnvst, Oct 18 '12 at 09:27
@user1734167 try the latest versions, you'll probably get more speed — learnvst, Oct 18 '12 at 14:27
@learnvst, try not to edit your answer that much, it became community WIKI :/ Now you don't get all the votes — Andrey Rubshtein, Oct 18 '12 at 14:40
@Andrey just learnt that lesson the hard way! Didn't realise that happened. I got a bit over excited with this one and kept having new ideas :/ — learnvst, Oct 18 '12 at 14:51
@learnvst Don't worry. Flag it and in the description field write to the moderator asking to remove the wiki tag from your post, because you did not know about it and used the 'Edit' button one time too many. If you are nice enough he will do it. — angainor, Oct 18 '12 at 15:02
@angainor Good call. I'm pretty sure I'm done with the edits on this one now. I went a little crazy and learnt my lesson :S — learnvst, Oct 18 '12 at 15:22
@learnvst I've been there ;) worked for me, so you should definitely try. Although they say on meta that it is not automatic and at the moderator's decision.. — angainor, Oct 18 '12 at 15:23
The automatic wiki switch is simply there to keep users from gaining rep through useless bumps (cc @Andrey). In this case, your edits were legitimate and meant to improve your answer, so I've disabled the automatic wiki as it's not needed. On top of that, good answer! :) — BoltClock, Oct 18 '12 at 16:03
@learnvst: As you said, it's even faster. Love the bsxfun function! — Vikram, Oct 19 '12 at 08:40
