4

Using MATLAB, Imagine a Nx6 array of numbers which represent N segments with 3+3=6 initial and end point coordinates.

Assume I have a function Calc_Dist( Segment_1, Segment_2 ) that takes as input two 1x6 arrays, and that after some operations returns a scalar, namely the minimal euclidean distance between these two segments.

I want to calculate the pairwise minimal distance between all N segments of my list, but would like to avoid a double loop to do so.

I cannot wrap my head around the documentation of the bsxfun function of MATLAB, so I cannot make this work. For the sake of a minimal example (the distance calculation is obviously not correct):

function scalar = calc_dist( segment_1, segment_2 )
         scalar = sum( segment_1 + segment_2 )
end

and the main

Segments = rand( 1500, 6 )
Pairwise_Distance_Matrix = bsxfun( @calc_dist, segments, segments' )

Is there any way to do this, or am I forced to use double loops ?

Thank you for any suggestion

user3666197
  • 1
  • 6
  • 50
  • 92
Mathusalem
  • 837
  • 9
  • 21
  • bsxfun takes as argument a function `C = fun(A,B)` where `A`, `B`, `C` should have the same size. Is not the case for your function. –  Oct 06 '14 at 14:55
  • what if my function returns a 1 by 6 copy of the scalar, so that A,B and C have the same size ? – Mathusalem Oct 06 '14 at 14:56
  • +1 for minimal example – Luis Mendo Oct 06 '14 at 15:05
  • @Mathusalem If you read carefully the requirements for `fun` argument of `bsxfun`, the function needs to accept arguments of *arbitrary* size, which is not your case again (because your args need to be 1x6). –  Oct 06 '14 at 15:13

2 Answers2

3

Unfortunately I don't see any "smarter" (i.e. read faster) solution than the double loop. For speed consideration I'd organize the points as a 6×N array, not the other way, because column access is way faster than row access in MATLAB.

So:

N = 150000;
Segments = rand(6, N);
Pairwise_Distance_Matrix = Inf(N, N);
for i = 1:(N-1)
        for j = (i+1):N
                Pairwise_Distance_Matrix(i,j) = calc_dist(Segments(:,i), Segments(:,j));
        end;
end;

Minimum_Pairwise_Distance = min(min(Pairwise_Distance_Matrix));

Contrary to common wisdom, explicit loops are faster now in MATLAB compared to the likes of arrayfun, cellfun or structfun; bsxfun beats everything else in terms of speed, but it doesn't apply to your case.

3

I think you need pdist rather than bsxfun. pdist can be used in two different ways, the second of which is applicable to your problem:

  • With built-in distance functions, supplied as strings, such as 'euclidean', 'hamming' etc.

  • With a custom distance function, a handle to which you supply.

In the second case, the distance function

must be of the form

     function D2 = distfun(XI, XJ),

taking as arguments a 1-by-N vector XI containing a single row of X, an M2-by-N matrix XJ containing multiple rows of X, and returning an M2-by-1 vector of distances D2, whose Jth element is the distance between the observations XI and XJ(J,:).

Although the documentation doesn't tell, it's very likely that the second way is not as efficient as the first (a double loop might even be faster, who knows), but you can use it. You would need to define your function so that it fulfills the stated condition. With your example function it's easy: for this part you'd use bsxfun:

function scalar = calc_dist( segment_1, segment_2 )
    scalar = sum(bsxfun(@plus, segment_1, segment_2), 2);
end

Note also that

  • pdist works with rows (not columns), which is what you need.
  • pdist reduces operations by exploiting the properties that any distance function must have. Namely, the distance of an element to itself is known to be zero; and the distance for each pair can be computed just once thanks to symmetry. If you want to arrange the output in the form of a matrix, use squareform.

So, after your actual distance function has been modified appropriately (which may be the hard part), use:

distances = squareform(pdist(segments, @calc_dist));

For example:

N = 4;
segments = rand(N,6);
distances = squareform(pdist(segments, @calc_dist));

produces

distances =
         0    6.1492    7.0886    5.5016
    6.1492         0    6.8559    5.2688
    7.0886    6.8559         0    6.2082
    5.5016    5.2688    6.2082         0
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • The funny part is that, internally, `pdist` uses double loops to compute the pairwise distance. :-) Just `edit pdist` and look at the code around from line 250 onwards (I have access to Revision: 1.1.8.5) –  Oct 06 '14 at 15:37
  • 1
    @CST-Link It also says "Call a mex file to compute distances for the standard distance measures" (revision 1.1.8.2). That more or less confirms that it's faster with built-in distance functions – Luis Mendo Oct 06 '14 at 15:41
  • I skipped those lines... indeed, for the standard distances and non-sparse input, it will use the MEX function. –  Oct 06 '14 at 15:45
  • Thank you both Luis Mendo and CST-Link. In my mind you both provided an answer in which I learned something. I selected the answer at random :/ – Mathusalem Oct 07 '14 at 08:05
  • Your method accelerated my execution time by a factor 1.5 – Mathusalem Oct 07 '14 at 10:07
  • @Mathusalem: See [this question and my answer](http://stackoverflow.com/questions/17777292/fast-algorithms-for-finding-pairwise-euclidean-distance) for further tips on speeding up `pdist` for calculating pairwise Euclidean distance. – horchler Nov 22 '14 at 00:45