4

I have many points and I want to build distance matrix i.e. distance of every point with all of other points but I want to don't use from loop because take too time... Is a better way for building this matrix? this is my loop: for a setl with size: 10000x3 this method take a lot of my time :(

 for i=1:size(setl,1)
     for j=1:size(setl,1)        
         dist = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+...
             (zl(i)-zl(j))^2);
         distanceMatrix(i,j) = dist;
     end
 end
Royi
  • 4,640
  • 6
  • 46
  • 64
ahmad hosseini
  • 101
  • 2
  • 11
  • 2
    Do you have the stats toolbox? If so then: http://www.mathworks.com/help/stats/pdist.html – Dan Oct 14 '13 at 12:17
  • 1
    You only need to calculate the half of the matrix, because the distances between two points are symmetric (e.g. d(x,y)=d(y,x) ). Otherwise they are calculated twice. – Jost Oct 14 '13 at 12:19
  • 1
    No, if you use `pdist` the distances are only calculated once. Then one can use `squareform` to build the symmetric distance matrix. See [my answer here](http://stackoverflow.com/questions/17932170/vectorizing-double-for-loop-in-matlab). You can also type `edit squareform` to see the code used (no `for` loop). – horchler Oct 14 '13 at 13:07
  • 1
    If you care about speed, `pdist` (a native C function) and `squareform` are the only way to go, unless you want to [try compiling mex code](http://stackoverflow.com/questions/17777292/fast-algorithms-for-finding-pairwise-euclidean-distance/17799550#17799550) for a bit more speed. – horchler Oct 14 '13 at 13:37

5 Answers5

19

How about using some linear algebra? The distance of two points can be computed from the inner product of their position vectors,

D(x, y) = ∥y – x∥ = √ ( xT x + yT y – 2 xT y ),

and the inner product for all pairs of points can be obtained through a simple matrix operation.

x = [xl(:)'; yl(:)'; zl(:)'];
IP = x' * x;
d = sqrt(bsxfun(@plus, diag(IP), diag(IP)') - 2 * IP);

For 10000 points, I get the following timing results:

  • ahmad's loop + shoelzer's preallocation: 7.8 seconds
  • Dan's vectorized indices: 5.3 seconds
  • Mohsen's bsxfun: 1.5 seconds
  • my solution: 1.3 seconds
A. Donda
  • 8,381
  • 2
  • 20
  • 49
  • Timing on my machine, i.e. an Intel Core i7-3770 CPU @ 3.40GHz (four cores + hyperthreading), 64-bit Matlab running under Debian Linux. – A. Donda Oct 14 '13 at 13:46
  • 1
    I win. ;-) Plus, it's really elegant. – A. Donda Oct 14 '13 at 13:48
  • 1
    Just noticed your post. +1 I have always liked this method best because it is fast and a smart formulation of the problem. In fact, in [a different answer](http://stackoverflow.com/a/19456458/2778484), I compared a few methods, one of which was equivalent to this one. It was the best! ;) There is a little extra mathematical description in that answer, but I reached the same conclusion. The other question, on the other hand, was a mess. – chappjc Nov 07 '13 at 01:11
  • @chappjc: Thanks! I'll put in an explaining formula based on the one in your answer. Pity there's no LaTeX math here... – A. Donda Nov 07 '13 at 18:10
  • Sorry, for my late comment. Very useful post but what is the `xl`, `yl` and `zl` vars? Thank you. – Thoth Feb 03 '14 at 22:44
  • @thoth, I used the variables from the question: x, y, & z coordinates of the points – A. Donda Feb 04 '14 at 01:31
3

You can use bsxfun which is generally a faster solution:

s = [xl(:) yl(:) zl(:)];
d = sqrt(sum(bsxfun(@minus, permute(s, [1 3 2]), permute(s, [3 1 2])).^2,3));
Mohsen Nosratinia
  • 9,844
  • 1
  • 27
  • 52
2

You can do this fully vectorized like so:

n = numel(xl);
[X, Y] = meshgrid(1:n,1:n);
Ix = X(:)
Iy = Y(:)
reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);

If you look at Ix and Iy (try it for like a 3x3 dataset), they make every combination of linear indexes possible for each of your matrices. Now you can just do each subtraction in one shot!

However mixing the suggestions of shoelzer and Jost will give you an almost identical performance performance boost:

n = 50;

xl = rand(n,1);
yl = rand(n,1);
zl = rand(n,1);

tic
for t = 1:100
    distanceMatrix = zeros(n); %// Preallocation
    for i=1:n
       for j=min(i+1,n):n %// Taking advantge of symmetry
           distanceMatrix(i,j) = sqrt((xl(i)-xl(j))^2+(yl(i)-yl(j))^2+(zl(i)-zl(j))^2);
       end
    end
    d1 = distanceMatrix + distanceMatrix';           %'
end
toc


%// Vectorized solution that creates linear indices using meshgrid
tic
for t = 1:100
    [X, Y] = meshgrid(1:n,1:n);
    Ix = X(:);
    Iy = Y(:);
    d2 = reshape(sqrt((xl(Ix)-xl(Iy)).^2+(yl(Ix)-yl(Iy)).^2+(zl(Ix)-zl(Iy)).^2), n, n);
end
toc

Returns:

Elapsed time is 0.023332 seconds.
Elapsed time is 0.024454 seconds.

But if I change n to 500 then I get

Elapsed time is 1.227956 seconds.
Elapsed time is 2.030925 seconds.

Which just goes to show that you should always bench mark solutions in Matlab before writing off loops as slow! In this case, depending on the scale of your solution, loops could be significantly faster.

Mohsen Nosratinia
  • 9,844
  • 1
  • 27
  • 52
Dan
  • 45,079
  • 17
  • 88
  • 157
  • 3
    Very nice answer, but I question your timing results. When I run this code on my machine, both techniques give similar times of about 0.02 seconds. – shoelzer Oct 14 '13 at 13:10
  • @shoelzer I ran it online, maybe there was no JIT. I'll check in real Matlab and update – Dan Oct 14 '13 at 13:18
  • @shoelzer I ran it in Matlab and you're right, they give very similar results! Updated. – Dan Oct 14 '13 at 13:21
  • Thanks for the update! I did not expect results to change like that. "Always benchmark" is indeed good advice. – shoelzer Oct 14 '13 at 13:30
  • `coords = [x,y,z]; d2 = sqrt(sum( bsxfun(@minus,permute(coords,[1,3,2]),permute(coords,[3,1,2])).^2, 3));` may be more efficient. Even better to create two arrays instead of `permute`ing a single one. – Jonas Oct 14 '13 at 13:37
  • Also: note that you probably wanted to write `x1=rand(n,1)` etc. – Jonas Oct 14 '13 at 13:38
1

Be sure to preallocate distanceMatrix. Your loops will run much, much faster and vectorization probably isn't needed. Even if you do it, there may not be any further speed increase.

shoelzer
  • 10,648
  • 2
  • 28
  • 49
  • I didn't know that there is preallocate thing(!!!) but I can't use this item, and I am very interesting for using it... I'm sure that I can give my answer using this property... Is there anyone that tell me: how can I preallocate distanceMatrix? – ahmad hosseini Oct 14 '13 at 12:49
  • 1
    @ahmadhosseini `distanceMatrix = zeros(size(setl,1),size(setl,1))` before the two `for` loops to pre-allocate the matrix. See http://www.mathworks.co.uk/support/solutions/en/data/1-18150/ – am304 Oct 14 '13 at 12:51
0

The latest versions (Since R2016b) of MATLAB support Implicit Broadcasting (See also noted on bsxfun()).

Hence the fastest way for distance matrix is:

function [ mDistMat ] = CalcDistanceMatrix( mA, mB )

mDistMat = sum(mA .^ 2).' - (2 * mA.' * mB) + sum(mB .^ 2);


end

Where the points are along the columns of the set.
In your case mA = mB.

Have a look on my Calculate Distance Matrix Project.

Royi
  • 4,640
  • 6
  • 46
  • 64