0

I have a matrix X of dimension r = 2 rows and col = 20000 columns and I want to compute the square root of the sum of squared distances = Euclidean distance between pair of points. For ex:

Let,

X =      1  2 3 4
         5 6 7 8



  Dist1 = sqrt((1-2)^2 + (5-6)^2))
   Dist2 = sqrt((1-3)^2 + (5-7)^2))
and so on. So, distance(1,2) = Dist1;
distance(1,3) = Dist2

The result will be a matrix of size N*N. But, it is taking a lot of time when the data points are large say 1 million. How can I effectively modify this code so that it is decent and fast. Please help.

 r =2;
    col = 2000;

    X = rand(r,col);  
   N = col;
        for k =1: N                
                for l = 1: N
                    if (l ~= k)
                       distance(k,l) =( sqrt(sum((X(:,k) - X(:,l)) .^ 2)));
                    end
                end
                end
    end
SKM
  • 959
  • 2
  • 19
  • 45
  • 2
    Look [here](http://stackoverflow.com/questions/23911670/efficiently-compute-pairwise-squared-euclidean-distance-in-matlab/23911671#23911671) – Benoit_11 Jan 30 '15 at 02:55
  • [Here's](http://stackoverflow.com/a/26994722/3293881) the explanation and some vectorized variations to it. – Divakar Jan 30 '15 at 05:20
  • 1
    X = [ 1 2 3 4; 5 6 7 8]; [i,j] = meshgrid(1:size(X,2),1:size(X,2)); Dx = reshape(X(1,i)-X(1,j),size(X,2),size(X,2)); Dy = reshape(X(2,i)-X(2,j),size(X,2),size(X,2)); D=sqrt(Dx.^2 + Dy.^2); – BerndGit Jan 30 '15 at 11:35
  • Couldn't add it as an separate answer, since it is marked as duplicate. However my solution is more specific then the answer refered by rayryeng, since the dimension is just 2 and no option for higher dimensions is required. – BerndGit Jan 30 '15 at 11:39

1 Answers1

0

Because you have the special case of computing the distance between each point in a set of points, you have one advantage that you can use so that you can improve speed:

The distance between the points A and B is dist(A,B), but dist(A,B) === dist(B,A), for Euclidean distance.

So you only need to calculate half of the matrix you are calculating. I leave only the upper triangular half. You can also use the fact that dist(A,A) = 0 to save some time.

The following code calculated your matrix %66.55 faster than the method in your question. Note that only the upper triangular half is used so you will have to index dist(a,b) using only the valid elements (i.e. a <= b).

r = 2;
N = 2000;
X = rand(r,N);
t = tic();
dist = zeros(N,N);
for k = 1:N
    for l = (k+1):N
        dist(k,l) = sqrt(sum((X(:,l)-X(:,k)).^2));
    end
end
T = toc(t);
disp(T);

Also see Benoit_11's link which is interesting.

Community
  • 1
  • 1
Gouda
  • 1,005
  • 1
  • 10
  • 19
  • Thank you for your reply, will this method work for any dimension of the data set? Say if dimension d = 100, then it will mean 100 for loops. I was looking for a method where so many loops can be avoided. Can you please shed some light on this matter? – SKM Feb 02 '15 at 20:51