1

I am looking for an efficient way to calculate the pairwise distances between all points in a coordinate matrix of size (nodeCount x 2) using MATLAB. I do not desire to calculate the pairwise distance twice (for example, between nodes 1-2 and between nodes 2-1). I have constructed an outer 'for' loop that increments through each node with an inner loop that evaluates only nodes of a higher index number. The result is an upper triangular matrix populated by the nodal separation distances. I would like to vectorize these computations, or at a minimum increase the efficiency of this operation. Any help would be appreciated.

gap = 10;

for s = 1:(nodeCount); 
  for ss = s+1:(nodeCount);
    if abs(nodeCoord(s,1)-nodeCoord(ss,1)) < gap;
      sep(s,ss) = sqrt((nodeCoord(s,1)-nodeCoord(ss,1))^2+(nodeCoord(s,2)-nodeCoord(ss,2))^2);
    end
  end
end
fuesika
  • 3,280
  • 6
  • 26
  • 34
Matt M
  • 11
  • 2
  • Would love to see how the solutions posted here work for you in terms of efficiency! – Divakar Aug 28 '14 at 14:36
  • Hi all, thank you for the great comments! I have amended my original code to display a trick that I am using to speed up the computations somewhat for my application. I am most interested in computing the distance between nodal pairs that are within a certain proximity of each other defined by the variable "gap." I have an additional screen in my code that filters out nodal coordinates that are farther than the gap in the 'x' direction, so I further cut down my computation time and memory requirements. I am dealing with 10's of thousands of nodes, and this code is the bottleneck. – Matt M Aug 28 '14 at 14:48
  • Are you pre-allocating `sep` with zeros? I am asking because if the condition - `if abs(nodeCoord(s,1)-nodeCoord(ss,1)) < gap` is not fulfilled, what must be `sep(s,ss)` then? – Divakar Aug 28 '14 at 16:08
  • You are correct. The matrix 'sep' is pre-allocated with zeros. – Matt M Aug 28 '14 at 17:36
  • @MattM: Introducing the `gap` you fundamentally changed the problem. Efficient implementations are based on kD Trees. All solutions discussed here basically have O(n^2) runtime, with kd-trees and a constant number of neibours within the gap you can reach O(nlogn) – Daniel Aug 28 '14 at 17:55
  • @MattM Could you test out the solutions posted here? – Divakar Aug 28 '14 at 18:11
  • Ok I see. There is actually one thing you may be able to do to improve the efficiency further. Are the nodes sorted? Since you do only care about the first coordinate you could in that case break the loop at the first coordinate that fail to meet the condition. This is of course impossible if the node locations are random. If this is only a few coordinates you could as well use a sparse matrix to save memory. – patrik Aug 28 '14 at 21:28
  • Also I actually think that it may not be the calculation time which is the problem. Sure, an O(n^2) recquires a lot of calculations. However it is no match for a computer. If that is the bottle neck in your code you should congratulate yourself for writing really efficient matlab code. However, the memory allocation is indeed a problem. A full 50k by 50k matrix of doubles will indeed take 18gb of memory. Can it be so that the higher index values does not meet your criteria? Or that you already uses a `sparse` matrix. This would be valuable information to include – patrik Aug 28 '14 at 21:44

4 Answers4

1

The loop is not really dependent that perspect. I guess you want to find the distance to all other coordinates try this:

xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5]:
xSquare = bsxfun(@(x,y) power((x-y),2),xCoord,xCoord.');
ySquare = bsxfun(@(x,y) power((x-y),2),yCoord,yCoord.');
dist = sqrt(xSquare+ySquare);
patrik
  • 4,506
  • 6
  • 24
  • 48
  • Thanks for the response. This code seems to be slower than the code I have constructed, and I ran out of memory (I am running on a laptop and I have approximately 50,000 nodal pairs). – Matt M Aug 28 '14 at 14:40
1

Rather than trying to use the fact that the lower triangular elements are not needed, as they are zeros in the output, I think you are better off using a technique that's based on fast matrix multiplication as discussed in this very smart solution to get the full matrix of euclidean distances. To get the desired output of upper triangular matrix, you can wrap the output with triu.

The code that follows next is a slightly modified version of it on the terms that we are calculating the distances among the same pair of coordinates from nodeCoord.

Code

numA = size(nodeCoord,1);
helpA = ones(numA,6);
helpB = ones(numA,6);
for idx = 1:2
    sqA_idx = nodeCoord(:,idx).^2;
    helpA(:,3*idx-1:3*idx) = [-2*nodeCoord(:,idx), sqA_idx ];
    helpB(:,3*idx-2:3*idx-1) = [sqA_idx , nodeCoord(:,idx)];
end
sep = triu(sqrt(helpA(:,1:3) * helpB(:,1:3)')<gap).* sqrt(helpA * helpB');
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
1
xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5];

dist = sqrt(pdist2(xCoord,yCoord,'euclidean'));

Can use the function pdist2

lakshmen
  • 28,346
  • 66
  • 178
  • 276
  • Thanks! pdist2 is actually slower than my code, presumably because distances are computed between each pair twice. – Matt M Aug 28 '14 at 14:36
0

pdist(nodeCoord) does it in a fast way, but returning the data in a vector. Mapping it back to a matrix costs about the same time as computing the distances:

sep3=zeros(nodeCount,nodeCount);
sep3(tril(true(nodeCount),-1))=pdist(nodeCoord);
sep3=sep3+sep3.';

If you are happy with a lower triangular matrix, you can leave out the last line.

Daniel
  • 36,610
  • 3
  • 36
  • 69