
So we have a reference set of 3D points (let's call it R), and many other sets of 3D points (let's call that collection of point sets P, and each individual data set in it Pi).

The task is to return the Pi that minimizes the total Euclidean distance between the data points in Pi and R. The way I see it is this:

  1. For each point in Pi, compare it to every point in R and keep the minimum distance between the two.
  2. Sum up these minimum distances to reach the minimum total "difference" between Pi and R.
  3. The answer Pi is the one with the smallest difference.

But this is pretty crazy, since it essentially means computing the distance between every point in R and every point in every Pi, which could mean thousands or millions of comparisons. Surely I can do better than that.
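Roughly, here is what I have in mind so far (just a sketch of the steps above; bestSet/bestTotal are made-up names, and R and each P{i} are assumed to be n-by-3 matrices of [x y z] coordinates):

bestSet = NaN;        % index of the best Pi found so far
bestTotal = Inf;
for i = 1:numel(P)
    total = 0;
    for j = 1:size(P{i},1)
        % step 1: distance from this point of Pi to every point of R
        d = sqrt(sum(bsxfun(@minus, R, P{i}(j,:)).^2, 2));
        % keep only the closest match
        total = total + min(d);
    end
    % steps 2-3: the set with the smallest summed distance wins
    if total < bestTotal
        bestTotal = total;
        bestSet = i;
    end
end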

I am working in Matlab, which I am not used to.

What is a better algorithm to use? Is there a data structure that is perfect for this? (for example a K-D tree?)

Jeremy

1 Answer


Unless you have such a crazy amount of points that this really becomes a performance issue, comparing every point to every point is the easiest solution, particularly because these operations can be highly vectorized in Matlab.

For example:

R = [1 2 3; 1 3 4];
P{1} = [2 3 5;1 1 2;2 1 3];
P{2} = [4 4 4];

nP = length(P);
sumMinDist = zeros(nP,1);

%# make R into n-by-1-by-3 already
Rperm = permute(R,[1 3 2]);

for iP = 1:nP

    %# since we want to sum up the minima, we need to take the square root
    allDist = sqrt( sum( bsxfun(@minus, Rperm, permute(P{iP},[3 1 2])).^2, 3));

    %# sum the minima (you may want to consider
    %# taking the mean instead!)
    sumMinDist(iP) = sum(min(allDist,[],1));

end

%# now we can identify the closest set
[~,idxOfClosestSet] = min(sumMinDist);
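
If you ever do hit the point where the full distance matrix becomes too expensive, the k-d tree you mention is available through the Statistics Toolbox (KDTreeSearcher / knnsearch). A sketch, reusing the variables from above and assuming that toolbox is installed:

%# build the tree on the reference set once
kdtree = KDTreeSearcher(R);

for iP = 1:nP
    %# knnsearch returns, for every query point, the index of and
    %# distance to its nearest neighbour in R
    [~, nnDist] = knnsearch(kdtree, P{iP});
    sumMinDist(iP) = sum(nnDist);
end

[~,idxOfClosestSet] = min(sumMinDist);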
Jonas