Given two (large) sets of points, how can I efficiently find pairs that are nearest to each other?

Question

I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:

Given a set of points A and a set of points B in euclidean space, find all pairs (a,b) such that b is the closest point in B to a and a is the closest point in A to b.

The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.

The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?

It really depends on how your data is held/organised. I would suggest looking at this wiki page http://en.wikipedia.org/wiki/Kd-tree for some ideas. This could give you a more effective search, but could also require reformatting your data points. — TZHX, Feb 22 '11 at 10:34
Being a purely technical question, this belongs to StackOverflow. — Péter Török, Feb 22 '11 at 10:36
I second @Peter this is a specific problem and as such should be in SO — Gaurav, Feb 22 '11 at 10:50

score 13 · Accepted Answer · edited Mar 20 '23 at 19:06

A data structure I found very useful when I had to do nearest neighbour searches was a k-d-tree. Wikipedia has a nice overview and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already - you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neghbour searches to be performed in O(log N) time.

In that way, you could produce two lists of - the members of A and their nearest neighbour in B and the members of B and their nearest neighbour in A - in O(N log N) time. Then, you could compare the lists to see which pairs match. Done naively, that's O(N^2), though you might be able to think of a way to do it faster.

[edit] You've got me thinking; here's my second thought:

for(a in A)
    b := nearest(B, a)
    if a = nearest(A, b)
        add (a, b) to results
    end if
end for

function nearest(X, y)
    return nearest member of set X to point y
end function

By my reckoning, that's O(N log N).

I'm using R, and R doesn indeed have a package for nearest-neighbor finding via kd-trees. It ever has a package for computing nearest neighbors between two sets of points. http://cran.r-project.org/web/packages/RANN/RANN.pdf — Ryan C. Thompson, Feb 23 '11 at 02:31
Combining the two lists of nearest neighbours can be done in O(N) time using a list merge: just swap the order of the elements in each pair for (say) B's nearest neighbours, sort them by their first element, and walk down from the start of both lists, finding the elements that are present in both and writing them to the output list. — j_random_hacker, Mar 19 '13 at 13:32

score 3 · Answer 2 · answered Aug 13 '12 at 16:15

Sorry for picking up a rather old thread but I just wanted to add a solution I've found in my textbook for an Algorithm Design class:

There is a divide-and-conquer (think merge-sort) approach to solve this problem that should be O(n logn), I've only seen it for finding the shortest distance within one set of points but it should be easily adapted to require each pairing to consist of points from different sets.

Sort all points according to X-value.
Split the whole set in two equal parts.
Recurse on each half and pick the minimal distance of the two (d)
Find the right-most point (p) in the left half and check the distance for all points between p_x and p_x + d, if any of these distances are shorter than d that is the d to return, otherwise return d.

Coming back to this answer, how to make it work for e.g. 3D cases? — Jean-Yves, Oct 27 '15 at 09:03
I think if you keep switching the axes between iteration (sort and divide by X, then by Y, ... then again by X,...), you will end up with a classic BSP tree. — 9000, Mar 20 '23 at 19:13

score -1 · Answer 3 · answered Jan 06 '16 at 03:47

Old thread, but I see there is a pretty recent comment.

I believe for an n dimensional set of points the near point between two sets can be found by finding the near point to the origin of the set difference. You can seek out the paper by Phillip Wolfe of Bell Labs where he lays out the algorithm. You can think of it by taking a random point in set A, finding the closest point in set B, then finding the closest point to the point in set B and so on. http://link.springer.com/article/10.1007%2FBF01580381

score -1 · Answer 4 · edited May 23 '17 at 12:17

-1

BSP / Octree / Kd-tree (thanks, TZHX)
Space-filling curves (e.g. using Hilbert curve); google yields many relevant results for pair finding.

edited May 23 '17 at 12:17

Community

1
1

answered Feb 22 '11 at 10:50

9000

39,899
9
66
104

I think there are some good pointers in this answer towards potential solutions. The fact that it is not a fully-formulated answer may reflect the true nature of this problem—I don't know if there is a simple answer and I'm not sure the other answers posted so far are good. – Bill Mar 18 '23 at 18:13
@Bill I think the accepted answer about _k_-d trees is good. This approach is well-understood and widely used, so finding reference implementation or help is easy. Space-filling curves are more complicated; I've added them as a starting point for research if space partition solutions don't look good for some reason. – 9000 Mar 20 '23 at 19:11
Apologies, I misunderstood the question. I thought it was to find a pair-match between *all* points that minimizes the average distance (not just the sub-set of pairs that are mutually nearest to each other). That is a different and more difficult problem. – Bill Mar 22 '23 at 16:53

Given two (large) sets of points, how can I efficiently find pairs that are nearest to each other?

4 Answers4

Linked