Choose the closest k points from given n points

Question

You are given a set U of n points on the plane and you can compute distance between any pair of points in constant time. Choose a subset of U called C such that C has exactly k points in it and the distance between the farthest 2 points in C is as small as possible for given k. 1 < k <= n

What's the fastest way to do this besides the obvious n-choose-k solution?

I have a feeling this problem is NP-complete, a sketch of a reduction from a known NP-complete problem would suffice too :) — pathikrit, Mar 30 '11 at 22:29
Done - accepted an answer (although its only a link to a journal paper) — pathikrit, Apr 12 '11 at 16:32

dcn · Accepted Answer · 2011-04-01T18:39:52.987

A solution is shown in Finding k points with minimum diameter and related problems - Aggarwal, 1991. The algorithm described therein has running time: O(k^2.5 n log k + n log n)

For those who have no access to the paper: the problem is called k-diameter and definied as

Find a set of k points with minimum diameter. The diameter of a set is the maximum distance between any points of the set.

I cannot really give an overview over the presented algorithm, but it includes computing the (3k - 3)th order Voronoi diagram of the points, and then solve the problem for each of the O(kn) Voronoi sets (by computing maximum independent sets in some bipartite graphs)... I guess that I am trying to say is, that it is highly non-trivial, and far beyond both an interview and this site :-)

This is good if by "diameter", Aggarwal means the largest distance between the k points. If Aggarwal means the diameter of the smallest enclosing circle, then it's a different problem. I'm not going to pay $31 to find out :) Maybe someone could post a simplified poly-time version of his solution? — Tom Sirgedas, Apr 01 '11 at 17:47

BlueRaja - Danny Pflughoeft · Answer 2 · 2011-03-31T16:53:36.667

9

Since this is an interview question, here is my shot at a solution. (As dcn points out below, this is not guaranteed to return the optimal solution, though it should still be a decent heuristic. Good catch, dcn!)

Create a set S_p with a single point P.
Compute the distance between every point in S_p and every point outside of it, then add the point with the smallest max distance to S_p.
Repeat 2. until S_p has k points.
Repeat 1-3 using each point once as the initial P. Take the S_p which has the smallest max distance.

There are O(k) points in S_p, and O(n) points outside of it, so finding the point with the smallest max distance is O(nk). We repeat this k times, then repeat the whole procedure n times, for an overall complexity of O(n²k²).

We can improve on this by caching the max distance between any point in S_p and each point outside of S_p. If maxDistanceFromPointInS[pointOutsideS] is, say, an O(1) hash-table containing the current max distance between every point pointOutsideS and some point inside S_p, then every time we add a new point newPoint, we set maxDistanceFromPointInS[p] = Max(maxDistanceFromPointInS[p], distance(newPoint, p)) for all points p outside of S_p. Then finding the smallest max distance is O(n), adding a point to S_p is O(n). Repeating this k times gives us O((n+n)k) = O(nk). Finally, we repeat the whole procedure n times, for an overall complexity of O(n²k).

We could improve finding the smallest max distance to O(1) using a heap, but that would not change the overall complexity.

By the way, it took an hour to write this all up - there's no way I could have done this in an interview.

edited Mar 31 '11 at 16:53

answered Mar 30 '11 at 22:09

BlueRaja - Danny Pflughoeft

84,206
33
197
283

Thank you for this pointer - this was not an interview question I have asked but was thinking about if I should ask or not to atleast see how they think. – pathikrit Mar 30 '11 at 22:28
3

@wrick: How would *you* answer this question? I don't think it's fair to ask a question you yourself cannot answer.. – BlueRaja - Danny Pflughoeft Mar 30 '11 at 22:38
@BlueRaja - that's why I asked it here hoping there might be a easy solution I was not thinking of. Now I have changed my mind and decided not to ask this :) – pathikrit Mar 31 '11 at 01:17
Your described solution implements some heuristic but does not give an optimal answer. Also I think the reference you gave is wrong, since it is about a different problem: smallest-k-enclosing circle it not the same as "distance between farthest points ... [minimal]" (consider the case for a equilateral triangle!) – dcn Mar 31 '11 at 10:07
@dcn: Do you have a counter-example? Of course, [proof by lack of counter-example](http://math.stackexchange.com/questions/514/conjectures-that-have-been-disproved-with-extremely-large-counterexamples/527#527) is not a proof, but I'm still not convinced it's *not* correct. Also, the example of an equilateral triangle would be solved correctly by smallest k-enclosing circle.. – BlueRaja - Danny Pflughoeft Mar 31 '11 at 16:08
1

Why smallest k-enclosing circle is different: consider a equilateral triangle with side length a. The smallest circle containing all 3 endpoints has radius r=3a/sqrt(a) > a/2. Now take three other points to be collinear with distance r-epsilon. They fit in a circle of radius r-epsilon (which is smaller than the circle form the triangle), however the max distance in the two cluster of three points is 2*(r-epsilon) vs a (for the triangle case) – dcn Mar 31 '11 at 16:35
1

Why your solution does not work: take a equilateral triangle with side length a. Now add three more points to the plain: one for each corner of the triangle, facing away from the triangle, and with distance b=a-epsilon to the triangle corner (similar to a star) . Your algorithm will for every initial P always *take an edge of length b* in the first iteration of Step 2. However the correct solution for k=3 is the middle triangle! – dcn Mar 31 '11 at 16:43
@dcn Damn, you are right on both counts. I'll edit my answer - good catch! – BlueRaja - Danny Pflughoeft Mar 31 '11 at 16:54

Null Set · Answer 3 · 2011-03-30T22:41:11.890

This is still a messy idea, ~~I am not sure if it actually works~~ It does not work. Leaving the wrong answer here for posterity.

For each point in U
    make a list of the distance to each point in U
    sort the list
    add largest distance to a max-heap.
while any of the lists have more than k elements
    remove max of heap twice
    remove corresponding elements from the two lists they came from
    add the two newly exposed largest elements from those two lists to the heap
Any of the lists left with k elements will list the elements in C

Basically, find the two points who currently look like they could both be in the subset together, and which have the largest pairwise distance, and then rule out their both being in the subset together. Repeat until you are left with only one possible way to form a k-sized subset.

This should be time complexity O((n^2)log(n)) and space complexity O(n^2).

Greedy algorithm won't work as you might end up selecting C={x,y1,y2} where x-y1 and x-y2 distances are small but y1 and y2 are far away from each other — pathikrit, Mar 30 '11 at 22:26

score 0 · Answer 4 · answered Mar 31 '11 at 20:27

0

Its doable in deterministic polynomial time apparently.

But I don't understand their algorithm. It seems the circles they choose are always one of the input points. Can someone shed some light on this or explain neatly what they do (just the section 2 would suffice)?

answered Mar 31 '11 at 20:27

pathikrit

32,469
37
142
221

Look at my comment for the answer of @BlueRaja - Danny Pflughoeft. The paper is about a different problem! – dcn Mar 31 '11 at 20:29
So the question "Find the smallest circle (radius and center) that encloses atleast k points from the given n points" and the question "Find the subset of k points from given n points such that maximum distance between 2 points in the chosen subset is minimzed" are different questions? – pathikrit Mar 31 '11 at 20:33
1

Yes, as shown by my counterexample. However, I found "your" problem referenced in the paper you linked above. Take a look at my answer. – dcn Mar 31 '11 at 20:36

jonderry · Answer 5 · 2011-04-01T21:11:36.813

The actual problem that was asked seems difficult. If the points were not in the plane and had arbitrary distances, it would be NP-hard by reduction from k-clique (take an arbitrary unweighted graph, and add edges of length infinity for missing edges, and edges of length 1 for existing edges -- a clique of size k exists whenever the 'closest k points' have a maximum mutual distance of 1 rather than infinity). However, since the points are constrained to being in the plane so that such distance labelings are forbidden, it's possible that there may be a solution. At least, it seems amenable to close approximation.

If your interview meant 'smallest diameter circle enclosing k points', then the following might be the fastest correct algorithm you could reasonably come up with by yourself in an interview:

For each set of size 3, compute the smallest circle that contains these points, and check whether each point is contained in this circle. If the number contained is at least k, update the minimum diameter accordingly.

Essentially, this is a quadruply nested for loop, so the running time is O(n^4).

This will give a good approximation, but it's not exactly right. For k=3, let's say you have three points each 1 unit apart from each other (diameter = 1.154...). Someplace else there are 2 points 1.1 units apart, with a 3rd point in between (diameter = 1.1). Your algorithm will incorrectly choose the latter group of three points. — Tom Sirgedas, Apr 01 '11 at 17:38
According to my understanding, 1.1 is the correct return value for the this problem instance, so this is not a counterexample. *Edit: Oh ok, I see what you mean. — jonderry, Apr 01 '11 at 18:37

score -2 · Answer 6 · answered Mar 30 '11 at 22:04

-2

This should be a good approximate solution if you don't have too many outliers

p = mean center (average x, y) of U
M = U sorted by distance to p
C = M[0:k]

answered Mar 30 '11 at 22:04

limscoder

3,037
2
23
37

Approximate solutions are never good enough. This one will fail for sure. – Snowbear Mar 30 '11 at 22:07
4

@Snowbear: I'd hate to be the person who had to teach you about NP-complete problems... – BlueRaja - Danny Pflughoeft Mar 30 '11 at 22:11
Approximate solutions are tricky traps as you have to prove their error bounds and run-times. You can ofcourse do other optimizations in real-life like find clusters etc – pathikrit Mar 30 '11 at 22:31

Choose the closest k points from given n points

6 Answers6

Linked