46

Let's imagine we have a plane with some points on it. We also have a circle of given radius.

I need an algorithm that determines such position of the circle that it covers maximal possible number of points. Of course, there are many such positions, so the algorithm should return one of them.
Precision is not important and the algorithm may do small mistakes.

Here is an example picture:
Example

Input:
  int n (n<=50) – number of points;
  float x[n] and float y[n] – arrays with points' X and Y coordinates;
  float r – radius of the circle.

Output:
  float cx and float cy – coordinates of the circle's center

Lightning speed of the algorithm is not required, but it shouldn't be too slow (because I know a few slow solutions for this situation).

C++ code is preferred, but not obligatory.

Oleh Prypin
  • 33,184
  • 10
  • 89
  • 99

10 Answers10

25

Edited to better wording, as suggested :

Basic observations :

  • I assume the radius is one, since it doesn't change anything.
  • given any two points, there exists at most two unit circles on which they lie.
  • given a solution circle to your problem, you can move it until it contains two points of your set while keeping the same number of points of your set inside it.

The algorithm is then:

  • For each pair of points, if their distance is < 2, compute the two unit circles C1 and C2 that pass through them.
  • Compute the number of points of your set inside C1 and C2
  • Take the max.
Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
  • I like this solution. It may be the one I need... But I'm not sure it's always correct – Oleh Prypin Jul 12 '10 at 15:17
  • Can you explain the rationale Alex? – Mau Jul 12 '10 at 15:18
  • It is O(n^3) in the number of points. You can check whether C is inside the circle of radius 1 passing through A and B (with one prescribed orientation) by calculating one determinant I think. But please beware of roundoff errors, they can really spoil this kind of algorithms. – Alexandre C. Jul 12 '10 at 15:19
  • I didn't notice your answer :(, wrote absolutelly the same with bit explanations. – Max Jul 12 '10 at 15:19
  • @Mau, @BlaXpirit: You can move the disk a little until it touches two points without changing the number of points inside it. – Alexandre C. Jul 12 '10 at 15:19
  • +1 It is obviously true solution. `O(n^3)` is not good but it is mathematically true solution. – Alexey Malistov Jul 12 '10 at 15:27
  • @Alexey: it's not really n^3 since a lot of pairs are discarded as long as their distance is > 2, so it's rather O(n^2) on average. – Alexandre C. Jul 12 '10 at 15:40
  • Accepted. Reasons: This solution seems to be correct. The other ones are difficult, or I simply don't understand them – Oleh Prypin Jul 12 '10 at 15:43
  • I don't really get it. What does this even mean? `Take two points, you have (at most) two unit circles that pass through it`? What's `it`? Why are you picking unit circles? This should really be explained better. – IVlad Jul 12 '10 at 15:53
  • @IVlad: ok, I reword my quickly written answer. – Alexandre C. Jul 12 '10 at 15:58
  • `given a solution circle to your problem, you can move it until it contains two points of your set while keeping the same number of points of your set inside it.` - I don't think this is true. Anyway, ignoring that, consider a triangle that (barely) fits inside a circle of radius `r`. The distance between each two points of the triangle is `> 2`. Consider two points very very far away from the triangle with mutual distance `< 2`. I think your algorithm would only take those 2 points as solution. Your algorithm is wrong in my opinion, it doesn't even use `r` anywhere. – IVlad Jul 12 '10 at 16:03
  • 3
    @ IVlad: For every two points that are < 2r apart, there are two circles of radius r that could pass through these two points. (If two points are *exactly* 2r apart, there's only one circle.) With regards to the two points, these circles are "optimal": Since the points are on the very border of the circle, the circle covers a maximum of additional space (and, possibly, further points). Iterating over all point pairs gives you all possible circles with >= 2 points in them. Check these circles for the number of points they each contain, and select the one circle with the highest number. – DevSolar Jul 12 '10 at 16:05
  • @DevSolar - and if there are no points with mutual distance `< 2`? – IVlad Jul 12 '10 at 16:08
  • @ IVlad: Assuming you meant <= 2, then each point has a circle of radius r around it that contains only that point. – DevSolar Jul 12 '10 at 16:08
  • @ IVlad: "I don't think this is true." - it is. You move the circle in any direction until the first point is on its line. Then you "swing" the circle around that point until the second point is on its line. You now have a circle that touches two points, without having lost any points. (Rather evident, I think. Unless the circle didn't contain 2+ points to begin with.) – DevSolar Jul 12 '10 at 16:13
  • @IVlad, @DevSolar: a correct proof would involve topology and the "toll-passing lemma". The argument relies on the fact that the function "center of circle -> number of points" is integer-valued. – Alexandre C. Jul 12 '10 at 16:17
  • @DevSolar - I think my confusion stems from the fact that you used `2` in your answer. Shouldn't it be `2*r`? – IVlad Jul 12 '10 at 16:19
  • Umm.. actually it was the only answer I fully understood. – Oleh Prypin Jul 12 '10 at 16:19
  • But aren't there actually some points outside the "unit" circle, which will be covered by a big circle? – mip Jul 12 '10 at 16:26
  • @ doc: The radius of all circles in the solution is identical (r). @ IVlad: Yes, the "2" in the answer should read "2r". – DevSolar Jul 12 '10 at 16:28
  • @DevSolar - well, +1 then. I really think that should be made more clear, you really had me scratching my head when I saw no mention of `r` anywhere. – IVlad Jul 12 '10 at 16:31
  • OK then. It shouldn't be called a unit circle, but rather a circle with radius r. It's a good solution but it is written in the terrible way. "you can move it until it contains two points of your set while keeping the same number of points of your set inside it." - until you'll find two points on its edge, while keeping the same number of points inside it ;/ – mip Jul 12 '10 at 16:43
  • This would make a great LINQ project. – maxwellb Jul 12 '10 at 18:42
  • This could be reduced from O(n^3) to O(n^2 * mlogn) (m is average number of points in each tested circle). If you put all the points in a quad-tree and use that when testing each circle to tally the other points it contains. The big-O math would get messy, but it could be further reduced by using the quad-tree to identify the candidate-pairs of points that are close enough to be worth testing. – Alan Jul 12 '10 at 23:19
  • @Alan: It is already less than O(n^3) since on average only O(n) pairs of points are within 2r (assuming uniform distribution). So you actually get something O(n^2)-ish. – Alexandre C. Jul 13 '10 at 08:25
  • I wonder what happens when you take floating point inaccuracies into account. – starblue Jul 13 '10 at 20:58
  • @starblue: good point. The test of whether a point lies inside a given circle is indeed a tough one numerically. You can use arbitrary precision arithmetic for this. – Alexandre C. Dec 14 '11 at 08:34
  • How do you find `the two unit circles C1 and C2 that pass through them` – wilmol May 17 '20 at 03:50
9

This is the "disk partial covering problem" in the literature -- that should give you a good place to start googling. Here's a paper covering one possible solution, but it is a little intense mathematically: Approximation Algorithms Design for Disk Partial Covering Problem

As a matter of fact, this falls in the area called computational geometry, which is fascinating but can be hard to get a toehold in. There's a good overview by deBerg on various algorithms related to the subject.

kinokijuf
  • 968
  • 1
  • 11
  • 33
Joel
  • 5,618
  • 1
  • 20
  • 19
5

Here are two ideas: an O(n) approximation algorithm and an O(n^2 log n) exact (non-approximate) algorithm:

Fast approximation

Use locality-sensitive hashing. Basically, hash each point to a hash bucket that contains all nearby points. The buckets are set up so that collisions only happen between nearby points - unlike similarly-named hash tables, collisions are useful here. Keep a running count of the number of collisions in a bucket, and then use the center of that bucket as the center of your circle.

I admit that this is a fast explanation of a concept that is not super-obvious the first time you hear of it. An analogy would be to ask a bunch of people what their zip code is, and use the most common zip code to determine the most populous circle. It's not perfect, but a good fast heuristic.

It's basically linear time in terms of the number of points, and you can update your data set on the fly to incrementally get new answers in constant time per point (constant with respect to n = number of points).

More about locality-sensitive hashes in general or about my personal favorite version that would work in this case.

Better-than-brute-force deterministic approach

The idea is: for each point, place the edge of a circle on that point and sweep the circle around to see which direction contains the most points. Do this for all points and we find the global max.

The sweep around a point p can be done in n log n time by: (a) finding the angle interval per other point q such that when we place the circle at angle theta, then it contains q; and (b) sorting the intervals so that we can march/sweep around p in linear time.

So it takes O(n log n) time to find the best circle touching a fixed point p, and then multiply that by O(n) to perform the check for all points. The total time is O(n^2 log n).

Tyler
  • 28,498
  • 11
  • 90
  • 106
  • Don't you have an extra n multiplier to calculate coverage in the second case? You'll still need an extra loop to calculate how many points are inside the circle. – Joric Mar 05 '22 at 21:53
  • 2
    Hi @Joric. Short answer: I think there is indeed an exact answer here in time O(n^2 log n) but wow did I underexplain that (6.5 years ago). Fix the point p and imagine a unit circle sweeping around, tangent to p. Track the circle using angle theta from p to c = the circle's center. For every other point q, we can find the angle interval [theta_1, theta_2] wherein q will be inside the circle (incl border). Now when we sweep the circle around, we can use the pre-computed [theta_1, theta_2] intervals to efficiently update which points are in c. The amortized time per circle update is O(1). – Tyler Mar 07 '22 at 01:07
  • @Tyler, can you please provide any reference to this amortized analysis – codeR May 11 '23 at 10:27
4

If you want something simple, take random position (x,y), calculate number of points inside circle and compare with previous position. Take the maximum. Repeat the operation any times you want.

Why the hell downvote? Ever heard about Monte Carlo methods? Actually for a huge amount of points, deterministic algorithm may not finish in reasonable time.

mip
  • 8,355
  • 6
  • 53
  • 72
  • 3
    @maxwellb: "Precision is not important and algorithm may do small mistakes" – mip Jul 12 '10 at 16:58
  • Reasonable time, also, is expressed in terms of a relationship with the number of points given. For example, O(n^3) means that the algorithm finishes in time proportional to the cube of the number of points. Often, "reasonable" time for an algorithm is: polynomial (n^3, n^8, n^k): OK, linear (n, or 3n, (1/9)n, kn, all are n): great, logarithmic: awesome. – maxwellb Jul 12 '10 at 16:59
  • Fair enough, and I think your point is valid, so I didn't downvote. Cheers. – maxwellb Jul 12 '10 at 17:00
  • For my situation, this solution isn't too bad – Oleh Prypin Jul 12 '10 at 17:20
  • 2
    A good Monte Carlo algorithm can bound the error probability as a function on some parameter (e.g. number of iterations in your case). I don't see any such analysis in your solution. – Eyal Schneider May 29 '14 at 19:20
  • @EyalSchneider I just gave a general idea. It's easy to improve the algorithm, compiling MC approach with previous answers. For example, for the circles I would choose positions, which are not totally random, but those for which there is incidence between circle edge and two points. If you additionally provide a way to use pairs once (for example remember them), then you can estimate that error probability decreases asymptotically to zero for the number of iterations equal to the number of all pairs on the plain for which circle can be constructed. – mip May 30 '14 at 04:24
2

The problem reverts back to finding the global optimum of a function f :R x R -> N. The input for f is the center point of the circle, and the value, of course, is the number of included points from the set. The graph of the function will be discontinuous, staircase-like.

You could start by testing the function in each point corresponding to a point in the set, order the points by decreasing values of f, then intensify search around those points (for example moving out along a spiral).

Another option is to consider all intersection points of segments connecting any pairs of points in the set. Your optimal point will be at one of these intersections I think, but their number is probably too big to consider.

You may also hybridise options 1 and 2, and consider intersections of the segments generated from points around 'good set points'.

Anyhow, unless the number of set points is low (allowing you to check all intersections), I don't think you can guarantee to find the optimum (just a good solution).

Mau
  • 14,234
  • 2
  • 31
  • 52
  • I can see the same answer posted three times in total - probably by mistake. Can you delete the two instances of it? – Suma Jul 12 '10 at 15:11
  • Done. SO technical issues :-) – Mau Jul 12 '10 at 15:13
  • The number of points is low (I've just edited the question). But I don't understand a few things, like... how can you make a graph of 2-parameter function? – Oleh Prypin Jul 12 '10 at 15:14
  • minimization of discrete-valued functions is *very* tough. – Alexandre C. Jul 12 '10 at 15:15
  • Well, you don't have to draw the graph, I was just picturing it in my head... like this anyway: http://soft.proindependent.com/doc/manual-en/pics/3D-function-plot-modified.png – Mau Jul 12 '10 at 15:16
  • @Alexandre. In this case, however the set of points to consider is finite. – Mau Jul 12 '10 at 15:16
  • @Mau: The set of points is *always* finite because computers cannot deal with infinite sets. Still, it's a non-linear discrete optimization problem, which is the toughest class of optimization problems. There are no known good generic algorithms for this class of problems, as opposed to linear or continuous optimization problems. – Philipp Jul 12 '10 at 15:34
  • 1
    @Philipp Well I think really only linear (non-integer) are tractable in the general case. But we digress :-) – Mau Jul 12 '10 at 18:53
1

If it is true that precision is not important and algorithm may do small mistakes then I think the following.

Let f(x,y) is a function which has a maximum at the point (0,0) and is only significant at the points inside of circle of radius R. For example, f(x,y) = e^{(x^2 + y^2)/ (2 * R^2)}.

Let (x_i,y_i) are points and E_i(x,y) = f(x - x_i, y - y_i).

Your problem is to find the maximum of \sum_i E_i(x,y) alt text.

You can use a gradient descent starting it from each point.

Community
  • 1
  • 1
Alexey Malistov
  • 26,407
  • 13
  • 68
  • 88
  • I'm 15-year-old kid. I love programming and I don't like Math. So I often don't understand these smart math things :) – Oleh Prypin Jul 12 '10 at 15:28
  • @BlaXpirit, you can achieve a lot by programming if you don't know math a lot. But to effectively solve--and understand solutions of--tasks like the one you posted you should get better in maths. (By the way, Alexey is also using LaTeX notation of writing mathematical functions, it's not related to maths, but one of those things you'd better learn about) – P Shved Jul 12 '10 at 16:25
  • @Alexey, it would be absolutely correct if `f` was `1 inside the circle, and 0 outside`. :-) – P Shved Jul 12 '10 at 16:27
  • @Pavel, gradient descent does not work correctly for the case of your nondifferentiable function. – Alexey Malistov Jul 12 '10 at 19:01
  • @Alexey, exactly, but otherwise you risk to find a point that isn't a solution. In fact, I'm sure (though I didn't try to prove) that, for each continuous function, it's possible to construct an input for which the algorithm fails as badly as we wish. – P Shved Jul 12 '10 at 20:57
  • 1
    @Pavel, I'm sure too. I started my answer with `If it is true that precision is not important...`. – Alexey Malistov Jul 13 '10 at 06:37
  • @Alexey, ok, might have missed it. Sorry for irrelevant comments. – P Shved Jul 13 '10 at 08:34
1

At a first glance, I would say a quad tree solution.

Also, there is an information visualization/Data mining method called K-means which makes clusters of given data. It can be used with added functionality in the end to fit your purpose.

The basic algorithm for K-Means is:

  1. Place K points into the space represented by the items that are being clustered - These points represent initial group centroids
  2. Assign each data item to the group that has the closest centroid
  3. When all items have been assigned, recalculate the positions of the K centroids by calculating the mean position of your dots
  4. Repeat Steps 2 and 3 until the centroids no longer move, or move very little

The added functionality would then be:

  1. Calculate number of points within your circle positioned at the K centroids
  2. Choose the one that suits you the best ;)

Sources:
K-means algorithm - Linköping University
Quadtree - en.wikipedia.org/wiki/Quadtree

A quick search on wikipedia and you find source code: en.wikipedia.org/wiki/K-means_clustering

BlueRaja - Danny Pflughoeft
  • 84,206
  • 33
  • 197
  • 283
user389609
  • 11
  • 1
1

You could pixelize the whole area, then go to each point and increase the value of all pixels within the circle of the radius around that point. The pixels with the highest sum are good candidates.

Of course, you might lose some good areas or "hallucinate" good areas due to rounding errors. Perhaps you could try to do a rough pixellation first, then refine the promising areas.

Svante
  • 50,694
  • 11
  • 78
  • 122
0

Might I suggest a density map? Find the min and max bounds for the x and y. Divide the range of the x and y bounds into bins having widths equal to the diameter of your circle. Count the number of points in each bin for the x and y separately. Now find the intersection on your density map between the highest ranked x bin with the highest ranked y bin.

This is a VERY fast algorithm to quickly generalize large data sets but it is not always accurate, to improve accuracy you can slice the bins into smaller and smaller pieces or shift the bin positions to the left or right n times and use a voting system to select the answer that occurs most often between trials.

Peter Hanneman
  • 523
  • 1
  • 5
  • 18
-4

This is famous K-closest point algorithm. Described here: http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf

Jack
  • 1,398
  • 3
  • 16
  • 26