3

I found a Google Interview question on CareerCup

Given a 2D plane, suppose that there are around 6000 points on it. Find a line which passes the most number of points.

Many answers there say this question is hard and involving some kind of special algorithms.

But my view is different and maybe I am wrong.

Here is my thinking:

First I will give a axis system to the 2D plane. Hence, every point will have its unique x and y, i.e., {x, y}. For simplicity, we can put the axis system's {0, 0} as the left bottom of the whole plane and therefore every x and y are bigger than 0.

Then I have a theory:

If several points are on the same line, then it must be in one of the following 3 cases:

  1. their x values are the same
  2. their y values are the same
  3. their x/y or y/x values are the same. But x/y case is the same as y/x case, so let's just focus on x/y.

Then I will have 3 hashtables.

  1. The first one (hashtable-x) is with key of x, the value is the list of points which have the same x;

  2. the second one (hashtable-y) is with the key of y and the value is the list of points which have the same y;

  3. the last one (hashtable-x-y) is with the key of x/y and the value is the list of points which have the same x/y;

Then I scan the whole 6000 points, for each point, I will get its x from hashtable-x, and put the point into the value (a list) of that slot; then I will do similar things to hashtable-y and hashtable-x-y.

Finally, i will scan all lists in the 3 hashtables and find the longest list will contains the points of the desired line.

How do you think of my algorithm?


Ok, here is the duplicate, sorry that I didn't find that question before.

What is the most efficient algorithm to find a straight line that goes through most points?

Community
  • 1
  • 1
Jackson Tale
  • 25,428
  • 34
  • 149
  • 271

2 Answers2

2

Your algorithm won't work as stated. Consider many points fall on the line y=2x + 1, meaning that you get (1,3),(2,5), (3,7), (4,9), and (5,11).

I don't think you're expected to solve this unless you have a graduate level course in computational geometry. The deal is to convert all the points to lines in the dual space, using point-line duality and find the point at which most lines intersect. Naively you can do this in O(n^2) by going through every pair of lines and evaluating where they intersect in analytic form. I also think you can do O(n log n) by using plane sweep style algorithms but I'm not sure of the details.

carlosdc
  • 12,022
  • 4
  • 45
  • 62
  • thanks. What do you mean by `convert all the points to lines in the dual space`? How to do it exactly? – Jackson Tale Jun 07 '12 at 15:05
  • For every point (x,y) and you say y = ax + b. And solve b = -xa + y, and that is a line in dual A-B space. You do that for every point and intersect lines. – carlosdc Jun 07 '12 at 15:12
  • do you mean that a line is in the form of `y = ax+b` provided `a` and `b` are known? How about @tskuzzy's answer? – Jackson Tale Jun 07 '12 at 15:45
  • No, b = -xa + y where x and y are known, this is a line in the a-b axis. His solution is a O(n^3) solution, impractical for 6000 points. – carlosdc Jun 07 '12 at 15:52
  • @carlosdc: nowadays I don't think an O(N^3) solution should be considered in principle impractical for 6000 elements. Looping 2e11 times may just take a couple of minutes. – salva Jun 07 '12 at 16:49
0

I'm going to assume that the number of points is greater than or equal to 2 here in my answer (zero and one point are trivial cases).

First notice that any such line must pass through at least two of the points. So we can construct a solution as follows:

for each pair of points p1,p2
    find equation of the line l passing through p1,p2

    for each point p3 not p1 or p2
        if p3 lies on l
            counter[l]++

return argmax(counter)
tskuzzy
  • 35,812
  • 14
  • 73
  • 140