4

Here is the passage from the Elements of Programming Interviews book:

Let P be a set of n points in the plane. Each point has integer coordinates. Design an efficient algorithm for computing a line that contains the maximum number of points in P.

In the solution part the following is said:

 Every pair of distinct points defines a line. We can use a hash table
H to map lines to the set of points in P that lie on them.

And here is the hash function of the Line:

// Hash function for Line.
struct HashLine {
   size_t operator()(const Line& l) const {
       return hash <int >()(l.slope.first) ^ hash <int >()(l.slope.second) ^ hash <int >()(l.intercept.first) ^ hash <int >()(l.intercept.second);
}

And here is the declaration of slope and intercept:

 pair <int, int> get_canonical_fractional(int a, int b) {
    int gcd = GCD(abs(a), abs(b));
    a /= gcd, b /= gcd;
    return b < 0 ? make_pair(-a, -b) : make_pair(a, b);
 }

     // Line function of two points , a and b, and the equation is
     // y = x(b.y - a.y) / (b.x - a.x) + (b.x * a.y - a.x * b.y) / (b.x - a.x).
     struct Line {
     Line(const Point& a, const Point& b)
     : slope(a.x != b.x ? get_canonical_fractional(b.y - a.y, b.x - a.x) : make_pair(1, 0))
     , intercept(a.x != b.x ? get_canonical_fractional(b.x * a.y - a.x * b.y,  b.x - a.x) : make_pair(a.x, 1))
     {}

       ...
     // Store the numerator and denominator pair of slope unless the line is
     // parallel to y-axis that we store 1/0.  
     pair <int, int> slope;
     // Store the numerator and denominator pair of the y-intercept unless
     // the line is parallel to y-axis that we store the x-intercept.     
     pair <int, int> intercept;
};

But how do we know that if slope and intercept pair are unique, then their xor is still unique?

Vardan Hovhannisyan
  • 1,101
  • 3
  • 17
  • 40
  • What exactly is meant by _line_? Would it be feasible to generate the convex hull polygon of the input, which would contain all of the points? Or does the question demand for an affine linear function in which the number of pairs from the input which which form a valid argument-value pair is maximized? – Codor Feb 20 '17 at 07:31
  • 1
    similar to this [Given n points on a 2D plane, find the maximum number of points that lie on the same straight line](http://stackoverflow.com/a/20888844/2521214) – Spektre Feb 20 '17 at 07:34
  • 1
    Hush function? Is it the one that tells your object to shut up? I know nothing about it, but a hash function need not produce unique values. It just needs to produce different values for different inputs *with acceptable probability*. – n. m. could be an AI Feb 20 '17 at 08:26

1 Answers1

1

We can try the following simple algorithm:

  1. Create a hash map to with key as a (slope, intercept) pair for a line and value as the number of lines having the same slope intercept.
  2. For all pairs of points (O(n^2) pairs) compute the (slope, intercept) value and increment the corresponding key in the hashmap (in the worst case, it will consume O(n^2) memory but if there are many co-linear points then the average space complexity should be low).
  3. Output the line i.e., the (slope, intercept) that has the highest count in the hashmap (for this you need to traverse the hasmap which in the worst case will contain O(n^2) entries).
Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
  • Thanks for the response. Yes, in the solution part they use the same algorithm as you have described. But my question is about hash function that is used to map "Line"s to the "Set". – Vardan Hovhannisyan Feb 20 '17 at 08:36
  • As we know the equation of a straight line is `y=mx+c` where `m` is the slope and `c` is the intercept. So, any line can be uniquely identified by storing the `(m,c)` pair, we must not store a line as List of points, since there can be many different pairs of points lying on the same line, but a line must be stored as `(m,c)` pair. If multiple pairs of points lie on the same line, they will map to the same `(m,c)` value. – Sandipan Dey Feb 20 '17 at 08:37
  • But what is wrong that points which lies on the same line will be on the Set. Our goal is to find the line in which maximum points are lied. – Vardan Hovhannisyan Feb 20 '17 at 08:43
  • Think about the points `p1=(1,1), p2=(2,2), p3=(3,3)`: they lie on the same line `y=x` which can be represented as `(m=1,c=0)` pair. Now from each of the pairs `(p1,p2), (p2,p3), (p1,p3)` you will find the same equation of the line, which means they are co-linear. In the hash-map you must store `(m=1,c=0)` and have the value corresponding to this key as `3`, which means this line has 3 pairs of points lying on it. – Sandipan Dey Feb 20 '17 at 08:57
  • To define a straight line uniquely we need 2 points, not one. Since our goal is to find a line and not a point, we should has the lines not the points. – Sandipan Dey Feb 20 '17 at 09:18
  • I understand this, but how do we construct hash function of lines, such that hash(l1) == hash(l2) only and only if l1 == l2? – Vardan Hovhannisyan Feb 20 '17 at 10:58
  • that's what i am emphasizing, use `(slope, intercept)` or `(m,c)` pair as the key to the hash function, so that `l1 = l2 <=> (m1,c1) = (m2,c2) <=> h(m1,c1) = h(m2,c2)`. – Sandipan Dey Feb 20 '17 at 11:01
  • But how we construct the hash function? – Vardan Hovhannisyan Feb 20 '17 at 11:10
  • Create a hashmap as `std::unordered_map, int>`, check this out for more info on `hashmaps`. – Sandipan Dey Feb 20 '17 at 11:17