4

I am looking at the following interview question :

Given 2d coordinates , find the k points which are closest to the origin. Propose a data structure for storing the points and the method to get the k points. Also point out the complexity of the code.

The solution that I have figured out is to save the 2d points in an array. For the first k points, find the distance of each point from the origin and build a max heap. For the remaining points , calculate distance from the origin , say dist. If dist is greater than the topmost element of the heap, then change topmost element of heap to dist and run the heapify() procedure.

This would take O(k) to build the heap and O((n-k)log k) for heapify() procedure , thus the total complexity = O(n log k).

Can anyone suggest a better data structure and/or method , with a possibly better efficiency too ?

EDIT

Would some other data structure be beneficial here ?

Rndm
  • 6,710
  • 7
  • 39
  • 58
  • 2
    You can actually do this in time O(n) and space O(k), where k is the number of closest points that you want, by using a very clever trick: http://stackoverflow.com/questions/9202315/algorithm-to-find-100-closest-stars-to-the-origin – Håvard Geithus Jul 28 '12 at 16:34
  • 1
    I'm wandering how can you design the data structure to return the points, that is return maybe set of pair, but not array or distances. – DoReMi Oct 30 '13 at 23:04

3 Answers3

3

What you're looking for is partial sorting.

I think the best way is to put everything into an unsorted array and then do use a modified in-place quicksort which ignores partitions whose indices are entirely above or entirely below k, and use distance from origin as your comparison.

Pseudocode from the wikipedia article above:

function quickfindFirstK(list, left, right, k)
     if right > left
         select pivotIndex between left and right
         pivotNewIndex := partition(list, left, right, pivotIndex)
         if pivotNewIndex > left + k  // new condition
             quickfindFirstK(list, left, pivotNewIndex-1, k)
         if pivotNewIndex < left + k
             quickfindFirstK(list, pivotNewIndex+1, right, k+left-pivotNewIndex-1)

After execution, this will leave the smallest k items in the first k positions, but not in order.

Don Roby
  • 40,677
  • 6
  • 91
  • 113
1

I would use order statistics for this one.
Note that we use a modified SELECT that uses distance from the origin as the comparison function.

  • Store the elements in an array A, first element is A[1] and last is A[n].
  • Run SELECT(A,1,n,k) to find the kth closest element to the origin.
  • Return the elements A[1..k].

One of the benefits of SELECTthat it partitions the input, so that the smallest k-1 elements are left to A[k].

So storing the elements in an array is O(n).
Running SELECT is O(n).
Returning the requested elements is O(1).

Avi Cohen
  • 3,102
  • 2
  • 25
  • 26
  • 1
    (1) `SELECT` takes O(n^2) time in worst case. (2) Step 3 above is not needed as I require all k points which are closest to origin but the algorithm would not need to be modified. – Rndm Jul 28 '12 at 16:32
  • 1
    I noticed that you asked a question from CLRS. you can see there that there is a none randomized version of SELECT that runs in O(n) worst case. It uses median of medians. – Avi Cohen Jul 28 '12 at 16:35
1

I write a simple version for you using so-called 'partial sorting' http://tzutalin.blogspot.sg/2017/02/interview-type-questions-minqueue.html

public static void main(String[] args) {
    Point[] points = new Point[7];
    points[0] = new Point(0, 0);
    points[1] = new Point(1, 7);
    points[2] = new Point(2, 2);
    points[3] = new Point(2, 2);
    points[4] = new Point(3, 2);
    points[5] = new Point(1, 4);
    points[6] = new Point(1, 1);
    int k = 3;
    qSelect(points, k - 1);
    for (int i = 0; i < k; i++) {
        System.out.println("" + points[i].x + "," + points[i].y);
    }
    // Output will be
    //        0,0
    //        1,1
    //        2,2
}

// in-place qselect and zero-based
static void qSelect(Point[] points, int k) {
    int l = 0;
    int h = points.length - 1;
    while (l <= h) {
        int partionInd = partition(l, h, points);
        if (partionInd == k) {
            return;
        } else if (partionInd < k) {
            l = partionInd + 1;
        } else {
            h = partionInd - 1;
        }
    }
}

static int partition(int l, int h, Point[] points) {
    // Random can be better
    // int p = l + new Random.nextInt(h - l + 1);
    int p = l + (h - l) / 2;
    int ind = l;
    swap(p, h, points);
    Point comparePoint = points[h];
    for (int i = l; i < h; i++) {
        if (points[i].getDistFromCenter() < comparePoint.getDistFromCenter()) {
            swap(i, ind, points);
            ind++;
        }
    }
    swap(ind, h, points);
    return ind;
}

static void swap(int i, int j, Point[] points) {
    Point temp = points[i];
    points[i] = points[j];
    points[j] = temp;
}
tzatalin
  • 404
  • 4
  • 7