19

I had an interview with Facebook and they asked me this question.

Suppose you have an unordered array with N distinct values

$input = [3,6,2,8,9,4,5]

Implement a function that finds the Kth largest value.

EG: If K = 0, return 9. If K = 1, return 8.

What I did was this method.

private static int getMax(Integer[] input, int k)
{
    List<Integer> list = Arrays.asList(input);
    Set<Integer> set = new TreeSet<Integer>(list);

    list = new ArrayList<Integer>(set);
    int value = (list.size() - 1) - k;

    return list.get(value);
}

I just tested and the method works fine based on the question. However, interviewee said, in order to make your life complex! lets assume that your array contains millions of numbers then your listing becomes too slow. What you do in this case? As hint, he suggested to use min heap. Based on my knowledge each child value of heap should not be more than root value. So, in this case if we assume that 3 is root then 6 is its child and its value is grater than root's value. I'm probably wrong but what you think and what is its implementation based on min heap?

Hesam
  • 52,260
  • 74
  • 224
  • 365
  • why didn't you ask for a code sample from the interviewer? – Olimpiu POP Sep 01 '15 at 05:28
  • In a min heap, every node is less than or equal to both its children. Hence the root would be `2` rather than `3`. One possible layout would be the tree `2 -> [3,4], 3 -> [5,6], 4 -> [8,9]`. – paxdiablo Sep 01 '15 at 05:29
  • 1
    Related - [How to find the kth largest element in an unsorted array of length n in O(n)?](http://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on) – Bernhard Barker Sep 01 '15 at 12:10
  • Why convert to a TreeSet and back, rather than just calling Collections.sort? – user253751 Sep 01 '15 at 12:51
  • @immibis oh, yup, I wasn't familiar with that :( I wanted to delete all duplicates from the list and sort it in ascending order. I'm not sure `Collections.sort` removes duplicates! – Hesam Sep 01 '15 at 14:10
  • You could directly add it into a TreeSet it doesn't allow duplicates also maintains sorted order. – Uma Kanth Sep 01 '15 at 17:52
  • Check this answer for complete explanation: http://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on/32394237#32394237 – akhil_mittal Sep 04 '15 at 09:10

5 Answers5

19

He has actually given you the whole answer. Not just a hint.

And your understanding is based on max heap. Not min heap. And it's workings are self-explanatory.

In a min heap, the root has the minimum (less than it's children) value.

So, what you need is, iterate over the array and populate K elements in min heap. Once, it's done, the heap automatically contains the lowest at the root.

Now, for each (next) element you read from the array, -> check if the value is greater than root of min heap. -> If yes, remove root from min heap, and add the value to it.

After you traverse your whole array, the root of min heap will automtically contain the kth largest element.

And all other elements (k-1 elements to be precise) in the heap will be larger than k.

Codebender
  • 14,221
  • 7
  • 48
  • 85
  • Thanks man, I found where is my problem. Will try to find the right way for implementation. – Hesam Sep 01 '15 at 05:36
  • @Codebender It would be great if you could explain a bit of the example given in the question. I want to learn it. Thanks – Uma Kanth Sep 01 '15 at 06:17
  • @UmaKanth, do you want an example of the heap implementation (google has so many examples of it) or explain the other parts of the algorithm a bit more? – Codebender Sep 01 '15 at 07:14
  • @UmaKanth I have provide the implementation of the problem. Do check in the answers. – YoungHobbit Sep 01 '15 at 08:16
5

Here is the implementation of the Min Heap using PriorityQueue in java. Complexity: n * log k.

import java.util.PriorityQueue;

public class LargestK {

  private static Integer largestK(Integer array[], int k) {
    PriorityQueue<Integer> queue = new PriorityQueue<Integer>(k+1);
    int i = 0;
    while (i<=k) {
      queue.add(array[i]);
      i++;
    }
    for (; i<array.length; i++) {
      Integer value = queue.peek();
      if (array[i] > value) {
        queue.poll();
        queue.add(array[i]);
      }
    }
    return queue.peek();
  }

  public static void main(String[] args) {
    Integer array[] = new Integer[] {3,6,2,8,9,4,5};
    System.out.println(largestK(array, 3));
  }
}

Output: 5

The code loop over the array which is O(n). Size of the PriorityQueue (Min Heap) is k, so any operation would be log k. In the worst case scenario, in which all the number are sorted ASC, complexity is n*log k, because for each element you need to remove top of the heap and insert new element.

YoungHobbit
  • 13,254
  • 9
  • 50
  • 73
  • @njzk2 The complexity is `n*log k`, not `n*log n`. The code loop over the array which is `O(n)`. Size of the `PriorityQueue` (Min Heap) is k, so any operation would be `log k`. In the worst case scenario, in which all the number are sorted ASC, complexity is `n*log k`. because for each element you need to remove top of the heap and insert new element. – YoungHobbit Sep 01 '15 at 13:41
  • 1
    right, I didn't consider that the `queue.poll();` made sure the queue was always of size `k`. – njzk2 Sep 01 '15 at 13:58
3

Edit: Check this answer for O(n) solution.

You can probably make use of PriorityQueue as well to solve this problem:

public int findKthLargest(int[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums){
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements-k+1 > 0){
            p = pq.poll();
            k++;
        }

        return p;
    }

From the Java doc:

Implementation note: this implementation provides O(log(n)) time for the enqueing and dequeing methods (offer, poll, remove() and add); linear time for the remove(Object) and contains(Object) methods; and constant time for the retrieval methods (peek, element, and size).

The for loop runs n times and the complexity of the above algorithm is O(nlogn).

Community
  • 1
  • 1
akhil_mittal
  • 23,309
  • 7
  • 96
  • 95
  • Thanks @akhil, but I think it still has O(n) issue if you assume `num` array contains a million items. your `For` and `While` loop are O(n) each. Or all my knowledge is messed up :( – Hesam Sep 01 '15 at 05:29
  • Also, in this your space complexity is unnecessarily high because your copying the whole array to the queue. – Codebender Sep 01 '15 at 05:32
  • One insight I would be looking for: since I'm looking for the K'th-largest value, if I already have K elements in my heap/priority queue, and a number A comes in that's smaller than the smallest of those -- then that A will definitely not be one of the largest K, and thus should not even be put into the data structure. In other words, the data structure should never contain more than K elements. That way, even if you have tens of millions of values -- billions, even! -- and they're streaming in (because they don't all fit in memory), you can still solve the problem if K is small enough. – yshavit Sep 01 '15 at 05:36
  • And the time comp is actually O(nlog(n)), it's really difficult (if not impossible) to achieve O(n) in this problem. – Codebender Sep 01 '15 at 05:39
  • 1
    Your `while` loop is O(n) and each `poll()` inside it is O(log(n)). So the total comp is `O(nlog(n))` – Codebender Sep 01 '15 at 05:42
  • @Codebender no, because the while is `k`, not `n`. the `add` loop, however, is `n*logn`, because `add` is `logn` too, as indicated in the documentation quoted. – njzk2 Sep 01 '15 at 12:17
  • @njzk2 Codebender means the `for` loop, not the `while` loop. – user253751 Sep 01 '15 at 12:52
  • 1
    @immibis It seems quite clear to me that the `while` loop was meant, as evidenced by the reference to the `poll`s which are indeed in the while loop – njzk2 Sep 01 '15 at 12:59
  • @njzk2 Well nonetheless... The `for` loop is O(n) and each `add()` inside it is O(log(n)). So the total comp is O(nlog(n)). – user253751 Sep 01 '15 at 13:01
  • @njzk2, his while loop has `n-k-1` iterations if I am not wrong which is still O(n) if I am not wrong again. But what you said is true, his while loop actually needs only `k` iterations. – Codebender Sep 01 '15 at 13:14
  • @Codebender, sorry, i didn't realize the queue was polled from the wrong end, for some reason. n operations indeed, but that can be fixed. the for loop, however, is still `n*logn`, because `add` is `logn` – njzk2 Sep 01 '15 at 13:36
  • @immibis: exactly, `the add loop (...) is n*logn` – njzk2 Sep 01 '15 at 13:37
0

Heap based solution is perfect if the number of elements in array/stream is unknown. But, what if they are finite but still you want an optimized solution in linear time.

We can use Quick Select, discussed here.

Array = [3,6,2,8,9,4,5]

Let's chose the pivot as first element:

pivot = 3 (at 0th index),

Now partition the array in such a way that all elements less than or equal are on left side and numbers greater than 3 on right side. Like it's done in Quick Sort (discussed on my blog).

So after first pass - [2,3,6,8,9,4,5]

pivot index is 1 (i.e it's the second lowest element). Now apply the same process again.

chose, 6 now, the value at index after previous pivot - [2,3,4,5,6,8,9]

So now 6 is at the proper place.

Keep checking if you have found the appropriate number (kth largest or kth lowest in each iteration). If it's found you are done else continue.

rai.skumar
  • 10,309
  • 6
  • 39
  • 55
0

One approach for constant values of k is to use a partial insertion sort.

(This assumes distinct values, but can easily be altered to work with duplicates as well)

last_min = -inf
output = []
for i in (0..k)
    min = +inf
    for value in input_array
        if value < min and value > last_min
            min = value
    output[i] = min
print output[k-1]

(That's pseudo code, but should be easy enough to implement in Java).

The overall complexity is O(n*k), which means it works pretty well if and only if k is constant or known to be less that log(n).

On the plus side, it is a really simple solution. On the minus side, it is not as efficient as the heap solution

njzk2
  • 38,969
  • 7
  • 69
  • 107