
I was wondering if my line of thinking is correct.

I'm preparing for interviews (as a college student) and one of the questions I came across was to find the K largest numbers in an array.

My first thought was to just use a partial selection sort: scan the array while keeping two variables for the largest element seen so far and its index, swap that element to the front when the scan ends, and repeat on the remaining elements until K elements have been swapped into place; then return a copy of the first K elements. However, this takes O(K*n) time. If I instead sorted the entire array with an efficient sorting method like Mergesort, it would take O(n*log(n)) time, after which returning the K largest numbers is trivial.
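For concreteness, here is a minimal Python sketch of that partial selection sort (the function name is just my illustration):

def k_largest_selection(nums, k):
    """Partial selection sort: return the k largest numbers in O(k*n) time."""
    nums = list(nums)  # work on a copy
    for i in range(k):
        # One O(n) scan: track the largest element of nums[i:] and its index.
        max_idx = i
        for j in range(i + 1, len(nums)):
            if nums[j] > nums[max_idx]:
                max_idx = j
        # Swap it into position i, so nums[:i+1] holds the largest seen so far.
        nums[i], nums[max_idx] = nums[max_idx], nums[i]
    return nums[:k]

The Mergesort alternative is simply `sorted(nums, reverse=True)[:k]`.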

Is it good enough to discuss these two methods during an interview (comparing log(n) and K for the given input and going with the smaller of the two to compute the K largest), or would it be safe to assume that I'm expected to give an O(n) solution for this problem?

Christian Gollhardt

4 Answers


There exists an O(n) algorithm for finding the k'th smallest element, and once you have that element, you can simply scan through the list and collect the appropriate elements. It's based on Quicksort, but the reasoning behind why it works is rather hairy... There's also a simpler variation that will probably run in O(n). My answer to another question contains a brief discussion of this.
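For illustration, here is a minimal Python sketch of that collection step; `kth_largest` is a hypothetical stand-in for any linear-time selection routine (such as the Quickselect shown in a later answer):

def k_largest(nums, k):
    # Hypothetical helper: any linear-time selection routine works here.
    threshold = kth_largest(nums, k)
    # Collect everything strictly larger than the k'th largest element...
    result = [x for x in nums if x > threshold]
    # ...then pad with copies of the threshold itself, so that duplicates of
    # the k'th largest element are handled and exactly k elements come back.
    result += [threshold] * (k - len(result))
    return result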

Aasmund Eldhuset

Here's a general discussion of this particular interview question, found by googling:

http://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/

As for your question about interviews in general, it probably depends greatly on the interviewer. Interviewers usually like to see how you think about things, so as long as you can come up with some sort of initial solution, your interviewer will likely ask follow-up questions from there, depending on what exactly they are looking for.

pjshap

IMHO, the interviewer wouldn't be satisfied with either method if they say the dataset is huge (say, a billion elements). In that case, if the K to be returned is also huge (nearing a billion), your partial selection sort would approach O(n^2). I think it entirely depends on the intricacies of the question posed.

EDIT: Aasmund Eldhuset's answer shows you how to achieve the O(n) time complexity.

noMAD
  • Yes, you can. See Aasmund's answer. – Stefano Sanfilippo Nov 17 '14 at 21:07
  • @StefanoSanfilippo that would give us the `k`th largest/smallest number but not `k` largest/smallest number`s` – noMAD Nov 17 '14 at 21:09
  • @noMAD: Once you have that element, you simply scan through the list and gather the elements that are less than or equal to the `k`th smallest element. You also need to handle the fact that there may be multiple elements that are equally small, but you can still do it in linear time. – Aasmund Eldhuset Nov 17 '14 at 21:10

If you want to find the K largest numbers (so for K = 5 you'll get five results - the five highest numbers), then the best you can get is O(n + K*log(n)): build a priority queue in O(n) and then invoke pq.Dequeue() K times (a sketch of this approach appears at the end of this answer). If you are looking for just the K'th biggest number, you can get it with an O(n) Quicksort modification - it's called the K'th order statistic, or Quickselect. It's a randomized algorithm: the average time is approximately O(n), but the worst case is O(n^2). A runnable Python version looks like this:

import random

def quickselect(nums, k):
    """Return the k'th smallest element of nums (k is 0-indexed); the K'th
    biggest element is just the (len(nums) - K)'th smallest."""
    if len(nums) == 1:
        return nums[0]
    pivot = random.choice(nums)  # pivot chosen at random, as in Quicksort
    # Three-way partition: smaller elements go left of the pivot, bigger
    # elements go right, and elements equal to the pivot stay in the middle.
    smaller = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    larger = [x for x in nums if x > pivot]
    if k < len(smaller):
        return quickselect(smaller, k)
    elif k < len(smaller) + len(equal):
        return pivot
    else:
        return quickselect(larger, k - len(smaller) - len(equal))

As I said, this algorithm has an O(n^2) worst case because the pivot is chosen at random (however, the probability of a ~n^2 running time is something like 1/2^n). You can convert it into a deterministic algorithm with an O(n) worst case by using, for instance, the median of medians as the pivot - but it is slower in practice (due to larger constants).
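And here is a minimal sketch of the priority-queue approach from the top of this answer, using Python's heapq module (values are negated because heapq only provides a min-heap):

import heapq

def k_largest_heap(nums, k):
    """Return the k largest numbers in O(n + k*log(n)) time."""
    heap = [-x for x in nums]  # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)  # O(n) bottom-up heap construction
    return [-heapq.heappop(heap) for _ in range(k)]  # k pops, O(log n) each

For comparison, the standard library's heapq.nlargest(k, nums) solves the same problem by maintaining a heap of only k elements, which is O(n*log(k)).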

fex