2

There is already an answer for two sorted arrays. However, in my question one of the arrays is unsorted.

Suppose X[1..n] and Y[1..m] where n < m. X is sorted and Y is unsorted. What is the efficient algorithm to find the kth smallest number of X U Y.

MinHeap can be used to find the kth smallest number in an unsorted array. However, here one of these arrays is sorted. I can think of:

 1. Building a `MinHeap` for `Y`
 2. i = 1, j = 1
 3. x1 = extract Min from Y
 4. x2 = X[i]; 
 5. if j == k: return min(x1, x2)
 5. if x1 < x2: j++; goto 3
 6. else: j++; i++; goto 4 

Is it efficient and correct?

Ahmad
  • 8,811
  • 11
  • 76
  • 141
  • 2
    While building min heap for 'Y', you already cover O(m log m) complexity. So, why not sorting 'Y' and applying same method as applied for both sorted arrays!! – BishalG Dec 05 '18 at 18:47
  • I think building a heap from Y is exactly like sorting Y. What is the order of building the heap? what is the order of extracting Min from Y? – Mahmoud Hanafy Dec 05 '18 at 18:52
  • 3
    @MahmoudHanafy Building a heap with `m` things is `O(m)`. Extracting the smallest thing from it is `O(log(m))`. Extracting it all in sorted order is `O(m log(m))`. You can do better on this problem. – btilly Dec 05 '18 at 18:53
  • @BishalGautam Building a heap from `Y` is O(m), not O(m log m). See for example https://stackoverflow.com/questions/1787252/is-there-a-on-algorithm-to-build-a-max-heap – Jim Mischel Dec 06 '18 at 04:34

2 Answers2

1

There is no help for it but you have to scan Y. This takes O(m) so you can't do better than O(m).

However quickselect has average performance O(m). Basically that algorithm is just to do a quicksort, except that you ignore all partitions that don't have your final answer in them.

Given that n < m we can simply join one array to the other and do quickselect.

Note that the average performance is good, but the worst case performance is quadratic. To fix that, if you do not make progress sufficiently quickly you can switch over to the same median of medians algorithm that gives quicksort guaranteed performance (albeit with bad constants). If you're not familiar with it, that's the one where you divide the array into groups of 5, find the median of each group, and then repeat until you are down to 1 element. Then use that element as the pivot for the whole array.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • Thanks, What is the best in theory (with a less worst case) – Ahmad Dec 05 '18 at 19:06
  • @Ahmad If you do a median of medians run every, say, 5 times, then your average case isn't hurt much and your worst case is still `O(m)`. So I would suggest doing that. – btilly Dec 05 '18 at 19:28
1

Make a max-heap of the smallest k items from the sorted array (X). That takes O(k) time.

For each item in the unsorted array (Y), if it's smaller than the largest item in the heap (the root), then remove the root from the heap and add the new item. Worst case for this is O(m log k).

When you're done, the kth smallest number will be at the top of the heap.

Whereas the worst case for the second part is O(m log k), average case is much better because typically a small percentage of items have to be inserted in the heap.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351