Finding kth smallest number from n sorted arrays

Question

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e the combined array formed by merging all the n sorted arrays)

I have been trying it and its other variants for quite a while now, and till now I only feel comfortable in the case where there are two arrays of equal length, both sorted and one has to return the median of these two. This has logarithmic time complexity.

After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO. Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)

Can somebody explain me a step by step solution (which again in my opinion should take logarithmic time i.e O( log(n₁) + log(n₂) ... + log(n_N) where n₁, n₂...n_N are the lengths of the n arrays) which starts from the more specific cases and moves on to the more general one?

I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.

Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to able to generalize it.

Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.

PS: Do you think this can be further generalized to the case of unsorted arrays?

PPS: It's not a homework problem, I am just preparing for interviews.

This question is more suited for StackProgramming, I think. I can't answer your question anyhow :) — Daryl Teo, Jan 06 '12 at 05:19
What do you mean by logarithmic time? We have two parameters, n and k. I don't think you can get faster than O(n) because you will have to look at each array at least once. — David Grayson, Jan 06 '12 at 05:19
logarithmic means something like O( lg(n1) + lg(n2) + lg(n3)...) where n1, n2, n3.. are the lengths of arrays n1, n2, n3...n — Sushant, Jan 06 '12 at 05:29

score 15 · Accepted Answer · answered Jan 10 '12 at 06:50

This doesn't generalize the links, but does solve the problem:

Go through all the arrays and if any have length > k, truncate to length k (this is silly, but we'll mess with k later, so do it anyway)
Identify the largest remaining array A. If more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This should give you two numbers: L, the number <= M and G, the number > M
If k < L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element (k-L).

When you get to the point where you only have one element per array (or 0), make a new array of size n with those data, sort, and pick the kth element.

Because you're always guaranteed to remove at least half of one array, in N iterations, you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2 That's all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.

Don't you think it can be solved in N^2LogN by simple min heap. Take a heap of size N and put the minimum elements form the N arrays then pop one and check for the array. Insert the next element from the same array. Keep on doing this quit once you got you Kth element from the heap. — Trying, Jan 19 '14 at 04:40
http://www.geeksforgeeks.org/kth-smallest-element-in-a-row-wise-and-column-wise-sorted-2d-array-set-1/ This solution shows that it can be done in O(N + kLogN) time. — rdp, Nov 16 '14 at 04:31
Correct me if I'm wrong. But I was so confused at this line `That means there are N log k iterations`. To explain that to myself or anybody that gets confused as I did, it's because you get rid of half of the elements (which is N*K) in each N iterations, so to get down to N elements (either 1 or 0 element per array), you need `log((N * K) / N) = log(K)` times => in total it's `N * log(K)` — Leo, Feb 18 '18 at 08:54

score 2 · Answer 2 · answered Jan 06 '12 at 06:34

It can not be done in less than O(n) time. Proof Sketch If it did, it would have to completely not look at at least one array. Obviously, one array can arbitrarily change the value of the kth element.

I have a relatively simple O(n*log(n)*log(m)) where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.

Consider the simple case where you have n arrays each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n), see Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.

Now the problem is how to expand this to longer sorted arrays. Here is the algorithm: Find the median of each array. Sort the list of tuples (median,length of array/2) and sort it by median. Walk through keeping a sum of the lengths, until you reach a sum greater than k. You now have a pair of medians, such that you know the kth element is between them. Now for each median, we know if the kth is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.

Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step

Finds the medians or the arrays, O(1) each, so O(n) total
Sorts the medians O(n log n)
Walks through the sorted list O(n)
Slices the arrays O(1) each so, O(n) total

that is O(n) + O(n log n) + O(n) + O(n) = O(n log n). And, we must perform this untill the longest array is length 1, which will take log m steps for a total of O(n*log(n)*log(m))

You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we only have one array, then the best algorithm will have to compare at least once with each element for a total of O(m). If there were a faster solution for n unsorted arrays, then we could implement selection by splitting our single array into n parts. Since we just proved selection is O(m), we are stuck.

This is just a specific case where k = n/2 i.e finding the median is equivalent to finding the n/2th smallest overall. This particular problem can be solved by finding the medians of each array (which is O(1) because the arrays are sorted. Then, finding the min and the max of these n medians O(n) time. Now the combined median will lie between the min and max median, so we can get rid of other elements. This is essentially O(log(maxM)) but I am not sure. In your case sorting the medians is bringing up the complexity a little bit when all we need would be a min and max. +1 for the effort though — Sushant, Jan 06 '12 at 06:51
Yah, going from finding the median to finding the kth is not hard. The problem with just picking the min and the max is you dont have bounds on how many values you are throwing away, unless I'm missing something. The sorting step lets you throw away half the values. Still, it would be nice to get down to `O(n*log m)` — Philip JF, Jan 06 '12 at 08:37
Note that my solution has the same asymptotic behavior as the one you linked to. Since in that case `n = 5` it was treated as a contant for the big-Oh calculation. — Philip JF, Jan 06 '12 at 08:42
With the knowledge that the median is bw min and max, you can certainly throw the lower half of the array corresponding to the min median, and upper half for the max median. — Sushant, Jan 07 '12 at 03:01

score 1 · Answer 3 · edited Apr 13 '17 at 12:32

1

There exist an generalization that solves the problem in O(N log k) time, see the question here.

edited Apr 13 '17 at 12:32

Community

1
1

answered Feb 09 '14 at 03:08

Chao Xu

2,156
2
22
31

score 1 · Answer 4 · answered Feb 10 '20 at 23:42

Old question, but none of the answers were good enough. So I am posting the solution using sliding window technique and heap:

class Node {

    int elementIndex;
    int arrayIndex;

    public Node(int elementIndex, int arrayIndex) {
        super();
        this.elementIndex = elementIndex;
        this.arrayIndex = arrayIndex;
    }

}

public class KthSmallestInMSortedArrays {

    public int findKthSmallest(List<Integer[]> lists, int k) {

        int ans = 0;
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> {
            return lists.get(a.arrayIndex)[a.elementIndex] -
                   lists.get(b.arrayIndex)[b.elementIndex];
        });

        for (int i = 0; i < lists.size(); i++) {
            Integer[] arr = lists.get(i);
            if (arr != null) {
                Node n = new Node(0, i);
                pq.add(n);
            }
        }

        int count = 0;

        while (!pq.isEmpty()) {
            Node curr = pq.poll();
            ans = lists.get(curr.arrayIndex)[curr.elementIndex];
            if (++count == k) {
                break;
            }

            curr.elementIndex++;
            pq.offer(curr);
        }

        return ans;
    }
}

The maximum number of elements that we need to access here is O(K) and there are M arrays. So the effective time complexity will be O(K*log(M)).

I think the code above has a bug. You need an if before incrementing and offering, as follows: if (curr.elementIndex < arrs[curr.arrayIndex].length - 1) { curr.elementIndex++; pq.offer(curr); } — Leandro, May 08 '21 at 09:10

score 1 · Answer 5 · answered May 08 '21 at 09:21

This would be the code. O(k*log(m))

public int findKSmallest(int[][] A, int k) {
        PriorityQueue<int[]> queue = new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
        for (int i = 0; i < A.length; i++)
            queue.offer(new int[] { i, 0 });

        int ans = 0;
        while (!queue.isEmpty() && --k >= 0) {
            int[] el = queue.poll();
            ans = A[el[0]][el[1]];
            if (el[1] < A[el[0]].length - 1) {
                el[1]++;
                queue.offer(el);
            }
        }

        return ans;
    }

score 1 · Answer 6 · edited May 23 '17 at 12:26

You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration you could reject the second half of the array with the largest middle element if k is less than sum of mid indexes of all arrays. Alternately, you could reject the first half of the array with the smallest middle element if k is greater than sum of mid indexes of all arrays, adjust k. Keep doing this until you have all but one array reduced to 0 in length. The answer is kth element of the last array which wasn't stripped to 0 elements.

Run-time analysis:

You get rid of half of one array in each iteration. But to determine which array is going to be reduced, you spend time linear to the number of arrays. Assume each array is of the same length, the run time is going to be cclog(n), where c is the number of arrays and n is the length of each array.

score 0 · Answer 7 · answered Jan 05 '21 at 22:00

It can be done by doing binary search in each array, while calculating the number of smaller elements.

I used the bisect_left and bisect_right to make it work for non-unique numbers as well,

from bisect import bisect_left
from bisect import bisect_right

def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in  multiple sorted list

    parameters
    ==========
    givenPiles  are list of sorted list
    count   is the total number of
    k       is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    #print('finding k=', k, 'count=', count)
    
    for pileidx,pivotpile in enumerate(givenPiles):
        
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx]+ends[pileidx])>>1
            midval = pivotpile[mid]
            
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile,midval)
                smaller_right_count += bisect_right(pile,midval)
                
            #print('check midval', midval,smaller_count,k,smaller_right_count)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid+1
            
    return -1

score 0 · Answer 8 · answered Jan 06 '12 at 05:23

0

If the k is not that huge, we can maintain a priority min queue. then loop for every head of the sorted array to get the smallest element and en-queue. when the size of the queue is k. we get the first k smallest .

maybe we can regard the n sorted array as buckets then try the bucket sort method.

answered Jan 06 '12 at 05:23

hxc

79
1
1
4

What is the complexity? In addition to using O(k) space, you will do atleast K enqueues, which is O(k log(k)). So, yep if K is huge we have a problem. I agree it might be a good solution when you need all the k smallest numbers, but in this case I just need the kth smallest. – Sushant Jan 06 '12 at 05:36

score 0 · Answer 9 · answered Jan 06 '12 at 06:25

This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list...but only keep k elements in the combined lists from merge to merge. This has the advantage of only using O(k) space, but something slightly better than merge sort's O(n log n) complexity. That is, it should in practice operate slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This is kind of complexity is not so bad.

Yes this is one of the trivial solutions to the problem, unfortunately it is not optimal enough. — Sushant, Jan 06 '12 at 06:40

score -1 · Answer 10 · answered Aug 11 '14 at 22:13

Please find the below C# code to Find the k-th Smallest Element in the Union of Two Sorted Arrays. Time Complexity : O(logk)

public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
    {
        // if (k>m+n) exception
        if (k == 0)
        {
            return Math.Min(array1[start1], array2[start2]);
        }
        if (start1 == end1)
        {
            return array2[k];
        }
        if (start2 == end2)
        {
            return array1[k];
        }
        int mid = k / 2;
        int sub1 = Math.Min(mid, end1 - start1);
        int sub2 = Math.Min(mid, end2 - start2);
        if (array1[start1 + sub1] < array2[start2 + sub2])
        {
            return findKthElement(k - mid, array1, start1 + sub1, end1, array2, start2, end2);
        }
        else
        {
            return findKthElement(k - mid, array1, start1, end1, array2, start2 + sub2, end2);
        }
    }

Finding kth smallest number from n sorted arrays

10 Answers10

Linked

Related