Understanding the algorithm of "Median of Two Sorted Arrays"

Question

There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O (log (m+n)).

double findMedianSortedArrays(int A[], int m, int B[], int n) {
    return findMedianHelper2(A, m, B, n, max(0, (m-n)/2), min(m-1, (m+n)/2));
}

double findMedianHelper2(const int A[], const int m, const int B[], const int n, const int l, const int r) {
    if (l > r) return findMedianHelper2(B, n, A, m, max(0, (n-m)/2), min(n-1, (m+n)/2));

    int i = (l+r)/2;
    int j = (m+n)/2-i;

    assert(i >= 0 && i <= m && j >= 0 && j <= n);
    int Ai_1 = ((i == 0) ? INT_MIN : A[i-1]);
    int Bj_1 = ((j == 0) ? INT_MIN : B[j-1]);
    int Ai = ((i == m) ? INT_MAX : A[i]);
    int Bj = ((j == n) ? INT_MAX : B[j]);

    if (Ai < Bj_1) return findMedianHelper2(A, m, B, n, i+1, r);
    if (Ai > Bj) return findMedianHelper2(A, m, B, n, l, i-1);

    if (((m+n) % 2) == 1) return A[i];
    return (max(Ai_1, Bj_1) + Ai) / 2.0;
}

Question: what is the meaning of choose l = max(0, (m-n)/2) and r = min(m-1, (m+n)/2)

Thank you

You might get some clarification from [this question](http://stackoverflow.com/questions/12555793/finding-the-kth-smallest-element-in-union-of-sorted-arrays) that is a bit more general. — Daniel Fischer, Sep 25 '12 at 23:45

tgeng · Answer 1 · 2012-10-29T19:02:09.687

That code doesn't make sense for me. However, I think the key here is to make sure m>n and the values (m-n)/2 and (m+n)/2 are correctly passed into the helper function. Also, from the if statement at the beginning of the helper function, we can see that the intention is to fix things when m<n.

Assume m>0 and n>0 (They have to be so for the array to make sense.)
If m>n, then inside the helper, (l>r) will be false, and the algorithm should work well.
If m<n, then inside the helper, (l>r) will be false (unless m=1), and the "fix" seems not to fix anything at all.

Therefore, I assume the code has something wrong at the beginning.
However, the main part seems to make sense for me and indeed helped me with my implementation to do the same thing in JAVA.

Sorry, it's the first time I post in StackOverflow and I didn't realize I was typing html code. — tgeng, Oct 29 '12 at 19:04

score 0 · Answer 2 · answered Sep 25 '12 at 14:36

0

Question > what is the meaning of choose l = max(0, (m-n)/2) and r = min(m-1, (m+n)/2)

MAX and MIN are used to clamp the values so they cannot be below or above a constraint.

IF m - n < 0 THEN
    l = 0
ELSE l = (m - n) / 2

IF (m + n) / 2 > m - 1 THEN
    r = m -1
ELSE r = (m + n) / 2

answered Sep 25 '12 at 14:36

Louis Ricci

20,804
5
48
62

This is a good code interpretation but it doesn't solve my problem. I have problems to understand why we can apply this constraint in this algorithm. what is the theory behind this setting. -thx – q0987 Sep 25 '12 at 15:02
@q0987 - maybe you could output the values every iteration to the console or debug, so you can grasp how l and r are being used. l and r act as a corrective measure when n and m are not equal. This allows i and j to always be within the bounds of the arrays (hence the assertion). – Louis Ricci Sep 25 '12 at 19:19
In fact, I have gone through that steps before I post the question here. I can see how it work by debugging the code. But I don't know why we can make such a choice for l and r. – q0987 Sep 25 '12 at 19:45
@q0987 - You have 2 sorted lists, finding the median if they were combined into 1 long list would be easy, but that combining step would have to loop through them both (ruining the Log N time. So, instead you move through the lists making comparisons (Ai,Bj,Ai_1,Bj_1 to try and derive the center of the lists had they been one large list (instead of two smaller ones). – Louis Ricci Sep 25 '12 at 20:04

score 0 · Answer 3 · edited Apr 07 '13 at 14:37

First of all. Lets prove the algorithm for m=n case. Let name middle element as "k"

m1:=A[n/2]

m2:=B[n/1]`

if m1 < m2, so m1 < k < m2 otherwise m2 < k < m1.

proof: m1 < k so lets m2 < k but its not correct: "k" element index is higher than n obviously.Therefore m2 > k.

if m1 > k the same way we have m2 < k.

A and B merged middle element will be A/2 and B/2 merged middle element the same way. So we need to continue finding element in two arrays : A/2 and B/2 so go to 1) item until arrays become equal.

score 0 · Answer 4 · answered Jan 14 '14 at 01:49

0

http://leetcode.com/2011/03/median-of-two-sorted-arrays.html here is the details of analysis for this algorithm

answered Jan 14 '14 at 01:49

zangw

43,869
19
177
214

Ziyun · Answer 5 · 2014-09-20T06:41:05.753

The reason of choosing such left and right indices is to skip elements which can't be the median of the two sorted arrays.

Without loss of generality, we assume m > n. Then there are two edge cases:

Even if all the elements in B are smaller than A[0], the median is still not possible to be an element within A[0, ... , (m - n) / 2 - 1] because n + (m - n) / 2 - 1 < (m + n) / 2.
Similarly, even if all the elements in B are greater than A[m - 1], the median is still not possible to be an element within A[(m + n) / 2 + 1, ... , m - 1] because A[(m + n) / 2] must be the median.

Based on this observation, we only need to perform binary search on a subarray of the longer array in order to find the median.

And in the case of m < n, l = max(0, (m-n)/2) = 0 and r = min(m-1, (m+n)/2) = m - 1, which essentially means the median can possibly be any element within the shorter array.

Understanding the algorithm of "Median of Two Sorted Arrays"

5 Answers5