15

Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. A pair is one element from the first array and one element from the second array. For example, with arrays

  • [2, 3, 5, 8, 13]
  • [4, 8, 12, 16]

The pairs with largest sums are:

  1. 13 + 16 = 29
  2. 13 + 12 = 25
  3. 8 + 16 = 24
  4. 13 + 8 = 21
  5. 8 + 12 = 20

So the pair with the 4th largest sum is (13, 8). How to find the pair with the kth largest possible sum?

I anticipate the solution may involve either a min heap or a max heap.

Colonel Panic
  • 132,665
  • 89
  • 401
  • 465
TimeToCodeTheRoad
  • 7,032
  • 16
  • 57
  • 70

3 Answers3

27

That can be easily done in O(k*logk). I'll only assume arrays are sorted in descending order, to simplify notation.

The idea is simple. We'll find 1st, 2nd, .., k-th maximum values one by one. But to even consider pair (i, j) we need to have both (i-1, j) and (i, j-1) already selected, because they both are greater or equal than (i, j).

It's much like if we push all n*m pairs into the heap and then remove max k times. Only we don't need all n*m pairs.

heap.add(pair(0, 0));  // biggest pair

// remove max k-1 times
for (int i = 0; i < k - 1; ++i) {
    // get max and remove it from the heap
    max = heap.popMax();

    // add next candidates
    heap.add(pair(max.i + 1, max.j));
    heap.add(pair(max.i, max.j + 1));
}

// get k-th maximum element
max = heap.popMax();
maxVal = a[max.i] + b[max.j];

Some things to consider.

  • Duplicated pairs can be added to the heap, this can be prevented with hash.
  • Indexes need to be validated, e.g. that max.i + 1 < a.length.
Nikita Rybak
  • 67,365
  • 22
  • 157
  • 181
  • @Nikita: thanks for the effort! I was wondering why do you not consider i+1 and the biggest element in the second array?. I mean why do max.i+1 , max.j, instead of max.i+1 and the first element in the second array. – TimeToCodeTheRoad Mar 07 '11 at 08:32
  • @TimeToCodeTheRoad (i+1, 0) was already added to the heap when we processed (i, 0). And we already processed (i, 0) because `a[i]+b[0] >= a[i] + b[j]` (Well, we might not have, if they are equal, but you can see that it changes nothing in terms of correctness.) – Nikita Rybak Mar 07 '11 at 08:43
  • @Nikita: Thanks again! also, why dont we consider the first element of array A and j+1 element of the second array? I really appreciate your help. Pls help me – TimeToCodeTheRoad Mar 07 '11 at 08:45
  • @Nikita: I want to award you more points than just selecting ur answer. Is there any way? :) – TimeToCodeTheRoad Mar 07 '11 at 08:55
  • @TimeToCodeTheRoad It's the same situation as in your first comment, only i and j switched. Same reasoning applies. – Nikita Rybak Mar 07 '11 at 09:17
  • @TimeToCodeTheRoad 500 bounty? :) – Nikita Rybak Mar 07 '11 at 09:18
  • @Nikita: Lol. I dont have that much reputation, as bounty points are funded by reputation. Anyways, i guess it feels good to know that you really helped someone ;). thanks so much – TimeToCodeTheRoad Mar 07 '11 at 09:23
  • @TimeToCodeTheRoad I was only kidding, sorry if I scared you :) – Nikita Rybak Mar 07 '11 at 09:29
  • @Nikita: :). Btw i really like your solution :) – TimeToCodeTheRoad Mar 07 '11 at 09:33
  • @Nikita: The solution is cool. I am wondering whether it would work fine if all elements of Array1 are greater than all elements of Array2 and both the arrays are sorted in descending order. – Arnkrishn Dec 12 '11 at 05:07
  • I think this is the best solution. However, the complexity is O(K Log(K)), not O(K), right? Since the heap size may grow to O(K) and inserting a new element takes O(Log(K)) time. – Jim Huang Feb 23 '13 at 21:33
  • @JimHuang You have a point, thanks! Some heaps have constant insertion time (e.g. fibonacci heap), but overall you seem to be correct. – Nikita Rybak Feb 23 '13 at 23:04
  • I am wondering if there some linear time solution to this problem. – AlienOnEarth Jun 11 '14 at 01:13
1

I understand you want a heap but that isn't the most efficient solution, as phimuemue pointed out.

You can max_heap both arrays, and set an iterator at the root for both. At this point, you have the largest sum.

Now, at each step, find the maximum, unvisited node among the children and neighbors of both pointers -- this is the next largest sum. Move the pointer in the corresponding heap there.

Repeat k times.

Vanwaril
  • 7,380
  • 6
  • 33
  • 47
  • Is the statement "at each step, take the pointer to the smaller of the two values and traverse it to the maximum of its children and neighbors" really sound? Take this for example, heap1 [7, 1, 2], heap2 [8, 6, 7]. Without doubt the lagest sum is 7(from heap1) + 8(from heap2). But the second lagest is 7(from heap1) + 7(from heap2) instead of 2(from heap1) + 8(from heap2). Regards! – Summer_More_More_Tea Mar 07 '11 at 01:15
  • Ah, you're right. Edited my answer – Vanwaril Mar 07 '11 at 01:25
  • Hard to understand what you are doing!! Would it be possible to show some pseudo-code for better clarity. – faizan Apr 20 '17 at 06:29
1

Here is my answer, I think it's working well, but could someone tell me what's its complexity ?

Thanks

int ksum( int a[], int n, int b[], int m, int maxk )
{

    std::vector<int> results;
    int* check = new int[m*n];
    memset( check, 0, n*m*sizeof(int) );

    int finali, finalj;
    int starti = 0, startj = 0;
    for( int k=1; k<maxk+1; ++k )
    {
        int max = 0;
        int maxi=n, maxj=m;
        for( int i=starti; i<n && i<k; ++i )
        {
            if( i>maxj )
                break;
            for( int j=i; j<m && j<k; ++j )
            {
                if( i>maxi && j>=maxj )
                    break;
                if( check[i*m+j] == 0 )
                {
                    int tmp = a[i]+b[j];
                    if( tmp>max )
                    {
                        max = tmp;
                        finali = i, finalj = j;
                    }
                    maxi = i, maxj = j;
                    break;
                }
            }
        }
        starti = finali;

        maxi=n, maxj=m;
        for( int j=startj; j<n && j<k; ++j )
        {
            if( j>maxi )
                break;
            for( int i=j; i<m && i<k; ++i )
            {
                if( j>maxj && i>=maxi )
                    break;
                if( check[i*m+j] == 0 )
                {
                    int tmp = a[i]+b[j];
                    if( tmp>max )
                    {
                        max = tmp;
                        finali = i, finalj = j;
                    }
                    maxi = i, maxj = j;
                    break;
                }
            }
        }
        startj = finalj;

        if( max > 0 )
        {
            check[finali*m+finalj] = 1;
            results.push_back( max );
        }
    }

    delete[] check;
    if( maxk > results.size() )
        return 0;
    else
        return results[maxk-1];
}

int _tmain(int argc, _TCHAR* argv[])
{
    int a[] = {10,8,6,4,1};
    int b[] = {9,6,3,2,1};
    int res = ksum( a, 5, b, 5, 9 );
    return 0;
}
Ethan Cabiac
  • 4,943
  • 20
  • 36
Vince
  • 11
  • 2