k-greatest double selection

Question

Imagine you have two sacks (A and B) with N and M balls in them, respectively. Each ball has a known numeric value (profit). You are asked to extract (with replacement) the pair of balls with the maximum total profit (given by the product of the selected balls' values).

The best extraction is obvious: Select the greatest valued ball from A as well as from B.

The problem comes when you are asked for the 2nd or, in general, the kth best selection. Following the previous approach, you should select the greatest-valued balls from A and B without repeating selections.

This can be solved clumsily by calculating the value of every possible selection and sorting the results (example in Python):

def solution(A,B,K):
    if K < 1:
        return 0
    pool = []
    for a in A:
        for b in B:
            pool.append(a*b)
    pool.sort(reverse=True)
    if K>len(pool):
        return 0
    return pool[K-1]

This works, but its worst-case time complexity is O(N*M*log(N*M)), and I bet there are better solutions.

I came up with a solution based on a table in which the elements of A and B are sorted from highest to lowest value, and each value has an associated index pointing to the next value to test from the other column. Initially, this table would look like this:

[Image: the table in its initial state]

The first element of A is 25, and it has to be tested (index to select from B = 0) against 20, so 25*20 = 500 is the first (greatest) selection. After increasing the indexes to check, the table changes to:

[Image: the table after the first selection]

Using these indexes, we have a quick way to get the next best selection candidates:

25 * 20 = 500 #first from A and second from B
20 * 20 = 400 #second from A and first from B

I tried to code this solution:

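def solution(A, B, K):
    if K < 1:
        return 0
    # Reconstructed setup (assumption: the original left these names undefined).
    # Each table row is [value, index of the next partner to test in the other list].
    sa = [[a, 0] for a in sorted(A, reverse=True)]
    sb = [[b, 0] for b in sorted(B, reverse=True)]
    dx, dy = sa, sb    # the update step below writes to the tables under these names
    n = len(sa)        # a single bound, so this assumes len(A) == len(B)
    xfrom = yfrom = 0  # first rows of A and B that may still have untested partners
    ret = 0

    for _ in range(K):
        i = xfrom
        j = yfrom
        if i >= n and j >= n:
            ret = 0
            break
        best = None
        while i < n and j < n:
            selected = False
            # From left: row i of A against its next untested partner in B
            nexti = i
            nextj = sa[i][1]
            a = sa[nexti][0]
            b = sb[nextj][0]
            if best is None or best[2] < a * b:
                selected = True
                best = [nexti, nextj, a * b, 'l']
            # From right: row j of B against its next untested partner in A
            nexti = sb[j][1]
            nextj = j
            a = sa[nexti][0]
            b = sb[nextj][0]
            if best is None or best[2] < a * b:
                selected = True
                best = [nexti, nextj, a * b, 'r']
            # Keep looking?
            if not selected or abs(best[0] - best[1]) < 2:
                break
            i = min(best[:2]) + 1
            j = i
        # Go: consume the selected pair and advance the indexes
        if best[3] == 'l':
            dx[best[0]][1] = best[1] + 1
            dy[best[1]][1] += 1
        else:
            dx[best[0]][1] += 1
            dy[best[1]][1] = best[0] + 1
        if dx[best[0]][1] >= n:
            xfrom = best[0] + 1
        if dy[best[1]][1] >= n:
            yfrom = best[1] + 1
        ret = best[2]

    return ret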

But it did not pass the online Codility judge. (Did I mention this is part of the solution to an already expired Codility challenge? Silicium 2014.)

My questions are:

Is the second approach a good but unfinished solution? If so, any clue as to what I may be missing?
Do you know of a better approach to the problem?

Did you read this post: ["How to find pair with kth largest sum?"](http://stackoverflow.com/q/18557175/1009831). I think it answers your second question. — Evgeny Kluev, Aug 13 '14 at 08:23
The problem statement is a bit unclear. Can you clarify what you mean by kth selection? Since the extraction is with replacement, we can continue extracting the same maximum valued balls. — Abhishek Bansal, Aug 13 '14 at 08:47
@user1990169 Consider it not as the kth selection but as the kth best selection — Pablo Francisco Pérez Hidalgo, Aug 13 '14 at 08:57
Take logarithms, then apply the solution in the linked post. — user2357112, Aug 13 '14 at 09:19
@user2357112 As long as the profit values are positive, the best selection pair for summation will be the same as for multiplication. The explanation for that is your solution suggestion: `Log(X*Y) = Log(X)+Log(Y)`. So there is no need to take logarithms; just apply the linked solution. — Pablo Francisco Pérez Hidalgo, Aug 14 '14 at 09:57

score 2 · Accepted Answer · edited May 23 '17 at 10:33

You need to maintain a priority queue.

You start with (sa[0], sb[0]), then move on to (sa[0], sb[1]) and (sa[1], sb[0]). If sa[0] * sb[1] > sa[1] * sb[0], can we say anything about the comparative sizes of sa[0] * sb[2] and sa[1] * sb[0]?

The answer is no. For example, with sa = (10, 5) and sb = (10, 9, x), sa[0] * sb[1] = 90 > 50 = sa[1] * sb[0], yet sa[0] * sb[2] = 10x is larger than 50 for x = 8 and smaller for x = 1. Thus we must maintain a priority queue: after removing each (sa[i], sb[j]) (such that sa[i] * sb[j] is the biggest in the queue), we add (sa[i + 1], sb[j]) and (sa[i], sb[j + 1]) to the queue, skipping any pair that has already been added, and repeat this k times.
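A minimal Python sketch of this priority-queue approach, using the standard `heapq` module. Assumptions not in the answer: all values are non-negative (so the product grid is non-increasing along every row and column once both lists are sorted in descending order), `k` is 1-based, `None` is returned for an out-of-range `k`, and a `seen` set stops a pair from being queued twice, since (i + 1, j + 1) is a successor of both (i + 1, j) and (i, j + 1). The function name `kth_largest_product` is hypothetical.

```python
import heapq


def kth_largest_product(A, B, k):
    """Return the k-th largest a*b (1-based); assumes all values are non-negative."""
    if k < 1 or k > len(A) * len(B):
        return None
    sa = sorted(A, reverse=True)
    sb = sorted(B, reverse=True)
    # heapq is a min-heap, so products are stored negated to pop the largest first.
    heap = [(-sa[0] * sb[0], 0, 0)]
    seen = {(0, 0)}  # pairs already pushed, so none is queued twice
    for _ in range(k):
        neg, i, j = heapq.heappop(heap)
        # The two successors of (i, j) in the sorted grid.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(sa) and nj < len(sb) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-sa[ni] * sb[nj], ni, nj))
    return -neg
```

For the balls in the question, `kth_largest_product([25, 20], [20, 20], 3)` returns 400. Each pop pushes at most two pairs, so after sorting the loop costs O(k log k).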

Incidentally, I gave this algorithm as an answer to a different question. The algorithm may seem to be different at first, but essentially it's solving the same problem.

score 1 · Answer 2 · edited May 23 '17 at 12:18

I'm not sure I understand the "with replacement" bit...

...but assuming this is in fact the same as "How to find pair with kth largest sum?", then the key to the solution is to consider the matrix S of all the sums (or products, in your case), constructed from A and B (once they are sorted) -- this paper (referenced by @EvgenyKluev) gives this clue.

(You want A*B rather than A+B... but the answer is the same -- though negative numbers complicate but (I think) do not invalidate the approach.)

An example shows what is going on:

  for A = (2, 3, 5, 8, 13)
  and B = (4, 8, 12, 16)

we have the (notional) array S, where S[r, c] = A[r] + B[c], in this case:

   6 ( 2+4),  10 ( 2+8),  14 ( 2+12),  18 ( 2+16)
   7 ( 3+4),  11 ( 3+8),  15 ( 3+12),  19 ( 3+16)
   9 ( 5+4),  13 ( 5+8),  17 ( 5+12),  21 ( 5+16)
  12 ( 8+4),  16 ( 8+8),  20 ( 8+12),  24 ( 8+16)
  17 (13+4),  21 (13+8),  25 (13+12),  29 (13+16)

(As the referenced paper points out, we don't need to construct the array S, we can generate the value of an item in S if or when we need it.)
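For the example above, a small Python sketch (the helper `s` is a hypothetical name) shows how any entry of S can be produced on demand; the full array is built here only so it can be printed for this tiny case.

```python
# A and B from the example above, already sorted ascending.
A = (2, 3, 5, 8, 13)
B = (4, 8, 12, 16)


def s(r, c):
    """Entry S[r, c] of the notional array, computed only when asked for."""
    return A[r] + B[c]


# Materializing S is only for display; a merge only needs individual entries.
S = [[s(r, c) for c in range(len(B))] for r in range(len(A))]
for row in S:
    print(row)
```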

The really interesting thing is that each column of S contains values in ascending order (of course), so we can extract the values from S in descending order by doing a merge of the columns (reading from the bottom).
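Python's standard library already provides such a k-way merge, `heapq.merge`, which internally keeps one heap entry per input. A minimal sketch, assuming Python 3.5+ for the `reverse` argument, that treats each column of S (read from the bottom row upwards) as a descending iterator and merges them lazily; the function names are hypothetical.

```python
import heapq
from itertools import islice


def _column_desc(sorted_a, b):
    """One column of S (where B[c] == b), read from the bottom row upwards."""
    # A helper function binds b per call, avoiding late binding in a genexp.
    return (a + b for a in reversed(sorted_a))


def descending_sums(A, B):
    """Lazily yield every A[r] + B[c] in descending order."""
    sa = sorted(A)
    columns = [_column_desc(sa, b) for b in sorted(B)]
    # With reverse=True, merge expects descending inputs and yields descending output.
    return heapq.merge(*columns, reverse=True)


A = (2, 3, 5, 8, 13)
B = (4, 8, 12, 16)
top5 = list(islice(descending_sums(A, B), 5))
print(top5)  # [29, 25, 24, 21, 21]
```

Only as many values are produced as are consumed, so taking the first k with `islice` does about k heap operations after sorting.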

Of course, merging the columns can be done using a priority queue (heap), hence the max-heap solution. The simplest approach is to seed the heap with the bottom row of S, marking each heap item with the column it came from. Then pop the top of the heap and push the next item up the same column as the one just popped, until you pop the kth item. (Since the bottom row is sorted, it is a trivial matter to seed the heap with it.)
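A minimal Python sketch of the column merge just described: seed the heap with the bottom row, then push the next item up the popped item's column. Assumptions not in the answer: `k` is 1-based, `None` signals an out-of-range `k`, `heapify` is used for seeding for simplicity, and the hypothetical `combine` parameter (default `operator.add`) lets the same code handle products when all values are non-negative. The function name `kth_largest` is also hypothetical.

```python
import heapq
import operator


def kth_largest(A, B, k, combine=operator.add):
    """k-th largest combine(a, b) (1-based), by merging the columns of S.

    combine must be non-decreasing in each argument (true for addition, and
    for multiplication when all values are non-negative).
    """
    if k < 1 or k > len(A) * len(B):
        return None
    sa, sb = sorted(A), sorted(B)  # ascending, as in the S table
    bottom = len(sa) - 1
    # Seed with the bottom row of S; each item remembers its row and column.
    # Values are negated because heapq is a min-heap.
    heap = [(-combine(sa[bottom], b), bottom, c) for c, b in enumerate(sb)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, r, c = heapq.heappop(heap)
        if r > 0:  # push the next item up the same column
            heapq.heappush(heap, (-combine(sa[r - 1], sb[c]), r - 1, c))
    return -heap[0][0]
```

The heap never holds more than `len(B)` items, so passing the shorter list as `B` gives the smaller-dimension choice mentioned next. For instance, `kth_largest([25, 20], [20, 20], 3, operator.mul)` returns 400.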

The complexity of this is O(k log n) (plus the cost of sorting A and B), where 'n' is the number of columns. The procedure works equally well if you process the rows, so if there are 'm' rows and 'n' columns, you can choose the smaller of the two!

NB: the complexity is not O(k log k)... and since for a given pair of A and B the 'n' is constant, O(k log n) is really O(k)!!

If you want to do many probes for different 'k', then the trick might be to cache the state of the process every now and then, so that future 'k's can be done by restarting from the nearest checkpoint. In the limit, one would run the merge to completion and store all possible values, for O(1) lookup!
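A sketch of that idea in Python, with one simplification not in the answer: instead of periodic checkpoints, it keeps the live heap plus every value popped so far, which reaches the "in the limit" case lazily. Memory grows with the largest k requested, and the class name `KthLargestSums` is hypothetical.

```python
import heapq


class KthLargestSums:
    """Repeated k-th largest A[r] + B[c] queries that reuse earlier work.

    Every value popped so far is kept in self._found, so a k that has
    already been reached is a list lookup; a larger k resumes the column
    merge from where the previous query stopped.
    """

    def __init__(self, A, B):
        self._sa, self._sb = sorted(A), sorted(B)
        self._heap = []
        if self._sa:
            bottom = len(self._sa) - 1
            # Seed with the bottom row of S (negated: heapq is a min-heap).
            self._heap = [(-(self._sa[bottom] + b), bottom, c)
                          for c, b in enumerate(self._sb)]
            heapq.heapify(self._heap)
        self._found = []  # self._found[i] is the (i + 1)-th largest sum

    def kth(self, k):
        """Return the k-th largest sum (1-based), or None if k is out of range."""
        if k < 1 or k > len(self._sa) * len(self._sb):
            return None
        while len(self._found) < k:
            neg, r, c = heapq.heappop(self._heap)
            self._found.append(-neg)
            if r > 0:  # next item up the same column
                heapq.heappush(
                    self._heap, (-(self._sa[r - 1] + self._sb[c]), r - 1, c))
        return self._found[k - 1]
```

With the example arrays, `q = KthLargestSums((2, 3, 5, 8, 13), (4, 8, 12, 16))` followed by `q.kth(5)` gives 21, and a later `q.kth(2)` returns 25 straight from the cache.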

+1 for your great explanation and for pointing out what I commented above: as long as the numbers are positive, the best selection for summation is also the best for multiplication. — Pablo Francisco Pérez Hidalgo, Aug 15 '14 at 07:44
