10

Recently I was asked the following interview question: You have two sets of numbers of the same length N, for example A = [3, 5, 9] and B = [7, 5, 1]. Next, for each position i in range 0..N-1, you can pick either number A[i] or B[i], so at the end you will have another array C of length N which consists in elements from A and B. If sum of all elements in C is less than or equal to K, then such array is good. Please write an algorithm to figure out the total number of good arrays by given arrays A, B and number K.

The only solution I've come up is Dynamic Programming approach, when we have a matrix of size NxK and M[i][j] represents how many combinations could we have for number X[i] if current sum is equal to j. But looks like they expected me to come up with a formula. Could you please help me with that? At least what direction should I look for? Will appreciate any help. Thanks.

4 Answers4

9

After some consideration, I believe this is an NP-complete problem. Consider:

A = [0, 0, 0, ..., 0]
B = [b1, b2, b3, ..., bn]

Note that every construction of the third set C = ( A[i] or B[i] for i = 0..n ) is is just the union of some subset of A and some subset of B. In this case, since every subset of A sums to 0, the sum of C is the same as the sum of some subset of B.

Now your question "How many ways can we construct C with a sum less than K?" can be restated as "How many subsets of B sum to less than K?". Solving this problem for K = 1 and K = 0 yields the solution to the subset sum problem for B (the difference between the two solutions is the number of subsets that sum to 0).

By similar argument, even in the general case where A contains nonzero elements, we can construct an array S = [b1-a1, b2-a2, b3-a3, ..., bn-an], and the question becomes "How many subsets of S sum to less than K - sum(A)?"

Since the subset sum problem is NP-complete, this problem must be also. So with that in mind, I would venture that the dynamic programming solution you proposed is the best you can do, and certainly no magic formula exists.

verdesmarald
  • 11,646
  • 2
  • 44
  • 60
  • [This question](http://stackoverflow.com/questions/376003/is-this-variant-of-the-subset-sum-problem-easier-to-solve) also seems somewhat related. – verdesmarald Oct 03 '12 at 01:31
  • Seems like you beat me to the punch! This case (A = 0) is equivalent to #Knapsack, that is, counting the number of feasible solutions to a knapsack instance, and it is known to be #P-complete. – ffao Oct 03 '12 at 01:33
  • you misunderstood the task: "another array C of length N" – Karoly Horvath Oct 03 '12 at 01:33
  • 1
    @KarolyHorvath No, I didn't. You construct `C` by taking any subset of `B`, and the requisite number of `0`s from `A`. Obviously the `0`s do not change the sum of `C`, so the question is indeed equivalent to "How many subsets of `B` sum to less than `K`?" – verdesmarald Oct 03 '12 at 01:35
  • It'd be helpful to explain why 'any subset of B' as per the knapsack problem is an equivalent problem to 'exactly N items, each individual one a choice of two'. – Tommy Oct 03 '12 at 01:38
  • @Tommy You are right, I added some explanation about why each construction of `C` is equivalent to a subset of `B`. I hope this is clearer. – verdesmarald Oct 03 '12 at 01:48
  • 5
    further example from what verdesmarald said `A = [3, 5, 7]`, `B = [7, 5, 1]`, `K = 14` can be simplified to `A = [0, 0, 0]`, `B = [4, 0, -6]`, `K = -1` – Neverever Oct 03 '12 at 02:00
  • @verdesmarald my comment's been deleted because it was obviously in error. – Tommy Oct 03 '12 at 02:01
  • For the record, with a little thought I'd have explained the equivalence like this: the decision you make at each stage is to use the B value or not to use the B value. If you start from the assumption that you'll be using all Bs then deciding to use a B has no net effect. Deciding not to use one adjusts your total by A[i] - B[i]. So picking combinations from B' after assuming all Bs is equivalent to picking As or Bs with no assumption. – Tommy Oct 03 '12 at 02:30
0

So there's 2^N choices, since at each point you either pick from A or from B. In the specific example you give where N happens to be 3 there are 8. For discussion you can characterise each set of decisions as a bit pattern.

So as a brute-force approach would try every single bit pattern.

But what should be obvious is that if the first few bits produce a number too large then every subsequent possible group of tail bits will also produce a number that is too large. So probably a better way to model it is a tree where you don't bother walking down the limbs that have already grown beyond your limit.

You can also compute the maximum totals that can be reached from each bit to the end of the table. If at any point your running total plus the maximum that you can obtain from here on down is less than K then every subtree from where you are is acceptable without any need for traversal. The case, as discussed in the comments, where every single combination is acceptable is a special case of this observation.

As pointed out by Serge below, a related observation is to us minimums and use the converse logic to cancel whole subtrees without traversal.

A potential further optimisation rests behind the observation that, as long as we shuffle each in the same way, changing the order of A and B has no effect because addition is commutative. You can therefore make an effort to ensure either that the maximums grow as quickly as possible or the minimums grow as slowly as possible, to try to get the earliest possible exit from traversal. In practice you'd probably want to apply a heuristic comparing the absolute maximum and minimum (both of which you've computed anyway) to K.

That being the case, a recursive implementation is easiest, e.g. (in C)

/* assume A, B and N are known globals */

unsigned int numberOfGoodArraysFromBit(
           unsigned int bit,
           unsigned int runningTotal,
           unsigned int limit)
{
    // have we ended up in an unacceptable subtree?
    if(runningTotal > limit) return 0;

    // have we reached the leaf node without at any
    // point finding this subtree to be unacceptable?
    if(bit >= N) return 1;

    // maybe every subtree is acceptable?
    if(runningTotal + MAXV[bit] <= limit)
    {
        return 1 << (N - bit);
    }

    // maybe no subtrees are acceptable?
    if(runningTotal + MINV[bit] > limit)
    {
        return 0;
    }

    // if we can't prima facie judge the subtreees,
    // we'll need specifically to evaluate them
    return
       numberOfGoodArraysFromBit(bit+1, runningTotal+A[bit], limit) +
       numberOfGoodArraysFromBit(bit+1, runningTotal+B[bit], limit);
}

// work out the minimum and maximum values at each position
for(int i = 0; i < N; i++)
{
    MAXV[i] = MAX(A[i], B[i]);
    MINV[i] = MIN(A[i], B[i]);
}

// hence work out the cumulative totals from right to left
for(int i = N-2; i >= 0; i--)
{
    MAXV[i] += MAXV[i+1];
    MINV[i] += MINV[i+1];
}

// to kick it off
printf("Total valid combinations is %u", numberOfGoodArraysFromBit(0, 0, K));

I'm just thinking extemporaneously; it's likely better solutions exist.

Tommy
  • 99,986
  • 12
  • 185
  • 204
  • I don't think this is a very good approach, in the case where every subset is good you will traverse the entire tree giving `O(2^n)` running time. – verdesmarald Oct 03 '12 at 01:02
  • yes, I agree, if we first calculate a sum of array of max(A[i],B[I]) then we may filter out this only case. May be we may find a way to make it O(2^(n-1)) at worst? – Serge Oct 03 '12 at 01:04
  • The best optimisation in that case would be to produce C, of length N, with C[i] = MAX(A[i], B[i]) and D where D[i] = sum of all C[j], j >= i. If at any point the running total + D[i] is less than the limit then you can add the entire subtree to the acceptable set. I've amended my answer (albeit with C meaning something slightly different per the programming world of intermediate results and assignments). – Tommy Oct 03 '12 at 01:14
  • There is one more optimization - in parallel we build the same array with minimums. As soon as we get the running total plus the known minimum in front of us exceeds the K, we may cancel the entire subtree – Serge Oct 03 '12 at 01:27
  • @Serge yep. Good point. I'll made the edit, and flag this as community wiki since it's now a combination of our answers rather than merely mine. – Tommy Oct 03 '12 at 01:30
  • yes, I posted my answer a few seconds after yours, and was really surprised to see the same code :) – Serge Oct 03 '12 at 01:31
  • @Tommy it came to my mind today: if we apply the ideas in the accepted answer (not to optimize as the complexity will not change as I understand, but just to simplify the further things), then we sort the resulting array in descending order and after that we do our search with our two cut-off conditions, will it make any sense? – Serge Oct 04 '12 at 22:14
  • @Serge NP-complete is a wide class because there are lots of polynomials so I don't take the accepted answer to mean that there's no purpose in optimising, though here you'd probably want to pick heuristics based on the usual input set. If in practice you're rejecting 95% of input sets then arranging things so that a quick exit via the maximum constraint earlier would likely dramatically reduce your real world costs. So, ummm, that could be a good idea and it could be a bad idea. It'd definitely be a smart thing to talk about at an interview. – Tommy Oct 04 '12 at 23:40
0

" Please write an algorithm to figure out the total number of good arrays by given arrays A, B and number K."

Is it not the goal?

int A[];
int B[];
int N;
int K;
int Solutions = 0;

void FindSolutons(int Depth, int theSumSoFar) {
    if (theSumSoFar > K) return;
    if (Depth >= N) {
    Solutions++;
    return;
    }

    FindSolutions(Depth+1,theSumSoFar+A[Depth]);
    FindSolutions(Depth+1,theSumSoFar+B[Depth]);
}

Invoke FindSolutions with both arguments set to zero. On return the Solutions will be equal to the number of good arrays;

Serge
  • 6,088
  • 17
  • 27
0

this is how i would try to solve the problem (Sorry if its stupid)

think of arrays

A=[3,5,9,8,2]
B=[7,5,1,8,2]

if elements 0..N-1

number of choices

2^N

C1=0,C2=0

for all A[i]=B[i] 
{
    C1++
    C2+=A[i]+B[i]
}

then create new two arrays like

A1=[3,5,9]
B1=[7,5,1]

also now C2 is 10

now number of all choices are reduced to 2^(N-C1)

now calculate all good numbers

using 'K' as K=K-C2

unfortunately no matter what method you use, you have to calculate sum 2^(N-C1) times

bhathiya-perera
  • 1,303
  • 14
  • 32