Find smallest subset sum matching another subset sum

Question

I have a real-world problem (not homework!) that requires finding the sum of a subset of set A that equals the sum of a subset of some other set B.

A very similar question with a helpful answer is here.

Consider this example:

@a = qw(200 2000 2000 2000 4000);
@b = qw(528 565 800 1435 2000 2000 2872);

Using the code provided in the accepted answer to that question, I get the following output:

sum(200 2000 4000) = sum(528 800 2000 2872)
sum(200 4000 4000) = sum(528 800 2000 2000 2872)
sum(200 4000) = sum(528 800 2872)
sum(200 2000 2000 2000 4000) = sum(528 800 2000 2000 2000 2872)

For my purposes, I only want the answer(s) that use the fewest elements of the input sets. In this example, I just want

sum(200 4000) = sum(528 800 2872)

because all of the other answers also have 200 and 4000 in the sum. That is, I'm looking for just the "simplest" possible sums (in the sense that they use the fewest elements). Can someone suggest a reasonably efficient way of doing this? (Brute force is okay as my arrays aren't that large.)

Also, I should note that the second line of the output, sum(200 4000 4000) ..., isn't correct because 4000 only appears once in @a. I'm afraid I don't understand the algorithm well enough to see why this is happening.

Any help with either of these would be much appreciated!

? Why isn't `sum(2000) = sum(2000)` the correct answer? Why isn't `sum(4000)=sum(2000 2000)` a correct answer? — mob, Jun 22 '11 at 18:10
@mob - you're right, they should be. The algorithm given in the other question doesn't seem to be quite right. (Perhaps it was right for that question, but I don't see the difference.) Your two answers would also be correct, and I'd like the algorithm to deliver those as well. Thanks. — itzy, Jun 22 '11 at 18:13
@itzy: Thanks for noticing the problems with my code again! I hope your problem was resolved to your satisfaction. Dynamic programming similar to that in my answer (except correct ...) should have resolved your problem. Take care! — A. Rex, Aug 21 '11 at 17:25

Mikola · Accepted Answer · 2011-06-22T19:41:17.943

3

This problem is NP-complete, so unless P=NP you are stuck doing exponential work on the size of the input. Now the neat thing about this type of problem is that there are actually two ways to solve which will put the exponent on different aspects of the problem.

First if your sums don't have too many elements you can just brute force this problem by searching over all combinations. This method is exponential on the number of elements in the sets, and will work reasonably well up to say 20 or so elements per container. After that it is going to get pretty nasty.

A second option is to use dynamic programming. Unlike the previous method, this algorithm is exponential in the number of bits that it takes to write out each of the numbers. What you do is keep track of the set of all possible sums as well as the smallest number of elements required to generate them. This is a simple modification of the solution that you linked to in your answer.

Here is some python code that generates all the possible values they can intersect at:

    def gen_sum(a):
        r = { 0: [] }
        for x in a:
            for y, v in r.items():
                if not (x+y) in r or len(r[x+y]) > len(v) + 1:
                    r[x+y] = v + [x]
        return r

    def gen_sums(a, b):
        asum = gen_sum(a)
        bsum = gen_sum(b)
        return [ (k, v, bsum[k]) for k,v in asum.items() if k in bsum ][1:]

Running it on your sample data, I got:

[(4000, [4000], [2000, 2000]), (6000, [2000, 4000], [565, 1435, 2000, 2000]), (2000, [2000], [2000]), (4200, [200, 4000], [528, 800, 2872]), (10200, [200, 2000, 2000, 2000, 4000], [528, 565, 800, 1435, 2000, 2000, 2872]), (8200, [200, 2000, 2000, 4000], [528, 800, 2000, 2000, 2872]), (6200, [200, 2000, 4000], [528, 800, 2000, 2872])]

EDIT: Fixed a typo, and also just noticed that holy crap tons of people have already answered this really quick.

edited Jun 22 '11 at 19:41

answered Jun 22 '11 at 18:36

Mikola

9,176
2
34
41

You can keep the minimum only, you don't need all the sets - just the minimum. At bucket k, you don't need to know the actual values of the items in the set, this is the reason the psuedo-poly algorithm works at all. In the worst case, keeping all the sets can lead to exponential time and space. – dfb Jun 22 '11 at 19:08
More precisely, by keeping all the sets , it's not psuedo-poly in the worst case, if you just keep the minimum, it's still psuedo-poly – dfb Jun 22 '11 at 19:11
@spinning_plate: How do you reckon? This only uses O(n * (|a|+|b|)) space, where n is the size of the largest possible sum and |a|, |b| are the lengths of a and b respectively. This should still be psuedo-polynomial on n. And as for time complexity, it iterates the array once per element, so the time complexity should be no greater than O(n * (|a| + |b|)^2), which is again psuedo-poly. – Mikola Jun 22 '11 at 19:14
And as an addendum, if you look at it closely you will see that it only keeps track of the minimum set, so again I am not quite sure what you mean. – Mikola Jun 22 '11 at 19:21
Thanks for the code. My Python is even worse than my Perl, but I'm going to try to port this to Perl. If this would be easy for you I'd really appreciate the help! But thanks much regardless. – itzy Jun 22 '11 at 19:41
@Mikola - It looks like your output is missing (2000, [2000], [565, 1435]). Any idea why? Thanks. – itzy Jun 22 '11 at 20:49
@itzy: The solution for 2000 is there, except the correct answer should be (2000, [2000], [2000]), since the set [2000] has fewer elements than [565, 1435]. – Mikola Jun 22 '11 at 23:13

dfb · Answer 2 · 2011-06-22T19:51:36.827

1

You need to modify the recurrence relation, not just the output. Consider {1,2,20,23,42} and {45}. The original algorithm will output {1,2,42},{45} instead of {20,23},{45}. This is because 42 is considered last, and when it sums to 45, it will overwrite the bucket value at 45 previously containing {20,23}

Instead of keeping [set,sum] for each value, you need to keep [ minimum set, sum ] and then take the minimums at the end.

My perl is awful, but something like this

$a{$m+$a} = [min-length(@{$a{$m}},$a{$m+$a}[0]),$a];

where min-length returns the smaller set

edited Jun 22 '11 at 19:51

answered Jun 22 '11 at 18:21

dfb

13,133
2
31
52

Did you mean {1,2,20,23,42} and {43}? – itzy Jun 22 '11 at 19:49

score 1 · Answer 3 · answered Jun 22 '11 at 18:30

1

Here's an updated algorithm which gives all the sums:

my @a = qw(200 2000 2000 2000 4000);
my @b = qw(528 565 800 1435 2000 2000 2872);

my %a = sums( @a );
my %b = sums( @b );

for my $m ( keys %a ) {
    print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n" if exists $b{$m};
}

sub sums {
    my( @a ) = @_;

    return () unless @a;

    my %a;

    while( @a ) {
        $a = shift @a;

        for my $m ( keys %a ) {
            $a{$m+$a} = [@{$a{$m}},$a];
        }

        $a{$a} = [$a];
    }

    return %a;
}

Al you have to do is find the shortest one(s), but others have covered that already :)

HTH,

Paul

answered Jun 22 '11 at 18:30

pavel

3,488
1
20
20

Thanks for the help. It looks like this code is still giving an sum(200 4000 4000) as one of the answers, but 4000 only appears once in the input set. Is 2000+2000 is getting saved as 4000 somehow? – itzy Jun 22 '11 at 19:39
@irzy: you're right; that's because there's 2 x 2000 in the list... Not sure how to handle that for now :( – pavel Jun 22 '11 at 19:50

score 0 · Answer 4 · answered Jun 22 '11 at 17:57

I'm not great with perl. But,

for $m ( keys %a ) {
print "sum(@{$a{$m}}) = sum(@{$b{$m}})\n" if exists $b{$m};

}

Modify this line to count the number of elements in set $a and $b for each $m. When you've finished looping through all of them, select the one with the least number of elements.

Find smallest subset sum matching another subset sum

4 Answers4

Linked