Subset Sum floats Elimations

Question

I will be happy to get some help. I have the following problem: I'm given a list of numbers and a target number.

subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20)

I need to find an algorithm that will find all numbers that combined will sum target number ex: 20. First find all int equal 20 And next for example the best combinations here are:

11.96 + 8.04

1 + 10 + 9

11.13 + 7.8 + 1.07

9 + 11

Remaining value 15.04.

I need an algorithm that uses 1 value only once and it could use from 1 to n values to sum target number.

I tried some recursion in PHP but runs out of memory really fast (50k values) so a solution in Python will help (time/memory wise). I'd be glad for some guidance here.

One possible solution is this: Finding all possible combinations of numbers to reach a given sum

The only difference is that I need to put a flag on elements already used so it won't be used twice and I can reduce the number of possible combinations

Thanks for anyone willing to help.

Do an internet search for "Dynamic programming subset sum" and have fun ;) — sberry, Jun 30 '16 at 14:34
Already did. None of the solutions out there helps me. Could you provide some links? — Gabriel Nicola, Jun 30 '16 at 14:40

score 0 · Answer 1 · answered Jun 30 '16 at 14:53

there are many ways to think about this problem. If you do recursion make sure to identify your end cases first, then proceed with the rest of the program.

This is the first thing that comes to mind.

<?php
subset_sum([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20);


function subset_sum($a,$s,$c = array())
{
        if($s<0)
                return;
        if($s!=0&&count($a)==0)
                return;
        if($s!=0)
        {
                foreach($a as $xd=>$xdd)
                {
                        unset($a[$xd]);
                        subset_sum($a,$s-$xdd,array_merge($c,array($xdd)));
                }
        }
        else
                print_r($c);

}
?>

also, add ini_set('memory_limit','2048M'); if you are running out of memory. — Dimi, Jun 30 '16 at 20:12

score 0 · Answer 2 · answered Jun 30 '16 at 14:56

0

This is possible solution, but it's not pretty:

import itertools
import operator
from functools import reduce

def subset_num(array, num):
    subsets = reduce(operator.add, [list(itertools.combinations(array, r)) for r in range(1, 1 + len(array))])
    return [subset for subset in subsets if sum(subset) == num]


print(subset_num([11.96,1,15.04,7.8,20,10,11.13,9,11,1.07,8.04,9], 20))

Output:

[(20,), (11.96, 8.04), (9, 11), (11, 9), (1, 10, 9), (1, 10, 9), (7.8, 11.13, 1.07)]

answered Jun 30 '16 at 14:56

RoadRunner

25,803
6
42
75

Tried your solution with a larger set of data: Segmentation fault (core dumped) – Gabriel Nicola Jun 30 '16 at 15:55
Segmentation fault in Python? What does you're error say? – RoadRunner Jun 30 '16 at 16:03
Traceback (most recent call last): File "test1.py", line 11, in 58,20,259,20,260,20,261,2.36,262,11.41,...], 20)) File "test1.py", line 6, in subset_num subsets = reduce(operator.add, [list(itertools.combinations(array, r)) for r in range(1, 1 + len(array))]) MemoryError – Gabriel Nicola Jun 30 '16 at 16:15

et_l · Answer 3 · 2016-07-01T01:56:26.787

DISCLAIMER: this is not a full solution, it is a way to just help you build the possible subsets. It does not help you to pick which ones go together (without using the same item more than once and getting the lowest remainder).

Using dynamic programming you can build all the subsets that add up to the given sum, then you will need to go through them and find which combination of subsets is best for you.

To build this archive you can (I'm assuming we're dealing with non-negative numbers only) put the items in a column, go from top to bottom and for each element compute all the subsets that add up to the sum or a lower number than it and that include only items from the column that are in the place you are looking at or higher. When you build a subset you put in its node both the sum of the subset (which may be the given sum or smaller) and the items that are included in the subset. So in order to compute the subsets for an item [i] you need only look at the subsets you've created for item [i-1]. For each of them there are 3 options:

1) the subset's sum is the given sum ---> Keep the subset as it is and move to the next one.

2) the subset's sum is smaller than the given sum but larger than it if item [i] is added to it ---> Keep the subset as it is and move on to the next one.

3) the subset's sum is smaller than the given sum and it will still be smaller or equal to it if item [i] is added to it ---> Keep one copy of the subset as it is and create another one with item [i] added to it (both as a member and added to the sum of the subset).

When you're done with the last item (item [n]), look at the subsets you've created - each one has its sum in its node and you can see which ones are equal to the given sum (and which ones are smaller - you don't need those anymore).

As I wrote at the beginning - now you need to figure out how to take the best combination of subsets that do not have a shared member between any of them. Basically you're left with a problem that resembles the classic knapsack problem but with another limitation (not every stone can be taken with every other stone). Maybe the limitation actually helps, I'm not sure.

A bit more about the advantage of dynamic programming in this case

The basic idea of dynamic programming instead of recursion is to trade redundancy of operations with occupation of memory space. By that I mean to say that recursion with a complex problem (normally a backtrack knapsack-like problem, as we have here) normally ends up calculating the same thing a fair amount of times because the different branches of calculation have no concept of each other's operations and results. Dynamic programming saves the results and uses them along the way to build "bigger" results, relying on the previous/"smaller" ones. Because the use of the stack is much more straightforward than in recursion, you don't get the memory problem you get with recursion regarding the maintenance of the function's state, but you do need to handle a great deal of memory that you store (sometimes you can optimise that).

So for example in our problem, trying to combine a subset that would add up to the required sum, the branch that starts with item A and the branch that starts with item B do not know of each other's operations. let's assume item C and item D together add up to the sum, but either of them added alone to A or B would not exceed the sum, and that A don't go with B in the solution (we can have sum=10, A=B=4, C=D=5 and there is no subset that sums up to 2 (so A and B can't be in the same group)). The branch trying to figure out A's group would (after trying and rejecting having B in its group) add C (A+C=9) and then add D, in which point would reject this group and trackback (A+C+D=14 > sum=10). The same would happen to B of course (A=B) because the branch figuring out B's group has no information regarding what just happened to the branch dealing with A. So in fact we've calculated C+D twice, and haven't even used it yet (and we're about to calculate it yet a third time to realise they belong in a group of their own).

NOTE: Looking around while writing this answer I came across a technique I was not familiar with and might be a better solution for you: memoization. Taken from wikipedia:

memoization is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again.

score 0 · Accepted Answer · answered Jul 07 '16 at 13:42

So I have a possbile solution:

    #compute difference between 2 list but keep duplicates
    def list_difference(a, b):
        count = Counter(a) # count items in a
        count.subtract(b)  # subtract items that are in b
        diff = []
        for x in a:
            if count[x] > 0:
               count[x] -= 1
               diff.append(x)
        return diff


    #return combination of numbers that match target   
    def subset_sum(numbers, target, partial=[]):
        s = sum(partial)
        # check if the partial sum is equals to target
        if s == target:
            print "--------------------------------------------sum_is(%s)=%s" % (partial, target)
            return partial
        else:
            if s >= target:
                return  # if we reach the number why bother to continue

            for i in range(len(numbers)):
                n = numbers[i]
                remaining = numbers[i+1:]
                rest = subset_sum(remaining, target, partial + [n])
                if type(rest) is list:
  #repeat until rest is > target and rest is not the same as previous
    def repeatUntil(subset, target):
       currSubset = []
       while sum(subset) > target and currSubset != subset:
        diff = subset_sum(subset, target)
        currSubset = subset
        subset = list_difference(subset, diff)
    return subset

Output:

--------------------------------------------sum_is([11.96, 8.04])=20
--------------------------------------------sum_is([1, 10, 9])=20
--------------------------------------------sum_is([7.8, 11.13, 1.07])=20
--------------------------------------------sum_is([20])=20
--------------------------------------------sum_is([9, 11])=20
[15.04]

Unfortunately this solution does work for a small list. For a big list still trying to break the list in small chunks and calculate but the answer is not quite correct. You can see it o a new thread here: Finding unique combinations of numbers to reach a given sum

Subset Sum floats Elimations

4 Answers4

A bit more about the advantage of dynamic programming in this case