Algorithm: Extract subset based on property sum

Question

I want an algorithm (no specific language) to find a subset from a set of integers such that their sum is within a certain range.

For example, if I have a group of people, whose weights are as follows.

var people:{
   jane:126,
   julia:112,
   charles:98,
   john:182,
   bob:213,
   edgar: 237,
   jay: 223,
   dan: 191,
   alex: 210,
   david: 196
}

Now, from these people, I'd like to find a subset whose combined weight is between 818-822 pounds (If you're trying to do the math... don't bother, these numbers are out of my head, and I don't even know if there's a solution with this dataset). The number of people in the group doesn't matter, just a group from the larger set. And really, any group will do (although random is better in my case).

Note that this is just a quick example... there would actually be hundreds of people, and it would be possible that there would be no combination that would fit into this criteria. Because the actual numbers would be much larger than this, I'm concerned about a n^n problem and running through thousands of iterations, even though I need this to run very quickly.

Maybe I fell asleep during that day in computer science class, but I haven't been able to come up with anything other than brute force methods.

I've tagged this as javascript, simply because that is closest to my actual implementation (and it reads easier). Open to other solutions, as long as they aren't predicated on some Cthulhu function somewhere.

I know this is a weird question to ask on SO, but any help here would be appreciated.

Ok, I'm stumped. 23 hours to post a bounty for something that I can grok code-wise -- my background is certainly not in this realm, and I have a hard time even discerning the notations used to describe the problem, let alone the solutions.

Anybody want to help me out and throw me some sample javascript code that I can modify to the final project? I'll be adding a 250pt bounty when I can... but if a decent solution comes through I'll hand it out when the time comes.

This reminds me of a 1D Bin Packing problem. Have you considered something like that? — im so confused, Oct 09 '12 at 20:30

Evgeny Kluev · Accepted Answer · 2012-10-12T12:42:19.650

10

This is similar to 0-1 Knapsack problem or Subset sum problem.

If weights are not very large integer numbers, a dynamic programming solution should be efficient.

Here is javascript implementation of dynamic programming algorithm. If you want random groups, just random shuffle the list of people before applying this algorithm.

var p = {
   jane:126,
   julia:112,
...
};

function subset(people, min, max)
{
  var subsets = [];
  subsets[0] = '';

  for (var person in people)
  {
    for (var s = min-1; s >= 0; --s)
    {
      if (s in subsets)
      {
        var sum = s + people[person];

        if (!(sum in subsets))
        {
          subsets[sum] = subsets[s] + ' ' + person;

          if (sum >= min && sum <= max)
          {
            return subsets[sum];
          }
        }
      }
    }
  }

  return 'Not found';
}

print(subset(p, 818, 822));

edited Oct 12 '12 at 12:42

answered Oct 09 '12 at 20:32

Evgeny Kluev

24,287
7
55
98

1

If we iterate over different size sets (starting by looking for a 1-element set) then at each stage subtracting the average weight for the required total (e.g. at 3 elements, subtract 812/3) from all the values we get the subset sum problem (total sum should be zero). – Phil H Oct 09 '12 at 20:42
+1 for helping me understand the nature of the problem better. Now I just have to figure out how to write it. – John Green Oct 09 '12 at 21:07
To be frank, I'd never have come up with that on my own (especially since my data was in doubles and I didn't start with a hash). It looks so simple, but it hides its complexity in its elegance. + 250. – John Green Oct 12 '12 at 06:03

LSerni · Answer 2 · 2012-10-09T20:51:25.713

It looks like a variation of the "binary knapsack problem", with the added bound that if the best fit is still outside the acceptable range, the answer is rejected.

You may want to look into "polynomial time approximations".

One approach could be to sort the set by weight. Then you start looking from the middle downwards and upwards: you get dan, john, david, alex and you're at 779. You add jane and find yourself at 905 and that's too much by 87; so you check the name below, julia, that's 112, and look the closest match between differences. Exchanging Alex with Julia (210-112) would lose you 98, exchanging David with Julia will lose 84. Lather, rinse, repeat.

The algorithm is O(n log n) for sorting, and then depends on set size. It has several drawbacks (not guaranteed to converge, groups will tend to be contiguous, it will gather people near the starting point, etc.), but if you just want "a group", it might be enough.

You can also implement the algorithm recursively. The very worst case would be O(n^3 log n), but if you're really working with human beings (weights clustered in reasonably small ranges, smooth curve), convergence is likely to be pretty quick.

+1 for helping me understand the nature of the problem better. Now I just have to figure out how to write it. — John Green, Oct 09 '12 at 21:08

Tyler Durden · Answer 3 · 2012-10-09T21:57:01.483

This is what I call a "sort and bracket" problem. The way you solve it is by sorting the data and then bracketing around the target value or target range.

For example, in this case the sorted order is:

98
112
126
182
191
196
210
213
223
237

Now you average the list: 178.8. Therefore the starting bracket is (126,182). Start moving out from this average: sum(126,182,112,191,98) = 709, too small. Delete the 98 and replace with value from the other side: 196, ie sum(126,182,112,191,196) = 807, still too small. Go to next value on high side, sum(126,182,112,191,210)=821. Ok, found one match. By continuing this process you can find every match. Basically what bracketing does is help you search only a subset of all the possible combinations so you do not have to check every combination. You are generating combinations outward from an average, instead of from one end or the other.

Whenever your sum exceeds/falls below the range you terminate the combination generation on the high/low side and switch to the other. This is the optimal solution to the problem.

Implementation Method: to implement this algorithm you need to get a combination generator that works in "lexicographical" order. You then start with n, say 5, items and determine the median combination as I have shown above. You then get the next lower combination, if you are low, you switch to the next higher combination and so on.

-------------- ADDENDUM -------------------

After thinking about this it might be better to use a plain changes-type algorithm for this rather than a lexicographical combinator. This type of algorithm will generate all combinations, but only switch any 2 elements at a given time. Basically you modify this algorithm to switch direction whenever it goes out of bounds (goes above the range or below it).

This is certainly a different (and more programmer-y) way to attack the problem. After reading through 'standard' implementations of of the subset-sum problem, your solution struck me as insufficient at first, but the more I think about how it needs to work, the more I like it. It has great speed potential, as it will weight itself very well middle of the curve to the specifics of my data set (especially if I'm able to guess the # of items correctly from the outset). Biggest drawback is that the code required to implement this may be... obtuse to anyone other than the implementor. : ) — John Green, Oct 09 '12 at 21:34
This isn't the optimal solution to the problem. It works well for a particular subset of the possible inputs but has bad performance for others. — argentage, Oct 10 '12 at 00:27

score 1 · Answer 4 · edited May 23 '17 at 12:05

Here is answer to a similar question Find combination(s) sum of element(s) in array whose sum equal to a given number

<?php

$limit = 12;
$array = array(6,1,10,4,1,3,11,2,15,5,12,10,17);

echo implode(', ',$array).'<br>';

// remove items 15 and 17 because they are great then $limit = 12
$array = array_filter($array, function($var) use ($limit) {
  return ($var <= $limit);
});

rsort($array);
echo implode(', ',$array);

// the algorithm is usable if the number of elements is less than 20 (because set_time_limit)
$num = count($array); 

//The total number of possible combinations 
$total = pow(2, $num);

$out = array();

// algorithm from http://r.je/php-find-every-combination.html
// loop through each possible combination  
for ($i = 0; $i < $total; $i++) {  

    $comb = array();

    // for each combination check if each bit is set 
    for ($j = 0; $j < $num; $j++) { 
       // is bit $j set in $i? 
        if (pow(2, $j) & $i){
          $comb[] = $array[$j];
        }      
    } 

    if (array_sum($comb) == $limit)
    {
      $out[] = $comb;
    }
}

array_multisort(array_map('count', $out), SORT_ASC, $out);

$out = array_unique($out, SORT_REGULAR);

foreach($out as $result) echo implode(', ', $result).'<br>';

The output of this code is list of combinations whose sum is just $limit(12)...

12
10, 2
11, 1
5, 4, 3
6, 4, 2
6, 5, 1
10, 1, 1
5, 4, 2, 1
6, 3, 2, 1
6, 4, 1, 1
5, 3, 2, 1, 1

Algorithm: Extract subset based on property sum

4 Answers4

Linked

Related