8

The subset sum problem is well-known for being NP-complete, but there are various tricks to solve versions of the problem somewhat quickly.

The usual dynamic programming algorithm requires space that grows with the target sum. My question is: can we reduce this space requirement?

I am trying to solve a subset sum problem with a modest number of elements but a very large target sum. The number of elements is too large for the exponential time algorithm (and shortcut method) and the target sum is too large for the usual dynamic programming method.

Consider this toy problem that illustrates the issue. Given the set A = [2, 3, 6, 8] find the number of subsets that sum to target = 11 . Enumerating all subsets we see the answer is 2: (3, 8) and (2, 3, 6).

The dynamic programming solution gives the same result, of course - ways[11] returns 2:

def subset_sum(A, target):
    ways = [0] * (target + 1)
    ways[0] = 1
    ways_next = ways[:]
    for x in A:
        for j in range(x, target + 1):
            ways_next[j] += ways[j - x]
        ways = ways_next[:]

    return ways[target]

Now consider targeting the sum target = 1100 the set A = [200, 300, 600, 800]. Clearly there are still 2 solutions: (300, 800) and (200, 300, 600). However, the ways array has grown by a factor of 100.

Is it possible to skip over certain weights when filling out the dynamic programming storage array? For my example problem I could compute the greatest common denominator of the input set and then reduce all items by that constant, but this won't work for my real application.

This SO question is related, but those answers don't use the approach I have in mind. The second comment by Akshay on this page says:

...in the cases where n is very small (eg. 6) and sum is very large (eg. 1 million) then the space complexity will be too large. To avoid large space complexity n HASHTABLES can be used.

This seems closer to what I'm looking for, but I can't seem to actually implement the idea. Is this really possible?

Edited to add: A smaller example of a problem to solve. There is 1 solution.

target = 5213096522073683233230240000
A = [2316931787588303659213440000,
     1303274130518420808307560000,
     834095443531789317316838400,
     579232946897075914803360000,
     425558899761116998631040000,
     325818532629605202076890000,
     257436865287589295468160000,
     208523860882947329329209600,
     172333769324749858949760000,
     144808236724268978700840000,
     123386899930738064691840000,
     106389724940279249657760000,
     92677271503532146368537600,
     81454633157401300519222500,
     72153585080604612224640000,
     64359216321897323867040000,
     57762842349846905631360000,
     52130965220736832332302400,
     47284322195679666514560000,
     43083442331187464737440000,
     39418499221729173786240000,
     36202059181067244675210000,
     33363817741271572692673536,
     30846724982684516172960000,
     28604096143065477274240000,
     26597431235069812414440000,
     24794751591313594450560000,
     23169317875883036592134400,
     21698632766175580575360000,
     20363658289350325129805625,
     19148196591638873216640000,
     18038396270151153056160000,
     17022355990444679945241600]

A real problem is:

target = 262988806539946324131984661067039976436265064677212251086885351040000
A = [116883914017753921836437627140906656193895584300983222705282378240000,
     65747201634986581032996165266759994109066266169303062771721337760000,
     42078209046391411861117545770726396229802410348353960173901656166400,
     29220978504438480459109406785226664048473896075245805676320594560000,
     21468474003260924418937523352411426647858372626711204170357987840000,
     16436800408746645258249041316689998527266566542325765692930334440000,
     12987101557528213537381958571211850688210620477887024745031375360000,
     10519552261597852965279386442681599057450602587088490043475414041600,
     8693844844295746252297013588993057072273225278585528961549928960000,
     7305244626109620114777351696306666012118474018811451419080148640000,
     6224587137040149683597270084426981690799173128454727836375984640000,
     5367118500815231104734380838102856661964593156677801042589496960000,
     4675356560710156873457505085636266247755823372039328908211295129600,
     4109200102186661314562260329172499631816641635581441423232583610000,
     3639983481521748430892521260443459881470796742937193786669693440000,
     3246775389382053384345489642802962672052655119471756186257843840000,
     2914003396564502206448583502127866774917064428556368433095682560000,
     2629888065399463241319846610670399764362650646772122510868853510400,
     2385386000362324935437502594712380738650930291856800463373109760000,
     2173461211073936563074253397248264268068306319646382240387482240000,
     1988573206351200938616141104476672789688204647842814753019927040000,
     1826311156527405028694337924076666503029618504702862854770037160000,
     1683128361855656474444701830829055849192096413934158406956066246656,
     1556146784260037420899317521106745422699793282113681959093996160000,
     1443011284169801504153550952356872298690068941987447193892375040000,
     1341779625203807776183595209525714165491148289169450260647374240000,
     1250838556670374906691960338012080744048823137584838292922165760000,
     1168839140177539218364376271409066561938955843009832227052823782400,
     1094646437211014876720019400903392201607763016346356924399106560000,
     1027300025546665328640565082293124907954160408895360355808145902500,
     965982760477305139144112620999228563585913919842836551283325440000,
     909995870380437107723130315110864970367699185734298446667423360000,
     858738960130436976757500934096457065914334905068448166814319513600,
     811693847345513346086372410700740668013163779867939046564460960000,
     768411414287644482489363509326632509674989232073666182868912640000,
     728500849141125551612145875531966693729266107139092108273920640000,
     691620793004461075955252231602997965644352569828303092930664960000,
     657472016349865810329961652667599941090662661693030627717213377600,
     625791330255672395317036671188673352614551016483550865168079360000,
     596346500090581233859375648678095184662732572964200115843277440000,
     568931977371436071675467087219123799753953628290345594563299840000,
     543365302768484140768563349312066067017076579911595560096870560000,
     519484062301128541495278342848474027528424819115480989801255014400,
     497143301587800234654035276119168197422051161960703688254981760000,
     476213321032044045508347054897310957784092466595223632570186240000,
     456577789131851257173584481019166625757404626175715713692509290000,
     438132122515529069774235170457376054037925971973698044293020160000,
     420782090463914118611175457707263962298024103483539601739016561664,
     404442609057972047876946806715939986830088526993021531852188160000,
     389036696065009355224829380276686355674948320528420489773499040000,
     374494562534633427030238036407319297168052779889230688624970240000,
     360752821042450376038387738089218074672517235496861798473093760000,
     347753793771829850091880543559722282890929011143421158461997158400,
     335444906300951944045898802381428541372787072292362565161843560000,
     323778155173833578494287055791985197213007158728485381455075840000,
     312709639167593726672990084503020186012205784396209573230541440000,
     302199145693704480473409550206308504954053507241841138853071360000,
     292209785044384804591094067852266640484738960752458056763205945600,
     282707666261699891568916593460940582033071824431295083135592960000,
     273661609302753719180004850225848050401940754086589231099776640000,
     265042888929147215048611399412486748738992254650755607041456640000,
     256825006386666332160141270573281226988540102223840088952036475625,
     248983485481605987343890803377079267631966925138189113455039385600,
     241495690119326284786028155249807140896478479960709137820831360000,
     234340660761814501342824380545368657996226388663143017230461440000,
     227498967595109276930782578777716242591924796433574611666855840000,
     220952578483466770957349011608519198854244960871423861446658560000,
     214684740032609244189375233524114266478583726267112041703579878400,
     208679870295533683104133831435857945991878646837700655494453760000,
     202923461836378336521593102675185167003290944966984761641115240000,
     197401994025105141026072179446079922264038329650750423033879040000,
     192102853571911120622340877331658127418747308018416545717228160000,
     187014262428406274938300203425450649910232934881573156328451805184,
     182125212285281387903036468882991673432316526784773027068480160000,
     177425404985627474536673746714144021883127046501745489011223040000,
     172905198251115268988813057900749491411088142457075773232666240000,
     168555556186474170249629649778586749838977769381324948621621760000,
     164368004087466452582490413166899985272665665423257656929303344400]
Community
  • 1
  • 1
bbayles
  • 4,389
  • 1
  • 26
  • 34
  • In the general case, it's not possible (or at least, there's no known way to do it). Otherwise it wouldn't be NP-Complete. For your specific cases there might be a more efficient way to do things, but without more information about your dataset, it's impossible to say. – Antimony Aug 25 '13 at 19:27
  • It is, but usually when you can't afford the space, it's unlikely you can afford the time (time is `O(numItems * targetWeight)`, memory is just `O(targetWeight)`). Are you sure you can afford the time? – IVlad Aug 25 '13 at 19:31
  • Have you seen [this post](http://stackoverflow.com/questions/9809436/fast-solution-to-subset-sum)? – wflynny Aug 25 '13 at 19:34
  • I have added an example number set and target sum. I'm not sure if I can afford the time. If the comment on the page I referred to is correct I can afford "n hashtables", though. I think. – bbayles Aug 25 '13 at 19:36
  • The problem you posted doesn't seem that bad. `n=33` is doable on a modern computer even with just brute force. However, I think you could optimize it a lot by solving the problem modulo smaller numbers first, since most of the values share a common factor. – Antimony Aug 25 '13 at 19:39
  • Yes, this is an example where I know the answer. The real problem has more elements. – bbayles Aug 25 '13 at 19:40
  • Could you post the real problem then? Or is it too big to post here? – Antimony Aug 25 '13 at 19:41
  • I've added a real one. I have reason to believe there a dozen or so solutions. I may be using the wrong approach for this. It seemed like there might be a way to skip over enumerating some sums when filling out the DP array (as in the `200, 300, 600, 800` example, you could skip by 100s), but Antimony's answer suggests not. – bbayles Aug 25 '13 at 19:46
  • There is no way you can afford the time in `O(numItems * targetWeight)` with those weights. `n / 2` is about ~30 in your case, the `O(2^(n / 2))` algorithm will be much faster. – IVlad Aug 25 '13 at 19:56
  • 1
    Where the heck did those instances come from? They have only small factors -- are you actually trying to solve another problem? – David Eisenstat Aug 25 '13 at 20:00
  • I can solve your example problem in less than half a second on my laptop, but the real one is much harder. Even after reducing it mod small numbers, I'm still left with `n=58` and an 80 bit target. – Antimony Aug 25 '13 at 20:10
  • This problem comes from Project Euler, re-stated to fit the approach I was trying to take for the solution. I work on those problems to try to learn algorithm design, and it seemed like there might be a way to apply the subset-sum DP algorithm (again, like my `200, 300, 600, 800` example). Looks like I need a different approach! – bbayles Aug 25 '13 at 20:16
  • @Bo Do you have a link handy? I'm not sure the larger problem is even solveable. – Antimony Aug 25 '13 at 20:20
  • If you're curious, it's [Problem 152](http://projecteuler.net/problem=152). Don't tell me the answer, though! I'll accept your "you are stuck with the 'hard' general case of the problem" answer below. – bbayles Aug 25 '13 at 20:22
  • @Bo Where did your numbers come from? When I take inverse squares and multiply through, I get much larger numbers, but the problem reduces slightly better (n=52 instead of n=57). – Antimony Aug 25 '13 at 20:29
  • I did `mult = reduce(lcm, [n * n for n in range(3, 80 + 1)], 1)` and `A = [mult // (n * n) for n in range(3, 80 + 1)]`. The target is them `mult // 4`. Why 3 to 80 instead of 2 to 80? Because without 2 you can't get to the solution. So it's assumed. – bbayles Aug 25 '13 at 21:24
  • Oh, I think I missed the lcm step. At any rate, it's ironic that my unreduced version would lead to a faster solution. – Antimony Aug 25 '13 at 21:33

1 Answers1

6

In the particular comment you linked to, the suggestion is to use a hashtable to only store values which actually arise as a sum of some subset. In the worst case, this is exponential in the number of elements, so it is basically equivalent to the brute force approach you already mentioned and ruled out.

In general, there are two parameters to the problem - the number of elements in the set and the size of the target sum. Naive brute force is exponential in the first, while the standard dynamic programming solution is exponential in the second. This works well when one of the parameters is small, but you already indicated that both parameters are too big for an exponential solution. Therefore, you are stuck with the "hard" general case of the problem.

Most NP-Complete problems have some underlying graph whether implicit or explicit. Using graph partitioning and DP, it can be solved exponential in the treewidth of the graph but only polynomial in the size of the graph with treewidth held constant. Of course, without access to your data, it is impossible to say what the underlying graph might look like or whether it is in one of the classes of graphs that have bounded treewidths and hence can be solved efficiently.

Edit: I just wrote the following code to show what I meant by reducing it mod small numbers. The following code solves your first problem in less than a second, but it doesn't work on the larger problem (though it does reduce it to n=57, log(t)=68).

target = 5213096522073683233230240000
A = [2316931787588303659213440000,
     1303274130518420808307560000,
     834095443531789317316838400,
     579232946897075914803360000,
     425558899761116998631040000,
     325818532629605202076890000,
     257436865287589295468160000,
     208523860882947329329209600,
     172333769324749858949760000,
     144808236724268978700840000,
     123386899930738064691840000,
     106389724940279249657760000,
     92677271503532146368537600,
     81454633157401300519222500,
     72153585080604612224640000,
     64359216321897323867040000,
     57762842349846905631360000,
     52130965220736832332302400,
     47284322195679666514560000,
     43083442331187464737440000,
     39418499221729173786240000,
     36202059181067244675210000,
     33363817741271572692673536,
     30846724982684516172960000,
     28604096143065477274240000,
     26597431235069812414440000,
     24794751591313594450560000,
     23169317875883036592134400,
     21698632766175580575360000,
     20363658289350325129805625,
     19148196591638873216640000,
     18038396270151153056160000,
     17022355990444679945241600]

import itertools, time
from fractions import gcd

def gcd_r(seq):
     return reduce(gcd, seq)

def miniSolve(t, vals):
     vals = [x for x in vals if x and x <= t]
     for k in range(len(vals)):
          for sub in itertools.combinations(vals, k):
               if sum(sub) == t:
                    return sub
     return None

def tryMod(n, state, answer):
     t, vals, mult = state
     mods = [x%n for x in vals if x%n]
     if (t%n or mods) and sum(mods) < n:
          print 'Filtering with', n
          print t.bit_length(), len(vals)
     else:
          return state

     newvals = list(vals)
     tmod = t%n
     if not tmod:
          for x in vals:
               if x%n:
                    newvals.remove(x)
     else:
          if len(set(mods)) != len(mods):
               #don't want to deal with the complexity of multisets for now
               print 'skipping', n
          else:
               mini = miniSolve(tmod, mods)
               if mini is None:
                    return None
               mini = set(mini)
               for x in vals:
                    mod = x%n
                    if mod:
                         if mod in mini:
                              t -= x
                              answer.add(x*mult)
                         newvals.remove(x)
     g = gcd_r(newvals + [t])
     t = t//g
     newvals = [x//g for x in newvals]
     mult *= g
     return (t, newvals, mult)

def solve(t, vals):
     answer = set()
     mult = 1
     for d in itertools.count(2):
          if not t:
               return answer
          elif not vals or t < min(vals):
               return None #no solution'
          res = tryMod(d, (t, vals, mult), answer)
          if res is None:
               return None
          t, vals, mult = res
          if len(vals) < 23:
               break

          if (d % 10000) == 0:
               print 'd', d

     #don't want to deal with the complexity of multisets for now
     assert(len(set(vals)) == len(vals))
     rest = miniSolve(t, vals)
     if rest is None:
          return None
     answer.update(x*mult for x in rest)
     return answer

start_t = time.time()
answer = solve(target, A)
assert(answer <= set(A) and sum(answer) == target)
print answer
Antimony
  • 37,781
  • 10
  • 100
  • 107
  • 1
    Have you tried solving it with the `O(2^(n / 2))` algorithm? That should be be fast enough for `n = 57`. – IVlad Aug 25 '13 at 20:32
  • 1
    @IVlad It exhausted all available memory and froze my system to the point where I had to do a hard shutdown. – Antimony Aug 25 '13 at 21:30