0

The following problem is often called by several names, and has plenty of literature available. Unfortunately, I'm a little new to Python, and could use a little help applying the solution to my case.

I have a pandas dataframe containing ~40,000 rows, so optimization is probably a factor. The dataframe contains several columns of object codes, and a resulting column of dollar amounts. I would like to prove that a particular subset of these dollar amounts total a given value. In other words, I would like to prove the following:

IN: 

Target: $11.72

Code1    Code2   Code3    Amount
RG22     331     ZAV      $2.00     
XG11     542     TAM      $4.23
RG22     117     GEE      $6.81
RG76     956     ZXA      $2.91
ZZ99     223     TTQ      $11.99
BW32     454     PBC      $9.35
OUT:

Code1    Code2   Code3    Amount
RG22     331     ZAV      $2.00   
RG22     117     GEE      $6.81
RG76     956     ZXA      $2.91

Most solutions (including this great solution, code below) only accept and return lists of values. I need a solution which would reproduce the object codes as well. Please advise, and thank you!

def subset_sum(numbers, target, partial=[]):
    s = sum(partial)

    # check if the partial sum is equals to target
    if s == target: 
        print "sum(%s)=%s" % (partial, target)
    if s >= target:
        return  # if we reach the number why bother to continue

    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n]) 


if __name__ == "__main__":
    subset_sum([3,9,8,4,5,7,10],15)

    #Outputs:
    #sum([3, 8, 4])=15
    #sum([3, 5, 7])=15
    #sum([8, 7])=15
    #sum([5, 10])=15
Dylan Moore
  • 149
  • 9

1 Answers1

0

When you have your amounts (that add up to 11.72) as a list, for example obtained as a result of:

def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    if s == target: 
        return partial
    if s > target:
        return None # if we reach the number why bother to continue
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        result = subset_sum(remaining, target, partial + [n]) 
        if result:
            return result

amounts = subset_sum(df.Amount.tolist(), 11.72)

you can easily filter the rows containing those amounts:

print(df[df.Amount.isin(amounts)])
Błotosmętek
  • 12,717
  • 19
  • 29
  • This is helpful and might prove to be my solution. I am getting a type error when I run these lines. I think it's because of line 6: print "sum(%s)=%s" % (partial, target). I tried to simply replace this with "return partial" but the same error is produced. Any further advice would be much appreciated! – Dylan Moore May 01 '20 at 14:29
  • I'd guess you're using Python 3.x, while `subset_sum` apparently was written for 2.x. Might require some effort to make it work under 3.x – Błotosmętek May 01 '20 at 14:45
  • Good advice. Thank you! – Dylan Moore May 01 '20 at 19:01
  • Actually I was wrong, it required almost no effort - see updated answer. – Błotosmętek May 01 '20 at 19:20
  • This does the trick perfectly, thanks so much for coming back and working it out for me! – Dylan Moore May 03 '20 at 18:55