1

I have a script below that gives the closest 2 values to a given sum. It also iterates through a list of given sums and after each iteration removes the numbers that have already been used.

I need to modify this script so that it produces a necessary amount of values closest to each sum, rather than 2. The script needs to accept float values and cannot re-use values. Effectively it needs to pick the most efficient set closest to target, update the set to remove values used, then move on to next target etc..

With pairs it it doesn't work very well for specific use cases/sets that require 3 numbers or 4 etc. to actually get closest to the sum. I need this script to also be able to accept float values, which this script currently does.

Any suggestions would be much appreciated. Also if someone knows a better script for this, please let me know.

import sys
def find_closese_sum(numbers, target):
    start = 0
    end = len(numbers) - 1
    result = sys.maxint
    result_tuple = None
    while start < end:
        if numbers[start] + numbers[end] == target:
            print 0, (numbers[start], numbers[end])
            return
        elif numbers[start] + numbers[end] > target:
            if abs(numbers[start] + numbers[end] - target) < result:
                result = abs(numbers[start] + numbers[end] - target)
                result_tuple = (numbers[start], numbers[end])
            end -= 1
        else:
            if abs(numbers[start] + numbers[end] - target) < result:
                result = abs(numbers[start] + numbers[end] - target)
                result_tuple = (numbers[start], numbers[end])
            start += 1

    for i in result_tuple:
        numbers.remove(i)

    return result_tuple

if __name__ == "__main__":
    target = [14,27,39]
    numbers = [1,5,5,10,7,8,11,13,66,34]
    print numbers
    numbers = sorted(numbers)


    for i in target:
        result_shown = find_closese_sum(numbers, i)

        print result_shown
  • This is the [subset sum problem](https://en.wikipedia.org/wiki/Subset_sum_problem). It is [NP-complete](https://en.wikipedia.org/wiki/NP-completeness); this doesn't necessarily mean there's no better approach than brute force, but it probably means there are no _significantly_ better approaches. Discussion [at this other question](https://stackoverflow.com/questions/6012963/subset-sum-problem). – Nathan Vērzemnieks Feb 07 '18 at 04:11
  • @PhilippVengrinovich In your example, `numbers = [1,5,5,10,7,8,11,13,66,34]`, you have two values of 5. If for the second target number the combination contains a 5, which 5 should be removed from the `numbers` list? First one? Second one? or both? – AGN Gazer Feb 07 '18 at 08:31
  • That is, if 5 was used, what should be the updated sequence sequence: `[1,10,7,8,11,13,66,34]` or `[1,5,10,7,8,11,13,66,34]`? – AGN Gazer Feb 07 '18 at 08:40
  • @AGNGazer In answer to you question, it should be the second sequence. So if any of the combinations use a 5, this single five should be removed from the list, but the other(s) should stay. – Philipp Vengrinovich Feb 07 '18 at 08:50

4 Answers4

1

I don't see any elegant way to do this, so you probably are going to have to brute force the solution. Make all possible subsets of numbers, sum them, and check which is the closest to the target. It would look something like this.

from itertools import permutations

def find_closest_sum(numbers, target, n):
    permlist = list(permutations(numbers, n))
    sumlist = [sum(l) for l in permlist]
    maxpos = 0
    for i in range(1, len(sumlist)):
        if abs(sumlist[i] - target) < abs(sumlist[maxpos]-target):
             maxpos = i

     return permlist[maxpos]

numbers = [1,5,5,10,7,8,11,13,66,34]
result_shown = find_closest_sum(numbers, 20, 4)
print result_shown

Using permutations makes a lot of the code you wrote unnecessary.

Sam Craig
  • 875
  • 5
  • 16
1

My answer, using python 3 but you should be able to port it easily.

from itertools import combinations as c

if __name__ == "__main__":
    target = [14,27,39]
    numbers = [1,5,5,10,7,8,11,13,66,34]

    for combo in range(1,4):
        for i in target:
            #lambda to find the difference between sum and target
            diff = lambda x: abs(sum(x) - i)
            #get all unique combinations
            combos = {tuple(sorted(c)) for c in c(numbers, combo)}
            #sort them
            combos = sorted(combos, key = diff)
            #get the smallest difference
            smallest = diff(combos[0])
            #filter out combos larger than the smaller difference
            result = [c for c in combos if diff(c) == smallest]
            print('results for {}, best combinations are off by {}:'.format(i, smallest))
            print(result)

You stated you need to exclude numbers used for a previous result. To do that just remove them from the list:

            #after print(result), inside the "for i in target" loop
            #if you want to exclude all numbers used in all combinations
            numbers = [n for n in numbers if n not in [b for a in result for b in a]]
            #if you only need to remove the first match
            numbers = [n for n in numbers if n not in result[0]]
Turksarama
  • 1,136
  • 6
  • 13
  • Thank you, this is very helpful! How would you modify this script so it doesn't re-use the values that were used to generate a previous pair, unfortunately it's one of the conditions I need to follow? So essentially the output will be any closest combination to a sum, delete the numbers used, then try again with updated number set and another sum. I tried modifying your code but couldn't figure out a good solution for not re-using numbers. Thank you so much for your help on this! – Philipp Vengrinovich Feb 07 '18 at 05:56
  • Thank you so much again! Last question - when I am trying to use/modify your script, i can't seem to be able to return the most efficient sequence only. So for example for 2 target numbers, the first sequence returned will be where 'smallest' is as close to 0 as possible, same for the second target number while removing values that were already used for the most efficient first set etc. Every combination that I have tried so far returns a list index out of range error. Thanks again, I've been battling this one for a while now and really appreciate the help – Philipp Vengrinovich Feb 07 '18 at 08:21
  • I think in that case what is happening is that since you are looping over the list so many times, you are actually removing all numbers from the list so there aren't ANY combinations remaining. It is up to you to determine how you want to handle this problem. Note that my outer list is getting combos of 1, 2 and 3 numbers and I assume you don't actually want all of those at the same time, so try simply doing only one loop (with combo = 3, for example). Since you only have ten numbers you will run out quickly. – Turksarama Feb 07 '18 at 23:50
  • Thank you, this is very helpful, however when I try to run on a large data set of floats and float targets, the script doesn't work. I will try to play around with this more, thanks again! – Philipp Vengrinovich Feb 08 '18 at 04:12
0
import numpy as np
import itertools

def find_closese_sum(numbers, targets):
    numbers = numbers[:]
    for t in targets:
        if not numbers:
            break
        combs = sum([list(itertools.combinations(numbers, r))
                     for r in range(1, len(numbers)+1)], [])
        sums = np.asarray(list(map(sum, combs)))
        bestcomb = combs[np.argmin(np.abs(np.asarray(sums) - t))]
        numbers = list(set(numbers).difference(bestcomb))
        print("Target: {},  combination: {}".format(t, bestcomb))

Example

In [101]: import numpy as np
     ...: import itertools
     ...: 
     ...: def find_closese_sum(numbers, targets):
     ...:     numbers = numbers[:]
     ...:     for t in targets:
     ...:         if not numbers:
     ...:             break
     ...:         combs = sum([list(itertools.combinations(numbers, r))
     ...:                      for r in range(1, len(numbers)+1)], [])
     ...:         sums = np.asarray(list(map(sum, combs)))
     ...:         bestcomb = combs[np.argmin(np.abs(np.asarray(sums) - t))]
     ...:         numbers = list(set(numbers).difference(bestcomb))
     ...:         print("Target: {},  combination: {}".format(t, bestcomb))
     ...: 
     ...: targets = [14,27.,39]
     ...: numbers = [1.0,5,5,10,7,8,11,13,66,34]
     ...: find_closese_sum(numbers, targets)
     ...: 
Target: 14,  combination: (1.0, 13)
Target: 27.0,  combination: (5, 5, 10, 7)
Target: 39,  combination: (8, 34)
AGN Gazer
  • 8,025
  • 2
  • 27
  • 45
  • Thank you so much for the quick response. I try to run your script but cannot manage to execute successfully. Can you please let me know what values you pass where in order to run? Thank you in advance! – Philipp Vengrinovich Feb 07 '18 at 05:54
  • @PhilippVengrinovich I have fixed two typos in my code (sorry) and added a real workable example. – AGN Gazer Feb 07 '18 at 06:09
  • thank you! I have noticed however that your script doesn't work with float values and also re-uses values. Unfortunately these are some of my conditions for this script. Do you think it is possible to modify your script in a way that would allow for exactly same output but using floats and also not re-using values? Thank you so much again for your help! – Philipp Vengrinovich Feb 07 '18 at 07:56
  • Thank you, works great! I also tried to run your script using the following values: targets = [778592.81, 87702.82, 39757.92, 24590.12, 12534.6, 28441.06] (see numbers in the next comment) but all the script gave me was 'killed: 9' – Philipp Vengrinovich Feb 08 '18 at 03:55
  • numbers = [3317.64, 2424.72, 13535.4, 3317.64, 4859.04, 20701.2, 5522.72, 8098.4, 11146.8, 1546.632, 23796.32, 2981.36, 1546.632, 17041.28, 34080.56, 4247.68, 27215.76, 13845.72, 38686.08, 51123.84, 153371.52, 102241.68, – Philipp Vengrinovich Feb 08 '18 at 03:57
  • 14866.88, 17615.312, 27757.492, 8520.64, 11570.4, 2123.84, 22267.44, 29669.4, 70118.52, 170402.8, 12743.04, 8547.92, 7447.132, 14866.88, 4273.96, 12370.8, 29669.4, 9671.52, 102247.68, 136322.24, 31857.6, 8547.92, 47425.84, 13540.24, 24590.12, 12282.92, 33775.28, 3868.208, 5805.312, 4948.32, 5933.88, 7253.64, 2708.048, 676.384, 3955.92, 2417.88, 4247.68, 677.012, 12370.8, 3955.92, 2417.88, 8520.64, 17041.28, 4273.96, 3070.48, 24565.84, 2264.04, 2290.32, 141297.68, 66930.48, 37183.6, 1484.44, 95861.12, 23938, 119826.4, 35947.92, 65904.52, 11982.64] – Philipp Vengrinovich Feb 08 '18 at 03:57
  • sorry for such a long comment thread :) I know that with a number set like this the script should run for a very long time, but It just ends up breaking in 2 minutes. Is this because the number set is so long? @AGN GAZER – Philipp Vengrinovich Feb 08 '18 at 03:58
  • Yes, you are running out of memory. For 80 numbers, total number of possible combinations of all possible lengths is about 10**24(!) Good Luck! – AGN Gazer Feb 08 '18 at 04:12
  • do you think running this on an AWS server would help? – Philipp Vengrinovich Feb 08 '18 at 16:22
  • @PhilippVengrinovich I doubt. You can do some math and compute the required memory. – AGN Gazer Feb 08 '18 at 16:29
  • your solution definitely answered the question, I tried to upvote it but since i'm a new user, my votes don't publicly count. let me know if there is anything else I can do. Thanks again! – Philipp Vengrinovich Feb 08 '18 at 21:05
-1

Without making it complex here is my approach, Try this :

import itertools
def finding_closet(ls,target,depth):
    closest = []
    for i in itertools.combinations(ls, depth):

        if sum(i) == target:
            return i
        else:
            closest.append((abs(sum(i) - target), i))
    return min(closest)[1]

Test_case 0:

print(finding_closet([1,5,5,10,7,8,11,13,66,34],20,2))

output:

(7, 13)

Test_case 1:

print(finding_closet([1,5,5,10,7,8,11,13,66,34],20,4))

output:

(1, 5, 5, 8)

test_case_3:

print(finding_closet([1,5,5,10,7,8,11,13,66,34],20,8))

output:

(1, 5, 5, 10, 7, 8, 11, 13)
Aaditya Ura
  • 12,007
  • 7
  • 50
  • 88