
I'm having a bit of trouble controlling the results from a data generating algorithm I am working on. Basically it takes values from a list and then lists all the different combinations that reach a specific sum. So far the code works fine (I haven't tested scaling it with many variables yet), but I need to allow for negative numbers to be included in the list.

The way I think I can solve this problem is to put a collar on the possible results so as to prevent infinitely many results (if apples are 2 and oranges are -1, then for any sum there will be infinitely many solutions, but if I put a limit on either then it cannot go on forever).

So here's some super basic code that detects the weights:

import math

data = [-2, 10, 5, 50, 20, 25, 40]
target_sum = 100
max_percent = 0.8  # no value can exceed 80% of the total (this is to prevent infinite solutions)

for node in data:
    max_value = abs(math.floor((target_sum * max_percent) / node))
    print(node, "'s max value is ", max_value)
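If my arithmetic is right (80% of 100 is 80, so for example 80/50 floors to 1), running this against the list above should print something like:

-2 's max value is  40
10 's max value is  8
5 's max value is  16
50 's max value is  1
20 's max value is  4
25 's max value is  3
40 's max value is  2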

Here's the code that generates the results (the first block builds a table of which sums are possible, and the second function composes the actual results; details/pseudocode of the algorithm are here: Can brute force algorithms scale?):

from collections import defaultdict

data = [-2, 10, 5, 50, 20, 25, 40]
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T = defaultdict(bool)  # all values are False by default
T[0, 0] = True         # base case


for i, x in enumerate(data):    # i is index, x is data[i]
    for s in range(target_sum + 1):  # go one past the sum so the sum itself is included
        for c in range(s // x + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True

coeff = [0] * len(data)
def RecursivelyListAllThatWork(k, sum):  # Using last k variables, make sum
    # Base case: if we've assigned all the variables correctly, list this
    # solution.
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k - 1]
    # Recursive step: try all coefficients, but only if they work.
    for c in range(sum // x_k + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k - 1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k - 1] = 0

RecursivelyListAllThatWork(len(data), target_sum)
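For what it's worth, this is the symptom I describe below: with the -2 in the list, every printed line has a zero coefficient on it, e.g. (one hand-picked line of the output, not the full run):

 0*-2 +  0*10 +  0*5 +  1*50 +  0*20 +  2*25 +  0*40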

My problem is, I don't know where/how to integrate my limiting code into the main code in order to restrict results and allow for negative numbers. When I add a negative number to the list, it displays it but does not include it in the output. I think this is because it never gets added to the table (first block), and I'm not sure how to add it (while still keeping the program's structure so I can scale it with more variables).
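To illustrate where I think it breaks: when x is negative, the inner loop over c collapses, so no nonzero multiple of x is ever tried (a tiny demo I put together, not part of the real code):

x = -2
print(list(range(10 // x + 1)))  # 10 // -2 == -5, so this prints [] and the loop body never runs
print(list(range(0 // x + 1)))   # prints [0], so only c = 0 is ever tried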

Thanks in advance and if anything is unclear please let me know.

edit: a bit unrelated (and if it detracts from the question just ignore it), but since you're looking at the code already: is there a way I can utilize both CPUs on my machine with this code? Right now when I run it, it only uses one CPU. I know the technical method of parallel computing in Python, but I'm not sure how to logically parallelize this algorithm.
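The only shape I can think of is splitting the top-level coefficient loop across a process pool, roughly like this (an untested sketch of mine, not a confirmed approach; solve_top_level is just a made-up helper name, and it assumes a fork start method so the workers inherit data, T, and coeff; it would replace the final sequential call, and output lines from different workers may interleave):

from multiprocessing import Pool

def solve_top_level(c):
    # fix the last variable's coefficient to c, then enumerate the rest;
    # each worker process has its own copy of coeff, so mutating it is safe
    x_k = data[-1]
    if T[target_sum - c * x_k, len(data) - 1]:
        coeff[len(data) - 1] = c
        RecursivelyListAllThatWork(len(data) - 1, target_sum - c * x_k)

if __name__ == '__main__':
    pool = Pool()  # one worker process per CPU by default
    pool.map(solve_top_level, range(target_sum // data[-1] + 1))
    pool.close()
    pool.join()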

Lostsoul
  • This appears to be a linear algebra problem. You want to know if there is a solution set `b` to the linear combination with coefficients given by the list in question. Did I get that part right? – Joel Cornett Feb 11 '12 at 20:52
  • @JoelCornett Yes, you are correct, but that part has been solved already (second code block above); the problem is integrating some kind of collar on the possible results (first code block) so I can use negative numbers as well (or another approach that allows for negative numbers). – Lostsoul Feb 11 '12 at 20:56
  • I don't know if this will help, but if you're trying to limit the depth of recursion, you can just pass another variable to the recursive function (I call it `depth`). Every time you call the function recursively, pass `depth - 1`. Include an if statement to check if depth is zero, and stop recursion then. – Joel Cornett Feb 11 '12 at 21:08
  • I'm not trying to limit the depth of the recursion, I'm trying to include negative numbers and avoid that causing infinite solutions. – Lostsoul Feb 11 '12 at 21:12
  • Hmmm... If this is a linear Diophantine equation (a.k.a. Bézout's identity, generalized), then wouldn't it have either zero or an infinite number of solutions? – Joel Cornett Feb 11 '12 at 21:37
  • Not exactly; it would if there were no limits on the result. For example, there are only 29 ways to use nickels, dimes and quarters to make a dollar. If you add a negative currency, then it becomes infinite because every value can be countered. If there's a limit on the values, though, then it's prevented from going to infinity. Play with the Python code above and change the numbers in the list to see different results. – Lostsoul Feb 11 '12 at 21:43

1 Answer


You can restrict results by changing both loops over c from

for c in range(s // x + 1):

to

max_value = int(abs((target_sum * max_percent)/x))
for c in range(max_value + 1):

This will ensure that any coefficient in the final answer will be an integer in the range 0 to max_value inclusive.
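For example, with target_sum = 100 and max_percent = 0.8, the cap for x = 50 is int(abs(80 / 50)) = 1, so at most one 50 can appear in any solution, while for x = -2 the cap is int(abs(80 / -2)) = 40.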

A simple way of adding negative values is to change the loop over s from

for s in range(target_sum + 1):

to

R = 200  # maximum absolute size of any partial sum
for s in range(-R, R + 1):

Note that if you do it this way then your solution will have an additional constraint: the absolute value of every partial weighted sum must be <= R.

(You can make R large to avoid this constraint reducing the number of solutions, but this will slow down execution.)
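A rule of thumb for sizing R (the same formula comes up in the comments below): each term's absolute value is at most target_sum * max_percent, so a partial sum over len(data) terms can never stray further from zero than

R = int(target_sum * max_percent * len(data))

which works out to 560 for the settings above (0.8 * 100 * 7).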

The complete code looks like:

from collections import defaultdict

data = [-2, 10, 5, 50, 20, 25, 40]

target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T = defaultdict(bool)  # all values are False by default
T[0, 0] = True         # base case

R = 200            # maximum absolute size of any partial sum
max_percent = 0.8  # maximum weight of any term

for i, x in enumerate(data):    # i is index, x is data[i]
    for s in range(-R, R + 1):  # consider every partial sum from -R to R
        max_value = int(abs((target_sum * max_percent) / x))
        for c in range(max_value + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True

coeff = [0] * len(data)
def RecursivelyListAllThatWork(k, sum):  # Using last k variables, make sum
    # Base case: if we've assigned all the variables correctly, list this
    # solution.
    if k == 0:
        # print what we have so far
        print(' + '.join("%2s*%s" % t for t in zip(coeff, data)))
        return
    x_k = data[k - 1]
    # Recursive step: try all coefficients, but only if they work.
    max_value = int(abs((target_sum * max_percent) / x_k))
    for c in range(max_value + 1):
        if T[sum - c * x_k, k - 1]:
            # mark the coefficient of x_k to be c
            coeff[k - 1] = c
            RecursivelyListAllThatWork(k - 1, sum - c * x_k)
            # unmark the coefficient of x_k
            coeff[k - 1] = 0

RecursivelyListAllThatWork(len(data), target_sum)
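Among the lines this prints you should now see solutions that genuinely use the negative value, for example (one hand-checked line, not the full output):

 5*-2 +  0*10 +  0*5 +  1*50 +  1*20 +  0*25 +  1*40

Here 5*(-2) + 50 + 20 + 40 = 100, every coefficient is within its cap, and every partial sum along the way stays inside [-R, R].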
Peter de Rivaz
  • Thank you so much Peter! Looks good, but I'm sorry, I'm a bit of a dummy with algos and I don't really understand the R variable (I understand what range does, but not R's purpose here or how I can control it to avoid losing any results). Can you explain it a bit more so I can figure out how to make it more dynamic? I hate hard-coding numbers; can I do something like take the largest number in the list and multiply it by the total sum to get that number? I want to keep CPU time down but also don't want to miss any possible results by modifying R incorrectly. – Lostsoul Feb 11 '12 at 21:55
  • `range` generates a list of values that is iterated over in a `for` loop. It takes up to 3 arguments: range(start, stop, step). Calling `range(-200, 200, 2)` would return a list of all the even integers from -200 to 198, for example. – Joel Cornett Feb 11 '12 at 22:04
  • I understand that; I'm more interested in the logic behind R, so I can understand how to change it so that no results are lost, but not set it so high that it takes forever to process. – Lostsoul Feb 11 '12 at 22:47
  • Each term in your answer will be at most target_sum*max_percent, so one way of getting a safe value of R would be R=int(target_sum*max_percent*len(data)) – Peter de Rivaz Feb 12 '12 at 15:11
  • Thanks so much Peter, I have been playing around with the code but am having a bit of trouble figuring out why the results are different. For example, if I make target_sum = 10 and data = [2,5,8], then using my original method (in my post) I get 25 results, but when I run your code (on the same numbers) I get 34 matches. Is it somehow adding more results? – Lostsoul Feb 19 '12 at 19:31
  • @Lostsoul Could you post one of the results that my code gives that the old method did not give perhaps? – Peter de Rivaz Feb 19 '12 at 20:05
  • oh sorry, I was just reviewing the table-generation part (I'm so far from learning the rest of the logic). When I print the values of T, I get more entries in your code's table than in the original one. I guess your version incorporates more results (though I was thinking that not processing all the extra results might make the program more efficient), so if the total sum is 10 the program should (in theory) stop at 10, but yours goes up to 21. I think this is to handle the negative values, which I guess is why yours works. – Lostsoul Feb 19 '12 at 20:33
  • but after playing around with it, the original has 2 entries that your table does not have. They are: (10, 1) and (10, 2) – Lostsoul Feb 19 '12 at 20:33
  • @Lostsoul Ah I see. Yes I would expect my version to have more results in the table because it is allowing for the possibility that it might need to go to a bigger number and then use negative numbers to get back to the start. It may also miss some entries due to the additional max_percent constraint. If you make max_percent >=1.0 then you should get all the values that it used to make. – Peter de Rivaz Feb 19 '12 at 20:54
  • @PeterdeRivaz You're totally right, I see it now. Sorry I forgot to mark it complete; I'll do that now. Thanks so much for your help, and if I run into any questions I might bug you with them (sorry, I think I'm a bit slow at picking up algos; I'm trying to confirm whether I was dropped on my head as a child :-) Thanks again so much Peter! – Lostsoul Feb 19 '12 at 22:03
  • @PeterdeRivaz I'm sorry to bug you again, but I noticed another thing: when I change the order of the list, the number of positive results differs. For example, if I run data = [-2, 10,5,50,20,25,40] I get 2,618 true results, but if I run data = [10, -2,5,50,20,25,40] I get 2,578 results. Sorting the list is no problem if that gives me the most complete results, but I'm not sure if that's enough. If you have time, can you explain the difference? – Lostsoul Feb 21 '12 at 00:02
  • @Lostsoul The R variable limits the size of the sum of the first n terms of the possible outputs. This will result in a slightly different constraint when you reorder the terms. In other words, this difference is probably because you have set R too small at the moment. If you increase R you should get the same results no matter how you reorder the list. – Peter de Rivaz Feb 21 '12 at 19:03
  • Thanks Peter. When I did it, I used the R formula that was suggested, R=int(target_sum*max_percent*len(data)); I thought that would include everything (and max_percent was 1.0). What can I do to change it? If putting the list in order helps then I don't mind doing that, I just wanted to be sure. – Lostsoul Feb 21 '12 at 20:20
  • @Lostsoul: Hmm, when I tried the same experiment max_percent=1.0, target_sum=100, R=int(target_sum*max_percent*len(data)) then I got 4815 results both ways round. What value for R did you get? I have 700 according to the formula. – Peter de Rivaz Feb 21 '12 at 21:43