0

I have a column with a long list of numbers(40k), and I have a specific result (100,000).

Is there a way (in Solver, python, R, etc) that give me the solution that summing some selected numbers, give me the expected result (100,000) or the most aproximate.

Short example:

Example

  • 1
    Are you asking for a program that will determine the combination with a sum closest to 100,000? – Aarav Dave Dec 22 '22 at 01:46
  • 1
    This may be useful: https://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum – Edward Dec 22 '22 at 04:03

2 Answers2

4

This can be solved as a MIP (Mixed-Integer Programming) problem:

 min |deviation| 
 subject to sum(i, value[i]*x[i]) = target + deviation
 x[i] ∈ {0,1}

Some code to test this:

import cvxpy as cp
import random

#
# problem size
#
N = 40000
I = range(N)

#
# problem data
#
target = 100000
random.seed(12345)
p = [random.randrange(0,10000) for i in I]

#
# model
#
x = cp.Variable(N,boolean=True)
deviation = cp.Variable(1)
prob = cp.Problem(cp.Minimize(cp.abs(deviation)),[p@x==target+deviation])

#
# solve and reporting
#
prob.solve(solver=cp.CBC,verbose=True)

xsum = 0
for i in I:
    if x[i].value > 0.5:
        print(f"i:{i} value:{p[i]}")
        xsum += p[i]
print(f"sum={xsum}")


 

Results:

-------------------------------------------------------------------------------
                                    Summary
-------------------------------------------------------------------------------
(CVXPY) Dec 23 12:35:28 PM: Problem status: optimal
(CVXPY) Dec 23 12:35:28 PM: Optimal value: 1.438e-11
(CVXPY) Dec 23 12:35:28 PM: Compilation took 9.899e-02 seconds
(CVXPY) Dec 23 12:35:28 PM: Solver (including time spent in interface) took 1.156e+00 seconds
i:6 value:9273
i:9 value:6112
i:11 value:7093
i:13 value:9209
i:15 value:9063
i:3857 value:9209
i:9341 value:2035
i:12413 value:9209
i:15914 value:2035
i:15989 value:2126
i:16558 value:2035
i:16965 value:2035
i:24271 value:63
i:24290 value:7093
i:26791 value:624
i:30330 value:6752
i:32612 value:6752
i:34308 value:219
i:36798 value:9063
sum=100000

In this example, I used 40k integer values between 0 and 10k. I assume values are integers from the example data in the question. The MIP solver CBC needed about a second to find a combination that forms exactly 100k.

Erwin Kalvelagen
  • 15,677
  • 2
  • 14
  • 39
0

I did it manually, because the amount of numbers was to big, but the answer for a small amount of numbers in Python 3 is this:

mylist = r"C:path\of\the\file.csv"

import pandas as pd

df = pd.read_csv(mylist)


def subset_sum(numbers, target, partial=[]):
    s = sum(partial)

    # check if the partial sum is equals to target
    if s == target: 
        print ("sum(%s)=%s" % (partial, target))
    
    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n]) 
   

if _name_ == "_main_":
    subset_sum(#List,#target)