The below is written in python
, but I think the concept conveys very easily and can be reformulated in r
if desired.
Basically: Reformulate your problem. Instead of optimizing a long vector of binary "selection" variables, all you need is 3 variables to formulate this, specifically the (integer) number of 1's, 2's, and 3's to pick.
This solves almost instantaneously as an IP.
import pyomo.environ as pyo
from random import randint
n = 1000
k = 500
sample = [randint(1, 3) for t in range(n)]
avail = {t : len([val for val in sample if val==t]) for t in range(1, 4)}
target = 86/47
m = pyo.ConcreteModel()
m.vals = pyo.Set(initialize=[1,2,3])
m.pick = pyo.Var(m.vals, domain=pyo.NonNegativeIntegers)
m.delta = pyo.Var()
m.obj = pyo.Objective(expr=m.delta)
# constrain the delta as an absolute value of |sum(picks) - target|
m.C1 = pyo.Constraint(expr=m.delta >= sum(m.pick[v]*v for v in m.vals)-target*k)
m.C2 = pyo.Constraint(expr=m.delta >= -sum(m.pick[v]*v for v in m.vals)+target*k)
# don't use more than available for each value
def limit(m, v):
return m.pick[v] <= avail[v]
m.C3 = pyo.Constraint(m.vals, rule=limit)
soln = pyo.SolverFactory('glpk').solve(m)
print(soln)
m.pick.display()
Yields:
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 885
Number of created subproblems: 885
Error rc: 0
Time: 0.3580749034881592
Solution:
- number of solutions: 0
number of solutions displayed: 0
pick : Size=3, Index=vals
Key : Lower : Value : Upper : Fixed : Stale : Domain
1 : 0 : 3.0 : None : False : False : NonNegativeIntegers
2 : 0 : 0.0 : None : False : False : NonNegativeIntegers
3 : 0 : 304.0 : None : False : False : NonNegativeIntegers
Realize you can also attack this algorithmically quite efficiently and get a (pretty easy) near-optimal answer, or with some sweat-equity get the optimal answer as well. Below is a framework that I tinkered with. The key observation is that you can "add more 3's" to the solution up until the point where the amount to go (to get to k * target
can be filled all with 1's. That's very close to as-good-as-it-gets, except for cases where you'd be better off substituting a couple of 2's near the end, I think, or backing up if you run out of 1's.
The below runs (in python
) and is most of the way there for a good approximation.
### Code:
# average hitting
from random import randint
n = 1000
k = 50
sample = [randint(1, 3) for t in range(n)]
available = {t : len([val for val in sample if val==t]) for t in range(1, 4)}
target = 86/47
print(f'available at start: {available}')
sum_target = target * k
soln = []
selections_remaining = k
togo = sum_target - sum(soln)
for pick in range(k):
if togo > k - pick and available[3] > 0:
soln.append(3)
available[3] -= 1
elif togo > k - pick and available[2] > 0:
soln.append(2)
available[2] -= 1
elif available[1] > 0:
soln.append(1)
available[1] -= 1
else: # ran out of ones in home stretch... do a swap
pass
# some more logic...
togo = sum_target - sum(soln)
print(f'solution: {soln}')
print(f'generated: {sum(soln)/k} for target of {target}')
print(f'leftover: {available}')
Yields:
available at start: {1: 349, 2: 335, 3: 316}
solution: [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
generated: 1.84 for target of 1.8297872340425532
leftover: {1: 291, 2: 335, 3: 274}
[Finished in 117ms]