Scipy or bayesian optimize function with constraints, bounds and dataframe in python

Question

Given the dataframe below, I want to maximize the total return while certain bounds are satisfied.

import numpy as np
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0], 'Women': [1, 0, 1, 0, 0, 1, 1],
     'Matches': [0, 5, 4, 7, 4, 10, 13], 'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
     'investment': [0, 0, 6, 10, 5, 25, 0]}

data = pd.DataFrame(d)

I want to maximize the following equation:

totalreturn = np.sum(data['Odds'] * data['investment'] * (data['Win'] == 1))

The function should be maximized satisfying the following bounds:

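for i in range(len(data)):

    investment = data.loc[i, 'investment']

    # Row-specific threshold, built from the values of row i only
    C = alpha0 + alpha1 * data.loc[i, 'Men'] + alpha2 * data.loc[i, 'Women'] + alpha3 * data.loc[i, 'Matches']

    # Drop the bet unless all three conditions hold
    if not ((lb < investment) and (investment < ub) and (investment > C)):
        data.loc[i, 'investment'] = 0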

Here lb and ub are constant for every row in the dataframe. The threshold C, however, is different for every row. Thus there are 6 parameters to be optimized: lb, ub, alpha0, alpha1, alpha2, alpha3.
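
A minimal sketch of how the row-dependent threshold could be evaluated for all rows at once, assuming the six parameters arrive as a single sequence. The helper names `selection_mask` and `total_return` and the example parameter values are illustrative only and do not come from the question; the function does not modify `data`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'Win': [0, 0, 1, 0, 0, 1, 0],
    'Men': [0, 1, 0, 1, 1, 0, 0],
    'Women': [1, 0, 1, 0, 0, 1, 1],
    'Matches': [0, 5, 4, 7, 4, 10, 13],
    'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
    'investment': [0, 0, 6, 10, 5, 25, 0],
})


def selection_mask(params, data):
    """Boolean Series: True where the bet satisfies lb, ub and the row threshold C."""
    lb, ub, alpha0, alpha1, alpha2, alpha3 = params
    # C is a Series here, one threshold per row
    C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    inv = data['investment']
    return (inv > lb) & (inv < ub) & (inv > C)


def total_return(params, data):
    """The quantity from the question, restricted to the selected bets."""
    mask = selection_mask(params, data).astype(int)
    return float(np.sum(data['Odds'] * data['investment'] * (data['Win'] == 1) * mask))


# Hypothetical parameter values, chosen only to show the call
params = (0, 100, 0, 11, 0, 0)
mask = selection_mask(params, data)
value = total_return(params, data)
print(mask.tolist(), value)
```

Because nothing is written back into the dataframe, the same `data` can be reused for every parameter vector an optimizer tries.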

Can anyone tell me how to do this in Python? So far I have tried scipy (Approach 1) and Bayesian optimization (Approach 2), and I have only attempted to optimize lb and ub. Approach 1:

import pandas as pd
from scipy.optimize import minimize

def objective(val, data):
    
    # Approach 1
    # Lowerbound and upperbound
    lb, ub = val
    
    # investments
    # These matches/bets are selected to put wager on
    tf1 = (data['investment'] > lb) & (data['investment'] < ub) 
    data.loc[~tf1, 'investment'] = 0
    
        
    # Total investment
    totalinvestment = sum(data['investment'])
    
    # Good placed bets 
    data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
    totalreward = sum(data['reward'])

    # Return and cumulative return
    data['return'] = data['reward'] - data['investment']
    totalreturn = sum(data['return'])
    data['Cum return'] = data['return'].cumsum()
    
    # Return on investment
    print('\n',)
    print('lb, ub:', lb, ub)
    print('TotalReturn: ',totalreturn)
    print('TotalInvestment: ', totalinvestment)
    print('TotalReward: ', totalreward)
    print('# of bets', (data['investment'] != 0).sum())
          
    return totalreturn
          

# Bounds and constraints
b = (0,100)
bnds = (b,b,)
x0 = [0,100]

sol = minimize(objective, x0, args = (data,), method = 'Nelder-Mead', bounds = bnds)

And Approach 2:

import pandas as pd
import time
import pickle
from hyperopt import fmin, tpe, Trials
from hyperopt import STATUS_OK
from hyperopt import  hp

def objective(args):
    # Approach2

    # Lowerbound and upperbound
    lb, ub = args
    
    # investments
    # These matches/bets are selected to put wager on
    tf1 = (data['investment'] > lb) & (data['investment'] < ub) 
    data.loc[~tf1, 'investment'] = 0
    
        
    # Total investment
    totalinvestment = sum(data['investment'])
    
    # Good placed bets 
    data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
    totalreward = sum(data['reward'])

    # Return and cumulative return
    data['return'] = data['reward'] - data['investment']
    totalreturn = sum(data['return'])
    data['Cum return'] = data['return'].cumsum()
    
    # store results
    d = {'loss': - totalreturn, 'status': STATUS_OK, 'eval time': time.time(),
    'other stuff': {'type': None, 'value': [0, 1, 2]},
    'attachments': {'time_module': pickle.dumps(time.time)}}
    
    return d

          

trials = Trials()

parameter_space  = [hp.uniform('lb', 0, 100), hp.uniform('ub', 0, 100)]

best = fmin(objective,
    space= parameter_space,
    algo=tpe.suggest,
    max_evals=500,
    trials = trials)


print('\n', trials.best_trial)

Does anyone know how I should proceed? Scipy doesn't generate the desired result. Hyperopt optimization does produce the desired result. In either approach I don't know how to incorporate a boundary that is row-dependent (C(i)).

Anything would help! (Any relevant articles, exercises or helpful explanations about this sort of optimization are also more than welcome.)

I believe the way this is formulated, things are non-differentiable. (Small change in lb,ub can cause a significant jump in the objective as suddenly observations drop out or are added). SLSQP is for smooth problems only. My initial thought would be to use binary variables to indicate if an observation is used. But that would need very different solvers. — Erwin Kalvelagen, Apr 07 '21 at 16:32
Thanks for the answer. But can you elaborate, what solvers do you think are better suited? — Herwini, Apr 08 '21 at 12:34
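
The jump described in the first comment can be reproduced on the question's data. The sketch below assumes the net-return objective used in the two approaches (reward minus stake) with only lb and ub as parameters; `net_return` and `cuts` are illustrative names. It also shows that, because the objective is a step function, lb and ub only matter through which gap between observed stakes they fall into, so for this restricted case one representative value per gap can be enumerated instead of searching a continuum.

```python
import itertools

import numpy as np

odds = np.array([1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1])
win = np.array([0, 0, 1, 0, 0, 1, 0])
investment = np.array([0, 0, 6, 10, 5, 25, 0], dtype=float)


def net_return(lb, ub):
    """Net return when only bets with lb < investment < ub are placed."""
    placed = (investment > lb) & (investment < ub)
    stake = np.where(placed, investment, 0.0)
    return float(np.sum(odds * stake * win - stake))


# Moving ub by 0.002 across an observed stake (25) changes the result abruptly
just_below = net_return(0, 24.999)  # the bet of 25 is excluded
just_above = net_return(0, 25.001)  # the bet of 25 is included
print(just_below, just_above)

# One candidate value below the smallest stake, one in every gap between
# distinct stakes, and one above the largest stake
levels = np.unique(investment)
cuts = np.concatenate((
    [levels[0] - 1],
    (levels[:-1] + levels[1:]) / 2,
    [levels[-1] + 1],
))

# Evaluate every (lb, ub) pair of candidates with lb < ub and keep the best
best = max(
    ((net_return(lo, hi), lo, hi) for lo, hi in itertools.combinations(cuts, 2)),
    key=lambda t: t[0],
)
print(best)  # (net return, lb, ub)
```

The two printed values differ in sign even though ub barely moved, which is why gradient-based methods get no useful slope information here. The enumeration only covers lb and ub; it does not extend directly to the alpha parameters.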

score 2 · Answer 1 · answered Apr 15 '21 at 07:04

I assume here that you cannot go through the whole dataset, or it is incomplete, or you want to extrapolate, so that you cannot calculate all combinations.

If you have no prior, are uncertain about the smoothness, or evaluations could be costly, I would use Bayesian optimization. You can control the exploration/exploitation trade-off and avoid getting stuck in a local minimum.

I would use scikit-optimize, which implements Bayesian optimization better IMO. It has better initialization techniques, such as Sobol' sequences, which are implemented correctly here. This ensures that your search space will be properly sampled.

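from skopt import gp_minimize

# gp_minimize calls the objective with one list of parameter values and minimizes it,
# so `data` is passed through a wrapper and the return is negated to maximize.
res = gp_minimize(lambda x: -objective(x, data), list(bnds), initial_point_generator='sobol')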

score 1 · Answer 2 · answered Apr 29 '21 at 07:11

I think your formulation needs one more variable, which would be binary and would define whether the investment should be set to 0 or keep its initial value. Assuming that this variable is stored in another column called 'new_binary', your objective function could be changed as follows:

totalreturn = np.sum(data['Odds'] * data['investment'] * data['new_binary'] * data['Win'])

then, the only thing missing is introducing the variable itself.

# No loop is needed: these column expressions are evaluated for all rows at once.
C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
data['new_binary'] = (lb < data['investment']) & (data['investment'] < ub) & (data['investment'] > C)
# The column holds True/False, which Python treats as 1/0 in arithmetic, so it acts as a binary variable.

The only problem that I see now is that this becomes an integer problem, so I am not sure if scipy.optimize.minimize would do. I am not sure what could be an alternative, but according to this, PuLP and Pyomo could work.

Thanks! But how do you propose to incorporate your for loop with the introduced variable inside the objective function? Just paste it in the # investments section? — Herwini, May 04 '21 at 19:27
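
One possible way to do what the comment asks, sketched under stated assumptions: compute `new_binary` inside the objective from the six parameters and use it to zero out the dropped stakes, without writing anything back into `data`. The optimizer is a substitution of mine, not something proposed in the answer: `scipy.optimize.differential_evolution` is a derivative-free global method, so it does not need the objective to be smooth. The bounds on the alphas (0 to 30) and the settings `maxiter=30`, `popsize=10` are arbitrary choices for illustration.

```python
import pandas as pd
from scipy.optimize import differential_evolution

data = pd.DataFrame({
    'Win': [0, 0, 1, 0, 0, 1, 0],
    'Men': [0, 1, 0, 1, 1, 0, 0],
    'Women': [1, 0, 1, 0, 0, 1, 1],
    'Matches': [0, 5, 4, 7, 4, 10, 13],
    'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
    'investment': [0, 0, 6, 10, 5, 25, 0],
})


def objective(params, data):
    """Negative net return, so that minimizing it maximizes the return."""
    lb, ub, alpha0, alpha1, alpha2, alpha3 = params
    inv = data['investment']
    # Row-dependent threshold C(i)
    C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    # 1 where the bet is kept, 0 where it is dropped
    new_binary = ((inv > lb) & (inv < ub) & (inv > C)).astype(int)
    stake = inv * new_binary
    reward = data['Odds'] * stake * data['Win']
    return -float((reward - stake).sum())


# Assumed search ranges: lb and ub as in the question, alphas chosen arbitrarily
bounds = [(0, 100), (0, 100)] + [(0, 30)] * 4

result = differential_evolution(
    objective, bounds, args=(data,),
    seed=0, maxiter=30, popsize=10, polish=False,  # polish is gradient-based, so skip it
)
print('parameters:', result.x)
print('net return:', -result.fun)
```

A stochastic search like this gives no guarantee of finding the global optimum, so the result is worth checking against a few hand-picked parameter vectors.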
