With the dataframe underneath I want to optimize the total return, while certain bounds are satisfied.
d = {'Win':[0,0,1, 0, 0, 1, 0],'Men':[0,1,0, 1, 1, 0, 0], 'Women':[1,0,1, 0, 0, 1,1],'Matches' :[0,5,4, 7, 4, 10,13],
'Odds':[1.58,3.8,1.95, 1.95, 1.62, 1.8, 2.1], 'investment':[0,0,6, 10, 5, 25,0],}
data = pd.DataFrame(d)
I want to maximize the following equation:
totalreturn = np.sum(data['Odds'] * data['investment'] * (data['Win'] == 1))
The function should be maximized satisfying the following bounds:
for i in range(len(data)):
investment = data['investment'][i]
C = alpha0 + alpha1*data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
if (lb < investment ) & (investment < ub) & (investment > C) == False:
data['investment'][i] = 0
Hereby lb
and ub
are constant for every row in the dataframe. Threshold C
however, is different for every row. Thus there are 6 parameters to be optimized: lb, ub, alph0, alpha1, alpha2, alpha3
.
Can anyone tell me how to do this in python? My proceedings so far have been with scipy (Approach1) and Bayesian (Approach2) optimization and only lb
and ub
are tried to be optimized.
Approach1:
import pandas as pd
from scipy.optimize import minimize
def objective(val, data):
# Approach 1
# Lowerbound and upperbound
lb, ub = val
# investments
# These matches/bets are selected to put wager on
tf1 = (data['investment'] > lb) & (data['investment'] < ub)
data.loc[~tf1, 'investment'] = 0
# Total investment
totalinvestment = sum(data['investment'])
# Good placed bets
data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
totalreward = sum(data['reward'])
# Return and cumalative return
data['return'] = data['reward'] - data['investment']
totalreturn = sum(data['return'])
data['Cum return'] = data['return'].cumsum()
# Return on investment
print('\n',)
print('lb, ub:', lb, ub)
print('TotalReturn: ',totalreturn)
print('TotalInvestment: ', totalinvestment)
print('TotalReward: ', totalreward)
print('# of bets', (data['investment'] != 0).sum())
return totalreturn
# Bounds and contraints
b = (0,100)
bnds = (b,b,)
x0 = [0,100]
sol = minimize(objective, x0, args = (data,), method = 'Nelder-Mead', bounds = bnds)
and approach2:
import pandas as pd
import time
import pickle
from hyperopt import fmin, tpe, Trials
from hyperopt import STATUS_OK
from hyperopt import hp
def objective(args):
# Approach2
# Lowerbound and upperbound
lb, ub = args
# investments
# These matches/bets are selected to put wager on
tf1 = (data['investment'] > lb) & (data['investment'] < ub)
data.loc[~tf1, 'investment'] = 0
# Total investment
totalinvestment = sum(data['investment'])
# Good placed bets
data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
totalreward = sum(data['reward'])
# Return and cumalative return
data['return'] = data['reward'] - data['investment']
totalreturn = sum(data['return'])
data['Cum return'] = data['return'].cumsum()
# store results
d = {'loss': - totalreturn, 'status': STATUS_OK, 'eval time': time.time(),
'other stuff': {'type': None, 'value': [0, 1, 2]},
'attachments': {'time_module': pickle.dumps(time.time)}}
return d
trials = Trials()
parameter_space = [hp.uniform('lb', 0, 100), hp.uniform('ub', 0, 100)]
best = fmin(objective,
space= parameter_space,
algo=tpe.suggest,
max_evals=500,
trials = trials)
print('\n', trials.best_trial)
Anyone knows how I should proceed? Scipy doesn't generate the desired result. Hyperopt optimization does result in the desired result. In either approach I don't know how to incorporate a boundary that is row depended (C(i)
).
Anything would help! (Any relative articles, exercises or helpful explanations about the sort of optimization are also more than welcome)