I'm currently trying to fit some data using SciPy's optimize.minimize function in Python. I have both constraints and bounds that need to be considered during the optimization process. I have found that the method 'trust-constr' works well for me, but the other methods seem to only return the initial guess.
Thus my question is: why? Why does trust-constr work but the other methods only return the initial guess?
I made up a model problem to replicate this issue, let's say I'm trying to fit a polynomial
def f(x, a, b, c):
    return a * x + b * x**2 + c * x**3
subject to the constraint that a + b + c = 1. Additionally, I want to enforce that all three parameters a, b, and c are between 0 and 1.
Here is how I chose to phrase this problem in Python:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
# model data: x and y values
x_data = np.array([0,1,2,3,4,5,6,7,8,9,10])
y_data = np.array([0,1,6,18.6,42.4,81,138,217,321.6,455.4,622])
# plt.plot(x_data, y_data)
# plt.show()
# Define the function
def f(x, a, b, c):
    return a * x + b * x**2 + c * x**3
# Define the cost function
def cost_function(params, x, y):
    a, b, c = params
    return np.sum((y - f(x, a, b, c))**2)
# Define the constraint: a + b + c = 1
def constraint(params):
    a, b, c = params
    return a + b + c - 1
# Define the initial guess for the parameters (a, b, c)
initial_guess = (0.4, 0.3, 0.3)
# Set the constraint as a dictionary
constraints = {'type': 'eq', 'fun': constraint}
# Define bounds for a, b, and c
bounds = [(0, 1), (0, 1), (0, 1)]
# Perform the minimization
result = minimize(cost_function,
                  initial_guess,
                  args=(x_data, y_data),
                  constraints=constraints,
                  bounds=bounds,
                  method='trust-constr',  # 'L-BFGS-B' cannot handle constraints
                  # options={"eps": 1e-3, "maxiter": 1000},
                  # tol=1e-12
                  )
# Extract the fitted parameters
a_fit, b_fit, c_fit = result.x
# or try this
popt = result.x
print(result)
print("Fitted parameters: a =", a_fit, "b =", b_fit, "c =", c_fit)
x = x_data
plt.plot(x, f(x, *popt))
plt.scatter(x, y_data)
plt.show()
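For reference, a quick way to check whether the constraint and the bounds were actually respected is to print something like this at the end (using the a_fit, b_fit, c_fit and popt from the script above):
# check the equality constraint and the bounds on the fitted parameters
print("a + b + c =", a_fit + b_fit + c_fit)                   # should be ~1
print("all within [0, 1]:", all(0 <= p <= 1 for p in popt))   # should be True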
The following behavior occurs for different methods:
trust-constr: This seems to produce a working fit; however, it also tells me success = False.
C:\Users\pawel\anaconda3\lib\site-packages\scipy\optimize\_hessian_update_strategy.py:182: UserWarning: delta_grad == 0.0. Check if the approximated function is linear. If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations.
warn('delta_grad == 0.0. Check if the approximated '
barrier_parameter: 0.020000000000000004
barrier_tolerance: 0.020000000000000004
cg_niter: 1008
cg_stop_cond: 4
constr: [array([0.]), array([0.20050759, 0.19943612, 0.60005629])]
constr_nfev: [3996, 0]
constr_nhev: [0, 0]
constr_njev: [0, 0]
constr_penalty: 1.0
constr_violation: 0.0
execution_time: 1.6354575157165527
fun: 0.00012048037716891832
grad: array([-0.1686714 , -0.63808457, -0.56454461])
jac: [array([[1., 1., 1.]]), array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])]
lagrangian_grad: array([ 0.01001285, -0.00704824, -0.00296461])
message: 'The maximum number of function evaluations is exceeded.'
method: 'tr_interior_point'
nfev: 3996
nhev: 0
nit: 1000
niter: 1000
njev: 999
optimality: 0.010012849606595209
status: 0
success: False
tr_radius: 32.024249305108036
v: [array([0.51813562]), array([-0.33945137, 0.11290071, 0.04344438])]
x: array([0.20050759, 0.19943612, 0.60005629])
Fitted parameters: a = 0.20050759274916305 b = 0.19943611774529713 c = 0.6000562895055399
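Possibly related to success = False: the message says the maximum number of function evaluations was exceeded, so presumably the limits could be raised through the trust-constr options, something like this (a sketch; I haven't verified that it changes the status):
result = minimize(cost_function,
                  initial_guess,
                  args=(x_data, y_data),
                  constraints=constraints,
                  bounds=bounds,
                  method='trust-constr',
                  options={'maxiter': 5000, 'gtol': 1e-8, 'xtol': 1e-10})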
L-BFGS-B: This also produces a correct-looking fit, but it disregards the constraint (see the warning below; L-BFGS-B supports bounds but not constraints). The constraint ends up satisfied here only because the unconstrained optimum happens to have a + b + c ≈ 1.
C:\Users\pawel\anaconda3\lib\site-packages\scipy\optimize\_minimize.py:569: RuntimeWarning: Method L-BFGS-B cannot handle constraints.
warn('Method %s cannot handle constraints.' % method,
fun: 7.947399780365303e-09
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
jac: array([-8.06467991e-05, -2.35554814e-04, 3.18168845e-04])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 152
nit: 6
njev: 38
status: 0
success: True
x: array([0.19995715, 0.20001357, 0.59999903])
Fitted parameters: a = 0.19995715450259197 b = 0.20001357495157662 c = 0.5999990285019313
Method unspecified: If I don't specify a method, or choose a different one (e.g. SLSQP) from the list in https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html, scipy.optimize.minimize just returns my initial guess back to me (see the sketch after this output for the exact call I mean).
fun: 162156.72
jac: array([ -14440.79882812, -126218.40039062, -1132744.77148438])
message: 'Optimization terminated successfully'
nfev: 4
nit: 5
njev: 1
status: 0
success: True
x: array([0.4, 0.3, 0.3])
Fitted parameters: a = 0.4 b = 0.3 c = 0.3
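For completeness, the call that produces this output is identical to the one above except for the method argument (leaving method out entirely behaves the same way for me):
result = minimize(cost_function,
                  initial_guess,
                  args=(x_data, y_data),
                  constraints=constraints,
                  bounds=bounds,
                  method='SLSQP')
print(result.x)  # comes back as the initial guess (0.4, 0.3, 0.3)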
Question: Is it possible to use constraints and bounds simultaneously? Am I on the right track or is there something fundamentally flawed with this approach? Why does 'trust-constr' produce good results even though it claims to have been unsuccessful? Any help is greatly appreciated.
Sidenote: I've checked out the two relevant threads I could find on Stack Exchange (1: Optimize non-scalar function with inequality constraint and bounds, 2: Scipy optimization with SLSQP disregards constraints). Those users seem to have no problem using both bounds and constraints, so it seems fundamentally possible, but I don't understand why my code does not work.
Sidenote 2: In this specific case I also thought about using the absolute value of a, b, and c in the fit function and the constraint to force positive values.
That is, I use:
def f(x, a, b, c):
    return np.absolute(a) * x + np.absolute(b) * x**2 + np.absolute(c) * x**3

def constraint(params):
    a, b, c = params
    return np.absolute(a) + np.absolute(b) + np.absolute(c) - 1
and I remove the bounds=bounds line (the full call is sketched below). This actually works.
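To spell it out, the call for this workaround is the same as before, just with the bounds argument dropped and the absolute-value versions of f and constraint defined as above (a sketch of what I mean, with the method left at its default):
result = minimize(cost_function,
                  initial_guess,
                  args=(x_data, y_data),
                  constraints=constraints)
a_fit, b_fit, c_fit = result.x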
I made up a different set of data to check:
y_data = np.array([0,0.7,4.4,14.7,35.2,69.5,121.2,193.9,291.2,416.7,574]) # a=.4, b=-.3, c=.6
Using this absolute value method yields:
Fitted parameters: a = 0.4338907307730815 b = -1.567539544823294e-10 c = 0.5661100512048183
Using 'trust-constr' yields:
Fitted parameters: a = 0.43388786948035785 b = 1.6299840564279863e-10 c = 0.5661121303566438
To my eyes, these answers look equivalent for all practical purposes (the parameters agree to within a few times 1e-6). Does anybody have feelings regarding which method is the "correct" way of doing this? To me it seems like using the absolute value trick is a bit of a crutch.
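For concreteness, the comparison I have in mind is something like this, with the two parameter vectors copied from the printouts above:
abs_fit = np.array([0.4338907307730815, -1.567539544823294e-10, 0.5661100512048183])
tc_fit  = np.array([0.43388786948035785, 1.6299840564279863e-10, 0.5661121303566438])
print(np.max(np.abs(abs_fit - tc_fit)))  # about 3e-6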