
I'm currently trying to fit some data using SciPy's optimize.minimize function in Python. I have both constraints and bounds that need to be considered during the optimization process. I have found that the method 'trust-constr' works well for me, but other methods seem to only return the initial guess.

Thus my question is: why? Why does trust-constr work but the other methods only return the initial guess?

I made up a model problem to replicate this issue. Let's say I'm trying to fit a polynomial

def f(x, a, b, c):
    return a * x + b * x**2 + c * x**3

subject to the constraint that a + b + c = 1. Additionally, I also want to enforce that all parameters a, b, and c are between 0 and 1.

Here is how I chose to phrase this problem in Python:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# model data: x and y values
x_data = np.array([0,1,2,3,4,5,6,7,8,9,10])
y_data = np.array([0,1,6,18.6,42.4,81,138,217,321.6,455.4,622])

# plt.plot(x_data, y_data)
# plt.show()

# Define the function
def f(x, a, b, c):
    return a * x + b * x**2 + c * x**3

# Define the cost function
def cost_function(params, x, y):
    a, b, c = params
    return np.sum((y - f(x, a, b, c))**2)

# Define the constraint: a + b + c = 1
def constraint(params):
    a, b, c = params
    return a + b + c - 1

# Define the initial guess for the parameters (a, b, c)
initial_guess = (0.4, 0.3, 0.3)

# Set the constraint as a dictionary
constraints = {'type': 'eq', 'fun': constraint}

# Define bounds for a, b, and c
bounds = [(0, 1), (0, 1), (0, 1)]

# Perform the minimization
result = minimize(cost_function, 
                  initial_guess, 
                  args=(x_data, y_data), 
                  constraints=constraints, 
                  bounds=bounds, 
                  method='trust-constr', # 'L-BFGS-B' cannot handle constraints
                  # options={"eps": 01e-3, "maxiter": 1000}, 
                  # tol=1e-12
                  )

# Extract the fitted parameters
a_fit, b_fit, c_fit = result.x

# or try this
popt = result.x

print(result)

print("Fitted parameters: a =", a_fit, "b =", b_fit, "c =", c_fit)

x = x_data

plt.plot(x, f(x, *popt))

plt.scatter(x, y_data)

plt.show()

The following behavior occurs for different methods:

trust-constr This seems to produce a working fit; however, it also tells me success = False.

C:\Users\pawel\anaconda3\lib\site-packages\scipy\optimize\_hessian_update_strategy.py:182: UserWarning: delta_grad == 0.0. Check if the approximated function is linear. If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations.
  warn('delta_grad == 0.0. Check if the approximated '
 barrier_parameter: 0.020000000000000004
 barrier_tolerance: 0.020000000000000004
          cg_niter: 1008
      cg_stop_cond: 4
            constr: [array([0.]), array([0.20050759, 0.19943612, 0.60005629])]
       constr_nfev: [3996, 0]
       constr_nhev: [0, 0]
       constr_njev: [0, 0]
    constr_penalty: 1.0
  constr_violation: 0.0
    execution_time: 1.6354575157165527
               fun: 0.00012048037716891832
              grad: array([-0.1686714 , -0.63808457, -0.56454461])
               jac: [array([[1., 1., 1.]]), array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])]
   lagrangian_grad: array([ 0.01001285, -0.00704824, -0.00296461])
           message: 'The maximum number of function evaluations is exceeded.'
            method: 'tr_interior_point'
              nfev: 3996
              nhev: 0
               nit: 1000
             niter: 1000
              njev: 999
        optimality: 0.010012849606595209
            status: 0
           success: False
         tr_radius: 32.024249305108036
                 v: [array([0.51813562]), array([-0.33945137,  0.11290071,  0.04344438])]
                 x: array([0.20050759, 0.19943612, 0.60005629])
Fitted parameters: a = 0.20050759274916305 b = 0.19943611774529713 c = 0.6000562895055399

L-BFGS-B This also produces a correct fit, but it disregards the constraint (the warning below says L-BFGS-B cannot handle constraints, only bounds).

C:\Users\pawel\anaconda3\lib\site-packages\scipy\optimize\_minimize.py:569: RuntimeWarning: Method L-BFGS-B cannot handle constraints.
  warn('Method %s cannot handle constraints.' % method,
      fun: 7.947399780365303e-09
 hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
      jac: array([-8.06467991e-05, -2.35554814e-04,  3.18168845e-04])
  message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
     nfev: 152
      nit: 6
     njev: 38
   status: 0
  success: True
        x: array([0.19995715, 0.20001357, 0.59999903])
Fitted parameters: a = 0.19995715450259197 b = 0.20001357495157662 c = 0.5999990285019313

method unspecified If I don't specify a method, or if I choose a different one (e.g. SLSQP) from the list in https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html, scipy.optimize.minimize just returns my initial guesses back to me.

     fun: 162156.72
     jac: array([  -14440.79882812,  -126218.40039062, -1132744.77148438])
 message: 'Optimization terminated successfully'
    nfev: 4
     nit: 5
    njev: 1
  status: 0
 success: True
       x: array([0.4, 0.3, 0.3])
Fitted parameters: a = 0.4 b = 0.3 c = 0.3

Question: Is it possible to use constraints and bounds simultaneously? Am I on the right track or is there something fundamentally flawed with this approach? Why does 'trust-constr' produce good results even though it claims to have been unsuccessful? Any help is greatly appreciated.

Sidenote: I've checked out the two relevant threads I could find on Stack Exchange (1: Optimize non-scalar function with inequality constraint and bounds, 2: Scipy optimization with SLSQP disregards constraints). These users seem to have no problem using both bounds and constraints... so it seems fundamentally possible, but I don't understand why my code does not work.

Sidenote 2: In this specific case I also thought about using the absolute value of a, b, and c in the fit function and the constraint to force positive values.

That is, I use:

def f(x, a, b, c):
    return np.absolute(a) * x + np.absolute(b) * x**2 + np.absolute(c) * x**3

def constraint(params):
    a, b, c = params
    return np.absolute(a) + np.absolute(b) + np.absolute(c) - 1

and I remove the bounds=bounds line. This actually works.

I made up a different set of data to check:

y_data = np.array([0,0.7,4.4,14.7,35.2,69.5,121.2,193.9,291.2,416.7,574]) # a=.4, b=-.3, c=.6

Using this absolute value method yields:

Fitted parameters: a = 0.4338907307730815 b = -1.567539544823294e-10 c = 0.5661100512048183

Using 'trust-constr' yields:

Fitted parameters: a = 0.43388786948035785 b = 1.6299840564279863e-10 c = 0.5661121303566438

To my eyes, these answers look equivalent to within machine precision. Does anybody have feelings regarding which method is the "correct" way of doing this? To me it seems like using the absolute value term is a bit of a crutch.

Comment from Reinderien (Apr 27 '23 at 20:46): Your problem is kind of lying about the number of degrees of freedom. Remove c from your parameters, calculate it on the fly, and remove your equality constraint.

1 Answer


Solution by reducing the number of variables

trust-constr This seems to produce a working fit; however, it also tells me success = False.

The trust-constr method is converging on the right solution: if you raise maxiter, you get solutions with progressively lower loss. However, progress is very slow; I have not gotten it to converge even with 100K iterations. It seems to be making really tiny steps.
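For reference, here is a minimal sketch of raising the iteration limit, reusing cost_function, initial_guess, bounds and constraints from the question's original three-parameter code; the 100_000 limit is only an illustrative value, not a recommendation.

# Sketch: same call as in the question, but with a larger iteration budget.
# 'maxiter' is a standard option for method='trust-constr'.
result = minimize(cost_function,
                  initial_guess,
                  args=(x_data, y_data),
                  constraints=constraints,
                  bounds=bounds,
                  method='trust-constr',
                  options={'maxiter': 100_000})
print(result.success, result.fun)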

However, I found a way to simplify the problem. Specifically, instead of optimizing a function with N unknowns and a constraint that they add up to 1, optimize a function with N-1 unknowns, and determine the last unknown from the first N-1.

This requires a new constraint to be equivalent to your original problem: if the first N-1 unknowns add up to more than 1, then the last unknown will have a value less than zero, which violates your original bounds. (It does not require a constraint to handle the upper bound - c cannot be greater than 1 if a and b are greater than or equal to 0.)

Here's how I implemented that:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# model data: x and y values
x_data = np.array([0,1,2,3,4,5,6,7,8,9,10])
y_data = np.array([0,1,6,18.6,42.4,81,138,217,321.6,455.4,622])

# Define the function
def f(x, a, b):
    c = 1 - a - b
    return a * x + b * x**2 + c * x**3

# Define the cost function (f computes c = 1 - a - b internally)
def cost_function(params, x, y):
    a, b = params
    return np.sum((y - f(x, a, b))**2)

# Constrain last term to be larger than zero
def constraint(params):
    a, b = params
    c = 1 - a - b
    return c

# Define the initial guess for the parameters (a, b)
initial_guess = (0.4, 0.3)

# Set the constraint as a dictionary
constraints = {'type': 'ineq', 'fun': constraint}

# Define bounds for a, b
bounds = [(0, 1), (0, 1)]

# Perform the minimization
result = minimize(cost_function, 
                  initial_guess, 
                  args=(x_data, y_data), 
                  constraints=constraints, 
                  bounds=bounds, 
                  method='trust-constr',
                  )

# Extract the fitted parameters
a_fit, b_fit = result.x
c_fit = 1 - a_fit - b_fit

# or try this
popt = result.x

print(result)

print("Fitted parameters: a =", a_fit, "b =", b_fit, "c =", c_fit)

x = x_data

plt.plot(x, f(x, *popt))

plt.scatter(x, y_data)

plt.show()

This finds a better solution in 122 steps than the N-variable version finds in 100K steps.

Solution by reducing jacobian magnitude

SLSQP just returns my initial guesses back to me.

The problem with this method is that the values of the objective function, and therefore the jacobians, are too large.

This answer explains why this matters:

Gradient based optimizations are very sensitive to scaling. I plotted your objective function inside your allowed design space and got this:

[image omitted]

So we can see that the minimum is at about 6 degrees, but the objective values are TINY (about 1e-4). As a general rule of thumb, getting your objective to around order of magnitude 1 is a good idea [...]

You're having the opposite problem: the objective function returns values around 1e+5, and when it calculates the jacobian, the result is huge.

I found that replacing the sum-of-squared-error cost function with root-mean-square error helps: the objective function then spans fewer orders of magnitude, which lets the solver find a solution.

Example of RMSE cost function:

def cost_function(params, x, y):
    a, b, c = params
    return np.sqrt(np.mean((y - f(x, a, b, c))**2))

This change of cost function allows SLSQP to consistently solve the problem, even in the original N-variable form rather than the reduced N-1-variable form.
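As a quick check, here is a rough sketch comparing SLSQP on the sum-of-squares cost versus the RMSE cost, reusing f, x_data, y_data, initial_guess, bounds and the equality constraint from the question's original three-parameter code:

# Sketch: compare SLSQP on the sum-of-squares cost vs. the RMSE cost.
# Assumes f(x, a, b, c), x_data, y_data, initial_guess, bounds and the
# equality constraint dictionary from the question are already defined.
def cost_sse(params, x, y):
    a, b, c = params
    return np.sum((y - f(x, a, b, c))**2)

def cost_rmse(params, x, y):
    a, b, c = params
    return np.sqrt(np.mean((y - f(x, a, b, c))**2))

for name, cost in [('SSE', cost_sse), ('RMSE', cost_rmse)]:
    res = minimize(cost, initial_guess, args=(x_data, y_data),
                   constraints=constraints, bounds=bounds, method='SLSQP')
    # With SSE, SLSQP tends to stop at the initial guess; with RMSE it converges.
    print(name, res.x)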

I suspect that every other method in scipy.optimize which relies on gradients/jacobians has the same problem.

Combined solution

I found that reducing the number of variables worked, and fixing the cost function worked, but the best results came from combining both while using method='SLSQP'.

Full code:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# model data: x and y values
x_data = np.array([0,1,2,3,4,5,6,7,8,9,10])
y_data = np.array([0,1,6,18.6,42.4,81,138,217,321.6,455.4,622])

# Define the function
def f(x, a, b):
    c = 1 - a - b
    return a * x + b * x**2 + c * x**3

# Define the cost function
def cost_function(params, x, y):
    a, b = params
    cost = np.sqrt(np.mean((y - f(x, a, b))**2))
    return cost

# Constrain last term to be larger than zero
def constraint(params):
    a, b = params
    c = 1 - a - b
    return c

# Define the initial guess for the parameters (a, b)
initial_guess = (0.4, 0.3)

# Set the constraint as a dictionary
constraints = {'type': 'ineq', 'fun': constraint}

# Define bounds for a, b
bounds = [(0, 1), (0, 1)]

# Perform the minimization
result = minimize(cost_function, 
                  initial_guess, 
                  args=(x_data, y_data),
                  constraints=constraints, 
                  bounds=bounds,
                  method='SLSQP',
                  )

# Extract the fitted parameters
a_fit, b_fit = result.x
c_fit = 1 - a_fit - b_fit

# or try this
popt = result.x

print(result)

print("Fitted parameters: a =", a_fit, "b =", b_fit, "c =", c_fit)

x = x_data

plt.plot(x, f(x, *popt))

plt.scatter(x, y_data)

plt.show()

This is the solution that was the fastest and most robust in my testing.
