
I have a reasonably simple constrained optimization problem but get different answers depending on how I do it. Let's get the imports and a pretty-print function out of the way first:

import numpy as np
from scipy.optimize import minimize, LinearConstraint, NonlinearConstraint, SR1

def print_res( res, label ):
    print("\n\n ***** ", label, " ***** \n")
    print(res.message)
    print("obj func value at solution", obj_func(res.x))
    print("starting values: ", x0)
    print("ending values:   ", res.x.astype(int) )
    print("% diff", (100.*(res.x-x0)/x0).astype(int) )
    print("target achieved?",target,res.x.sum())

The sample data is very simple:

n = 5
x0 = np.arange(1,6) * 10_000
target = x0.sum() + 5_000   # increase sum from 150,000 to 155,000

Here's the constrained optimization (including jacobians). In words, the objective function I want to minimize is just the sum of squared percentage changes from the initial values to final values. The linear equality constraint is simply requiring x.sum() to equal a constant.

def obj_func(x):
    return ( ( ( x - x0 ) / x0 ) ** 2 ).sum()

def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2

def constr_func(x):
    return x.sum() - target

def constr_jac(x):
    return np.ones(n)
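
As a quick sanity check on the analytic gradient, scipy.optimize.check_grad can compare obj_jac against a finite-difference estimate (a small sketch; x0 * 1.01 is just an arbitrary point away from x0, where the gradient isn't trivially zero):

from scipy.optimize import check_grad

# Compare obj_jac with a finite-difference gradient of obj_func; a value that
# is tiny relative to the gradient's scale means the two are consistent.
print( check_grad( obj_func, obj_jac, x0 * 1.01 ) )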

And for comparison, I've re-factored as an unconstrained minimization by using the equality constraint to replace x[0] with a function of x[1:]. Note that the unconstrained function is passed x0[1:] whereas the constrained function is passed x0.

def unconstr_func(x):
    x_one       = target - x.sum()
    first_term  = ( ( x_one - x0[0] ) / x0[0] ) ** 2
    second_term = ( ( ( x - x0[1:] ) / x0[1:] ) ** 2 ).sum()
    return first_term + second_term

I then try to minimize in three ways:

  1. Unconstrained with 'Nelder-Mead'
  2. Constrained with 'trust-constr' (w/ & w/o jacobian)
  3. Constrained with 'SLSQP' (w/ & w/o jacobian)

Code:

##### (1) unconstrained

res0 = minimize( unconstr_func, x0[1:], method='Nelder-Mead')   # OK, but weird note
res0.x = np.hstack( [target - res0.x.sum(), res0.x] )
print_res( res0, 'unconstrained' )    

##### (2a) constrained -- trust-constr w/ jacobian

nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac )
resTCjac = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTCjac, 'trust-const w/ jacobian' )

##### (2b) constrained -- trust-constr w/o jacobian

nonlin_con = NonlinearConstraint( constr_func, 0., 0. )    
resTC = minimize( obj_func, x0, method='trust-constr',
                  jac='2-point', hess=SR1(), constraints = nonlin_con )    
print_res( resTC, 'trust-const w/o jacobian' )

##### (3a) constrained -- SLSQP w/ jacobian

eq_cons = { 'type': 'eq', 'fun' : constr_func, 'jac' : constr_jac }
resSQjac = minimize( obj_func, x0, method='SLSQP',
                     jac = obj_jac, constraints = eq_cons )    
print_res( resSQjac, 'SLSQP w/ jacobian' )

##### (3b) constrained -- SLSQP w/o jacobian

eq_cons = { 'type': 'eq', 'fun' : constr_func }    
resSQ = minimize( obj_func, x0, method='SLSQP',
                  jac = obj_jac, constraints = eq_cons )
print_res( resSQ, 'SLSQP w/o jacobian' )

Here is some simplified output (and of course you can run the code to get the full output):

starting values:  [10000 20000 30000 40000 50000]

***** (1) unconstrained  *****
Optimization terminated successfully.
obj func value at solution 0.0045454545454545305
ending values:    [10090 20363 30818 41454 52272]

***** (2a) trust-const w/ jacobian  *****
The maximum number of function evaluations is exceeded.
obj func value at solution 0.014635854609684874
ending values:    [10999 21000 31000 41000 51000]

***** (2b) trust-const w/o jacobian  *****
`gtol` termination condition is satisfied.
obj func value at solution 0.0045454545462939935
ending values:    [10090 20363 30818 41454 52272]

***** (3a) SLSQP w/ jacobian  *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values:    [11000 21000 31000 41000 51000]    

***** (3b) SLSQP w/o jacobian  *****   
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values:    [11000 21000 31000 41000 51000]

Notes:

  1. (1) & (2b) are plausible solutions in that they achieve significantly lower objective function values, and intuitively we'd expect the variables with larger starting values to move more (both absolutely and in percentage terms) than the smaller ones.

  2. Adding the jacobian to 'trust-constr' causes it to get the wrong answer (or at least a worse answer) and also to exceed the maximum number of function evaluations. Maybe the jacobian is wrong, but the function is so simple that I'm pretty sure it's correct (?)

  3. 'SLSQP' doesn't seem to work w/ or w/o the jacobian supplied, but works very fast and claims to terminate successfully. This seems very worrisome in that getting the wrong answer and claiming to have terminated successfully is pretty much the worst possible outcome.

  4. Initially I used very small starting values and targets (just 1/1,000 of what I have above) and in that case all 5 approaches above work fine and give the same answers. My sample data is still extremely small, and it seems kinda bizarre for it to handle 10, 20, ..., 50 but not 10,000, 20,000, ..., 50,000.

  5. FWIW, note that the 3 incorrect results all hit the target by adding 1,000 to each initial value -- this satisfies the constraint but comes nowhere near minimizing the objective function (b/c variables with higher initial values should be increased more than lower ones to minimize the sum of squared percentage differences).
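
For reference, the problem is small enough to solve in closed form: stationarity of the Lagrangian gives 2*(x_i - x0_i)/x0_i**2 = lambda for every i, and the constraint x.sum() == target pins down lambda. A quick sketch using the arrays defined above, which agrees with (1) and (2b):

lam_over_2 = (target - x0.sum()) / (x0.astype(float) ** 2).sum()
x_exact = x0 + lam_over_2 * x0.astype(float) ** 2   # x_i = x0_i + (lambda/2) * x0_i**2
print(x_exact)             # ~[10090.9  20363.6  30818.2  41454.5  52272.7]
print(obj_func(x_exact))   # ~0.0045454..., matching (1) and (2b)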

So my question is really just what is happening here and why do only (1) and (2b) seem to work?

More generally, I'd like to find a good python-based approach to this and similar optimization problems and will consider answers using other packages besides scipy, although the best answer would ideally also address what is going on with scipy here (e.g. is this user error or a bug I should post to github?).

JohnE
  • For the unconstrained minimization, what do you get if you explicitly set `fatol=1e-8`? – user545424 Apr 05 '19 at 21:11
  • I meant, `fatol` not `xatol`. Unfortunately I can't test because my scipy version is too old. My suspicion is that it's just terminating early because it's getting fairly close to the minimum and so the 7 simplex points all differ by less than the default value of `0.0001`. – user545424 Apr 05 '19 at 21:20
  • @user545424 Fixed: no change if I add `options={'fatol':1e-8}` to 'Nelder-Mead' but that one was working OK already, it's 2a, 3a, 3b that are off. Added to 3a & 3b but no change to those either. – JohnE Apr 05 '19 at 21:24
  • Ah ok, I misread the results. I wonder if there is simply a local minimum once it hits the boundary. Did you check what the Hessian at the minimum is? – user545424 Apr 05 '19 at 21:34
  • This problem can be solved analytically. Noting the correct solution might be helpful. – Subhaneil Lahiri Apr 05 '19 at 22:02
  • For what it's worth I tried your example using SLSQP from the `nlopt` library and it gave the correct results, so that rules out a problem with your jacobian function or a local minimum. – user545424 Apr 05 '19 at 22:06
  • @user545424 Thanks! Good to have that confirmation! – JohnE Apr 06 '19 at 13:36
  • @JohnE I know, there's something odd here. I just meant that knowing the correct answer would be helpful. But now that I look at the numbers again, it seems like 1 and 2b are correct (as you said). – Subhaneil Lahiri Apr 06 '19 at 14:45
  • As the constraint is linear, its Hessian is zero. Could this result in putting too much weight on the constraint? E.g. if the jacobian is multiplied by the inverse Hessian, with an inexact estimate of the Hessian. – Subhaneil Lahiri Apr 06 '19 at 15:06
  • @SubhaneilLahiri you are on to something. Adding a hessian of zeros fixes 2a (trust-constr with jacobian). SLSQP doesn't even take a hessian as best I can tell. I should note that on my laptop I got a helpful message "If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations" that I didn't get when I ran it on my work computer yesterday (both are fairly recent versions of scipy). Feel free to write this up as an answer if you like. Btw, my syntax was: `def constr_hess(x,v): return np.zeros([n,n])` – JohnE Apr 06 '19 at 16:41
  • Better (Convex) QP solvers are available under CVXPY. – Erwin Kalvelagen Apr 07 '19 at 09:44

2 Answers


Here is how this problem could be solved using nlopt, a library for nonlinear optimization that I've been pretty impressed with.

First, the objective function and gradient are both defined using the same function:

import numpy as np
import nlopt

# n, x0 and target are assumed to be defined as in the question; print_res here
# is a small variant of the question's helper that takes the solution vector,
# the objective value and a label.

def obj_func(x, grad):
    if grad.size > 0:
        grad[:] = obj_jac(x)
    return ( ( x / x0 - 1 ) ** 2 ).sum()

def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2

def constr_func(x, grad):
    if grad.size > 0:
        grad[:] = constr_jac(x)
    return x.sum() - target

def constr_jac(x):
    return np.ones(n)
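
One detail worth noting: the nlopt Python bindings always call the objective as f(x, grad), even for derivative-free algorithms, so the question's unconstr_func needs the extra grad argument (which it can simply ignore) before it can be passed to the Nelder-Mead run below:

def unconstr_func(x, grad):
    # identical to the question's unconstr_func, apart from the extra `grad`
    # argument required by nlopt's calling convention (unused here because
    # Nelder-Mead is derivative-free)
    x_one       = target - x.sum()
    first_term  = ( ( x_one - x0[0] ) / x0[0] ) ** 2
    second_term = ( ( ( x - x0[1:] ) / x0[1:] ) ** 2 ).sum()
    return first_term + second_term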

Then, to run the minimization using Nelder-Mead and SLSQP:

opt = nlopt.opt(nlopt.LN_NELDERMEAD,len(x0)-1)
opt.set_min_objective(unconstr_func)
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0[1:].copy())
xopt = np.hstack([target - xopt.sum(), xopt])
fval = opt.last_optimum_value()
print_res(xopt,fval,"Nelder-Mead");

opt = nlopt.opt(nlopt.LD_SLSQP,len(x0))
opt.set_min_objective(obj_func)
opt.add_equality_constraint(constr_func)
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0.copy())
fval = opt.last_optimum_value()
print_res(xopt,fval,"SLSQP w/ jacobian");

And here are the results:

 *****  Nelder-Mead  ***** 

obj func value at solution 0.00454545454546
result:  3
starting values:  [ 10000.  20000.  30000.  40000.  50000.]
ending values:    [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0


 *****  SLSQP w/ jacobian  ***** 

obj func value at solution 0.00454545454545
result:  3
starting values:  [ 10000.  20000.  30000.  40000.  50000.]
ending values:    [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0

When testing this out, I think I discovered what the issue with the original attempt was. If I set the absolute tolerance on the function to 1e-8, which is what the scipy functions default to, I get:

 *****  Nelder-Mead  ***** 

obj func value at solution 0.0045454580693
result:  3
starting values:  [ 10000.  20000.  30000.  40000.  50000.]
ending values:    [10090 20363 30816 41454 52274]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0


 *****  SLSQP w/ jacobian  ***** 

obj func value at solution 0.0146361108503
result:  3
starting values:  [ 10000.  20000.  30000.  40000.  50000.]
ending values:    [10999 21000 31000 41000 51000]
% diff [9 5 3 2 2]
target achieved? 155000.0 155000.0

which is exactly what you were seeing. So my guess is that during SLSQP the minimizer ends up somewhere in the objective landscape where the next jump changes the function value by less than 1e-8 from the last place, and so it terminates early.
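
For reference, reproducing that early termination only requires changing the tolerance line in the SLSQP script above:

# Same SLSQP setup as before, but with the looser tolerance that triggers the
# early termination seen in the question.
opt = nlopt.opt(nlopt.LD_SLSQP,len(x0))
opt.set_min_objective(obj_func)
opt.add_equality_constraint(constr_func)
opt.set_ftol_abs(1e-8)
xopt = opt.optimize(x0.copy())
fval = opt.last_optimum_value()
print_res(xopt,fval,"SLSQP w/ jacobian");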

user545424
  • Thanks! I may hold off on the checkmark a little longer b/c I'm thinking of putting a bounty here to try and get a fuller explanation of what's going on with scipy but this is very helpful (along with your comments under the OP) – JohnE Apr 09 '19 at 20:54
  • @JohnE, just curious, did changing `fatol` to 1e-15 fix the problems in all 3 of the cases you originally noticed? – user545424 Apr 18 '19 at 01:15
  • see the answer I just added but basically yes for SLSQP but not for trust-constr – JohnE Apr 18 '19 at 17:58
  • Looking at the docs for `trust-constr`, it has a handful of other tolerances which all default to `1e-8`. Would be curious to know if setting all of these lower fix the problem without explicitly setting the hessian. – user545424 Apr 18 '19 at 18:17
  • In fact, it looks like it doesn't even have a `ftol` variable. – user545424 Apr 18 '19 at 18:20
  • Right, the particular code I tried was `options = { 'xtol':1e-25, 'gtol':1e-25 }` but it didn't help in this case. – JohnE Apr 18 '19 at 18:23
  • Ah ok. And does it still show the same output as in your question, i.e. max number of function evaluations or does it claim to have succeeded? – user545424 Apr 18 '19 at 18:27
  • Well, I'd encourage you to just test it yourself whenever you have time. But either way, message I got was `The maximum number of function evaluations is exceeded.` I didn't look at the full output (like # iterations and such) but final values and final objective function value are all the same. So perhaps there are some minor differences but I can certainly say there are no significant or substantive differences from changing the `xtol` & `gtol` (but perhaps you or someone else would find otherwise) – JohnE Apr 18 '19 at 18:33
  • Ok thanks. Yeah I will try to install the latest scipy version to give this a try. – user545424 Apr 18 '19 at 18:44
  • Thanks! I did start to look into the `trust-constr` method to figure out what was going on there, but it is a very complicated method. I was able to determine that it was *slowly* moving towards the minimum but for some reason the step size was incredibly small, however I couldn't figure out exactly what was causing that. – user545424 Apr 25 '19 at 18:35

This is a partial answer to the question that I'm putting here to keep the question from getting even bigger, but I'd still love to see a more comprehensive and explanatory answer. These fixes are based on comments from two others, but neither of them fully wrote out the code, and I thought it would make sense to make that explicit, so here it is:

Fixing 2a (trust-constr with jacobian)

It seems that the key here with regard to the jacobian and Hessian is to specify neither or both (but not the jacobian only). @SubhaneilLahiri commented to this effect, and there was also a warning message to this effect that I initially failed to notice:

UserWarning: delta_grad == 0.0. Check if the approximated function is linear. If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations.

So I fixed it by defining the hessian function:

def constr_hess(x,v):
    return np.zeros([n,n])

and adding it to the constraint:

nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac, constr_hess )
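
Incidentally, since the constraint really is linear, another way to sidestep the issue (a sketch, using the LinearConstraint class that is already imported in the question but unused) is to state the constraint as linear, so that trust-constr knows its Hessian is identically zero:

# Sketch: the same equality constraint expressed as a LinearConstraint
# (coefficients all 1, lower bound == upper bound == target), which avoids the
# quasi-Newton approximation of the constraint Hessian altogether.
lin_con = LinearConstraint( np.ones((1, n)), target, target )
resTClin = minimize( obj_func, x0, method='trust-constr',
                     jac='2-point', hess=SR1(), constraints=lin_con )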

Fixing 3a & 3b (SLSQP)

This just seemed to be a matter of making the tolerance smaller, as suggested by @user545424. So I just added `options={'ftol':1e-15}` to the minimization:

resSQjac = minimize( obj_func, x0, method='SLSQP',
                     options={'ftol':1e-15},
                     jac = obj_jac, constraints = eq_cons )
JohnE
  • Regarding your second question, I think it would be better if scipy set `ftol` by default to be the machine precision for doubles. Alternatively, what `nlopt` does when you set no limits is to set it to zero by default, and then typically you end up with an error warning about roundoff, which forces the user to set a reasonable `ftol`. – user545424 Apr 18 '19 at 18:02
  • Hey John and @user545424, your comments and answers just solved a problem I've been dealing with for a few days (banging head against the wall) and I'm so grateful for that. It was ALL about ftol! – NULL Jul 23 '20 at 15:29