
I'm using scipy.optimize.minimize with method='bfgs' to minimize a convex objective.

Every time I run a minimization, the first two calls the BFGS optimizer makes to my objective function have the same parameter vector. This seems unnecessary, as it wastes a good few minutes recalculating the same thing twice.

Minimal working example (with a much simpler objective):

from scipy.optimize import minimize

def obj_jac(x):
    """Return objective value and jacobian value wrt. x"""
    print(x)
    return 10*x**2, 20*x

minimize(obj_jac, -100, method='bfgs', jac=True, tol=1e-7)

Output:

[-100.]
[-100.]
[-98.99]
[-94.95]
[-78.79]
[-30.17904355]
[-3.55271368e-15]

Does anyone know if this is expected behaviour for the BFGS implementation in scipy?


Update: I have submitted this as issue #10385 on the Scipy project.
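In the meantime, a possible workaround (just a sketch, assuming the objective is deterministic; the cached wrapper below is my own made-up helper, not part of SciPy) is to wrap the objective in a small cache that remembers the most recent evaluation, so the duplicated call at the starting point returns the stored result instead of recomputing:

from scipy.optimize import minimize
import numpy as np

def cached(fun):
    """Remember the most recent (x, result) pair so that consecutive
    calls with identical parameters skip the expensive recomputation.
    Sketch only; assumes fun is deterministic."""
    last = {"x": None, "result": None}
    def wrapper(x):
        x = np.asarray(x, dtype=float)
        if last["x"] is None or not np.array_equal(x, last["x"]):
            last["x"] = x.copy()
            last["result"] = fun(x)
        return last["result"]
    return wrapper

# Using obj_jac from the example above:
minimize(cached(obj_jac), -100, method='bfgs', jac=True, tol=1e-7)

With this wrapper the objective body only runs once for the duplicated starting point; the optimizer's behaviour is otherwise unchanged.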

aaronsnoswell

1 Answer


This is not expected behaviour or, at least, there is a reporting bug.

Enabling the statistics output for the optimization via the options parameter:

minimize(obj_jac, -100, method='bfgs', jac=True, tol=1e-7, options={'disp': True})

SciPy outputs the following:

[-100.]
[-100.]
[-98.99]
[-94.95]
[-78.79]
[-30.17904355]
[-3.55271368e-15]
Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 3
         Function evaluations: 6
         Gradient evaluations: 6

where the reported numbers of function and gradient evaluations are certainly off by one: seven calls are printed above, yet only six are reported. So there is certainly a stat-reporting bug in SciPy for BFGS.

I would also suspect that there is an inefficiency inside SciPy, along the following lines: before the iteration loop, the function and its gradient are evaluated; then the loop starts, and its first step is to evaluate the function and its gradient again. That adds an extra function evaluation for the 0th iteration, and it could certainly be avoided by a slight code reorganization (probably with some trade-off in the readability of the algorithm flow).
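To illustrate that hypothesis (a schematic only, not SciPy's actual source; solver_loop and fun_and_jac are made-up names), the duplicate call at x0 disappears if the initial evaluation is reused inside the loop rather than repeated at the top of the first iteration:

import numpy as np

def solver_loop(fun_and_jac, x0, maxiter=50, tol=1e-7):
    # Schematic of the suspected pattern: the wasteful variant would
    # evaluate fun_and_jac(x0) here *and* again at the top of the first
    # iteration. Reusing the initial evaluation avoids the duplicate.
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    fval, grad = fun_and_jac(x)          # single evaluation at x0
    for _ in range(maxiter):
        if np.linalg.norm(grad) < tol:
            break
        x = x - 0.01 * grad              # placeholder step; BFGS would use a line search
        fval, grad = fun_and_jac(x)      # one evaluation per new point
    return x, fval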

The following is relevant:

Not being an expert in SciPy, I would say either an old bug has popped up out of nowhere (in which case it should be reported) or it was never fixed in the first place, despite what I gathered from the GitHub discussions.

Anton Menshov
    Please report the bug and add a link here. – Simd Jul 01 '19 at 21:14
  • Great answer - thank you! I wasn't able to find any previous SO posts about this, so I appreciate you linking to the 2014 one here. I'll submit a bug report to the scipy team and post a link here when I do. – aaronsnoswell Jul 01 '19 at 22:51
  • @aaronsnoswell actually, I found the SO question through SciPy GitHub pages. The next day after I posted the answer. Glad to help! – Anton Menshov Jul 01 '19 at 23:51
  • This error here seems only loosely related. The 2014 error was that the numerical gradient call also computed the function value at the "center" point again, while outside the gradient routine this same value was also computed just before the gradient call. This is not observed here, as only the very first computation is duplicated. So it is probably a similar error, but not the same. – Lutz Lehmann Jul 02 '19 at 12:56