
I am trying to understand why scipy.optimize.least_squares exists in scipy. This function can be used to perform model fitting. However, one could use scipy.optimize.minimize to do the same thing. The only difference is that scipy.optimize.least_squares computes the chi-squared internally, while with scipy.optimize.minimize the user has to compute the chi-squared manually inside the function they want to minimize. Also, scipy.optimize.least_squares cannot be considered a wrapper around scipy.optimize.minimize, because the three methods it supports (trf, dogbox, lm) are not supported at all by scipy.optimize.minimize.
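To make the difference concrete, here is a minimal sketch of the two approaches on a made-up exponential model (the model, data, and starting values are my own illustration, not from any particular use case):

```python
import numpy as np
from scipy.optimize import least_squares, minimize

# Toy data for the model y = a * exp(b * x)  (illustrative values)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.5 * np.exp(1.3 * x) + 0.05 * rng.standard_normal(x.size)

def residuals(params):
    a, b = params
    return a * np.exp(b * x) - y        # vector of residuals

# least_squares takes the residual *vector*; squaring and summing
# happen internally.
fit1 = least_squares(residuals, x0=[1.0, 1.0])

# minimize needs the scalar chi-squared built by hand.
def chi2(params):
    return np.sum(residuals(params) ** 2)

fit2 = minimize(chi2, x0=[1.0, 1.0])

print(fit1.x, fit2.x)                   # both should end up near (2.5, 1.3)
```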

So my questions are:

  • Why does scipy.optimize.least_squares exist when the same result can be achieved with scipy.optimize.minimize?
  • Why does scipy.optimize.minimize not support the trf, dogbox, and lm methods?

Thank you.

AstrOne
  • Historically `scipy` is a collection of tools that were too specialized or 'advanced' to fit in `numpy`. Once a tool is put in a package like that, it is hard to remove (in case someone is still using it). 'Why' questions like this can only be answered by digging into the code history, GitHub issues, and developer discussions (and fragments of history from pre-GitHub days). – hpaulj Mar 10 '18 at 17:20
  • However, looking at the `least_squares` documentation I see it is new in 0.17, not that long ago. So there's probably a good amount of discussion about it. For example: https://github.com/scipy/scipy/issues/5020 and https://github.com/scipy/scipy/pull/5019 – hpaulj Mar 10 '18 at 17:33
  • You can use `minimize` too for solving instances of linear programming, but that's as dumb as it gets (compared to using an LP solver, which is more specialized). It's just a more specialized function for a very common problem. Why would you use `minimize` here (which exact problem, which solver, and which kind of jac/hess calculations)? (For some cases l-bfgs-b could become interesting.) – sascha Mar 10 '18 at 22:38
  • Convenience. Least squares is so common that ① it is convenient to have a method named `least_squares` and ② it is convenient to have a method that, _out of the box_, is specialized for a common task (I mean, no keyword arguments, no thinking about... aim, fire, BOOM) – gboffi Sep 03 '19 at 11:51

1 Answer


The algorithms in scipy.optimize.least_squares exploit the least-squares structure of the minimization problem to achieve better convergence (or to get by with lower-order derivatives).

It's similar to the difference between the Gauss-Newton algorithm and Newton's method; see Wikipedia or this question.

In particular, Gauss-Newton only uses the Jacobian (first derivatives), whereas Newton's method also uses the Hessian (second derivatives), which is expensive to calculate.
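Concretely, for f(x) = ½‖r(x)‖² with residual vector r and Jacobian J, the gradient is Jᵀ r and the exact Hessian is Jᵀ J + Σᵢ rᵢ ∇²rᵢ; Gauss-Newton keeps only the Jᵀ J term, so first derivatives of the residuals suffice. Here is an illustrative hand-rolled Gauss-Newton step on a toy exponential fit (a sketch only; scipy's trf, dogbox, and lm add damping, bounds handling, and trust regions on top of this idea):

```python
import numpy as np

# Noise-free data for the toy model y = a * exp(b * x)
x = np.linspace(0, 1, 50)
y = 2.5 * np.exp(1.3 * x)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y

def jacobian(p):
    a, b = p
    # Analytic first derivatives of the residuals: d r/d a, d r/d b
    return np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])

def gauss_newton_step(p):
    r, J = residuals(p), jacobian(p)
    # Solve (J^T J) step = -J^T r: the Hessian of 0.5*||r||^2 is
    # approximated by J^T J, so no second derivatives appear anywhere.
    return p + np.linalg.solve(J.T @ J, -J.T @ r)

p = np.array([1.0, 1.0])
for _ in range(10):
    p = gauss_newton_step(p)
print(p)    # should converge to roughly (2.5, 1.3)
```

This is why least_squares only asks for the residual function (and optionally its Jacobian), whereas a general Newton-type method handed the scalar chi-squared would need second derivatives to get comparable convergence.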

Wolf