
I am comparing fitting with optimize.curve_fit and optimize.least_squares. With curve_fit I get the covariance matrix pcov as an output, and from it I can calculate the standard deviation errors of my fitted parameters:

perr = np.sqrt(np.diag(pcov))

If I do the fitting with least_squares, I do not get any covariance matrix output and I am not able to calculate the standard deviation errors for my variables.

Here's my example:

#import modules
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import least_squares

noise = 0.5
N = 100
t = np.linspace(0, 4*np.pi, N)

# generate data
def generate_data(t, freq, amplitude, phase, offset, noise=0, n_outliers=0, random_state=0):
    #formula for data generation with noise and outliers
    y = np.sin(t * freq + phase) * amplitude + offset
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 10
    return y + error

#generate data
data = generate_data(t, 1, 3, 0.001, 0.5, noise, n_outliers=10)

#initial guesses
p0=np.ones(4)
x0=np.ones(4)

# create the function we want to fit
def my_sin(x, freq, amplitude, phase, offset):
    return np.sin(x * freq + phase) * amplitude + offset

# create the function we want to fit for least_squares
def my_sin_lsq(x, t, y):
    # x[0] = freq
    # x[1] = amplitude
    # x[2] = phase
    # x[3] = offset
    return (np.sin(t * x[0] + x[2]) * x[1] + x[3]) - y

# now do the fit for curve_fit
fit = curve_fit(my_sin, t, data, p0=p0)
print('Curve fit output: ' + str(fit[0]))

# now do the fit for least_squares
res_lsq = least_squares(my_sin_lsq, x0, args=(t, data))
print('Least_squares output: ' + str(res_lsq.x))


# we'll use this to plot our first estimate. This might already be good enough for you
data_first_guess = my_sin(t, *p0)

#data_first_guess_lsq = x0[2]*np.sin(t*x0[0]+x0[1])+x0[3]
data_first_guess_lsq = my_sin(t, *x0)

# recreate the fitted curve using the optimized parameters
data_fit = my_sin(t, *fit[0])
data_fit_lsq = my_sin(t, *res_lsq.x)

#calculation of residuals
residuals = data - data_fit
residuals_lsq = data - data_fit_lsq
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data-np.mean(data))**2)
ss_res_lsq = np.sum(residuals_lsq**2)
ss_tot_lsq = np.sum((data-np.mean(data))**2)

#R squared
r_squared = 1 - (ss_res/ss_tot)
r_squared_lsq = 1 - (ss_res_lsq/ss_tot_lsq)
print('R squared curve_fit is: ' + str(r_squared))
print('R squared least_squares is: ' + str(r_squared_lsq))

plt.figure()
plt.plot(t, data)
plt.title('curve_fit')
plt.plot(t, data_first_guess)
plt.plot(t, data_fit)
plt.plot(t, residuals)

plt.figure()
plt.plot(t, data)
plt.title('lsq')
plt.plot(t, data_first_guess_lsq)
plt.plot(t, data_fit_lsq)
plt.plot(t, residuals_lsq)

#error
perr = np.sqrt(np.diag(fit[1]))
print('The standard deviation errors for curve_fit are: ' + str(perr))

I would be very thankful for any help, best wishes

PS: I got a lot of input from this source and used part of its code: Robust regression

– strohfelder

2 Answers


The result returned by optimize.least_squares has an attribute called jac. From the documentation:

jac : ndarray, sparse matrix or LinearOperator, shape (m, n)

Modified Jacobian matrix at the solution, in the sense that J^T J is a Gauss-Newton approximation of the Hessian of the cost function. The type is the same as the one used by the algorithm.

This can be used to estimate the Covariance Matrix of the parameters using the following formula: Sigma = (J'J)^-1.

J = res_lsq.jac
cov = np.linalg.inv(J.T.dot(J))

To find the standard deviation of the parameters one can then use:

perr = np.sqrt(np.diagonal(cov))
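
Applied to the example in the question, a minimal sketch might look as follows (it reuses cov from above and res_lsq from the question; since my_sin_lsq returns unweighted residuals, the covariance is rescaled by the reduced chi-square of the fit before taking the square root, as the comment below and the second answer point out):

chi2dof = np.sum(res_lsq.fun**2) / (res_lsq.fun.size - res_lsq.x.size)  # reduced chi-square
cov_scaled = cov * chi2dof               # rescale the unweighted covariance
perr_lsq = np.sqrt(np.diag(cov_scaled))  # 1-sigma errors of freq, amplitude, phase, offset
print('The standard deviation errors for least_squares are: ' + str(perr_lsq))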
– Alex Shmakov
  • I would suggest using `res_lsq.fun**2` instead of `residuals_lsq**2` to make this answer more generic. – Asking Questions Aug 05 '18 at 11:59
  • This answer contains two errors as far as I can see: 1. the covariance matrix needs to be multiplied by the RMS of the residuals. 2. the final result shown is the standard deviation, not the variance. See: https://stackoverflow.com/a/21844726/1391441 and https://stackoverflow.com/a/14857441/1391441 – Gabriel Mar 13 '20 at 13:48

The SciPy function optimize.least_squares requires the user to provide as input a function fun(...) which returns a vector of residuals. This is typically defined as

residuals = (data - model)/sigma

where data and model are vectors with the data to fit and the corresponding model predictions for each data point, while sigma is the 1σ uncertainty in each data value.
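
For the sine model from the question, such a weighted residual function might look like the sketch below (the function name and the sigma array are illustrative assumptions; sigma stands in for the per-point 1σ uncertainties):

def my_sin_lsq_weighted(x, t, y, sigma):
    # x = [freq, amplitude, phase, offset]
    model = np.sin(t * x[0] + x[2]) * x[1] + x[3]
    return (y - model) / sigma           # residuals = (data - model)/sigma

# usage (sigma must have the same shape as data):
# res = optimize.least_squares(my_sin_lsq_weighted, x0, args=(t, data, sigma))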

In this situation, and assuming one can trust the input sigma uncertainties, one can use the output Jacobian matrix jac returned by least_squares to estimate the covariance matrix. Moreover, assuming the covariance matrix is diagonal, or simply ignoring non-diagonal terms, one can also obtain the 1σ uncertainty perr in the model parameters (often called "formal errors") as follows (see Section 15.4.2 of Numerical Recipes 3rd ed.)

import numpy as np
from scipy import linalg, optimize

res = optimize.least_squares(...)

U, s, Vh = linalg.svd(res.jac, full_matrices=False)
tol = np.finfo(float).eps*s[0]*max(res.jac.shape)
w = s > tol
cov = (Vh[w].T/s[w]**2) @ Vh[w]  # robust covariance matrix
perr = np.sqrt(np.diag(cov))     # 1sigma uncertainty on fitted parameters

The above code to obtain the covariance matrix is formally the same as the following simpler one (as suggested by Alex), but it has the major advantage that it works even when the Jacobian is close to degenerate, which is a common occurrence in real-world least-squares fits:

cov = linalg.inv(res.jac.T @ res.jac)  # covariance matrix when jac not degenerate
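
As a quick sanity check of this equivalence, here is a toy example with a hypothetical well-conditioned Jacobian (keep in mind that linalg.svd returns Vh, i.e. the transpose of the matrix V that appears in eqn. 15.4.20 of Numerical Recipes):

import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
jac = rng.normal(size=(100, 4))        # hypothetical non-degenerate Jacobian (m x n)

U, s, Vh = linalg.svd(jac, full_matrices=False)
cov_svd = (Vh.T / s**2) @ Vh           # SVD-based covariance, as above
cov_inv = linalg.inv(jac.T @ jac)      # direct inverse of the normal matrix

print(np.allclose(cov_svd, cov_inv))   # True when the Jacobian is well conditioned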

If one does not trust the input uncertainties sigma, one can still assume that the fit is good and estimate the data uncertainties from the fit itself. This corresponds to assuming chi**2/DOF=1, where DOF is the number of degrees of freedom. In this case, one can use the following lines to rescale the covariance matrix before computing the uncertainties:

chi2dof = np.sum(res.fun**2)/(res.fun.size - res.x.size)
cov *= chi2dof
perr = np.sqrt(np.diag(cov))    # 1sigma uncertainty on fitted parameters
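
As a cross-check against the question's example (a sketch that reuses my_sin, t, data and p0 from the question's code): when no sigma is given, curve_fit by default (absolute_sigma=False) rescales pcov by the reduced chi-square, so its errors should roughly agree with the rescaled least_squares estimate above.

popt, pcov = optimize.curve_fit(my_sin, t, data, p0=p0)
print('curve_fit perr:     ', np.sqrt(np.diag(pcov)))
print('least_squares perr: ', perr)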
– divenex
  • I'm confused attempting to reconcile this answer with the equation given in Numerical Recipes. If I compute `cov = linalg.inv(res.jac.T @ res.jac)` (eqn. 1) or `cov = (Vh[w].T/s[w]**2) @ Vh[w]` (eqn. 2) I obtain equivalent answers. However, if I directly compute the referenced equation (cf. eqn. 15.4.20), `cov(aj, ak) = Sum_i [(Vji Vki) / wi**2]` (eqn. 3), I do not obtain an equivalent result. I thought Python, like C++, was row major? Yet from inspection of the object shapes in eqn. 1 this must be the correct formulation. – PetMetz Jun 08 '23 at 19:03