
I was reviewing Getting standard errors on fitted parameters using the optimize.leastsq method in python because I want to use optimize.least_squares to fit some data, but I have bounds (for physical reasons) on my parameters, e.g. parameter A must be in [0, 1]. With the method of taking the covariance matrix and scaling it by the residual variance, I get error bars that extend outside the bounds on the parameters. What is the statistically correct and rigorous thing to do? Should I artificially cut off my error bars at the permitted bounds, or is there some other way to keep the error bars within the boundaries?

import numpy as np
from scipy import optimize
import random

def f( x, p0, p1, p2):
    return p0*x + 0.4*np.sin(p1*x) + p2

def ff(x, p):
    return f(x, *p)

# These are the true parameters. 

p0 = 1.0  # In the fit we will bound p0 to [0, 1] for, e.g., some physical/modeling reason
p1 = 40
p2 = 2.0

# These are initial guesses for fits:
pstart = [0.8,40,2]

# Generate data with a bit of randomness
# (the noise-less function that underlies the data is shown as a blue line)

xdata = np.array(np.linspace(0., 1, 120))
np.random.seed(42)
err_stdev = 0.2
yvals_err =  np.random.normal(0., err_stdev, len(xdata))
ydata = f(xdata, p0, p1, p2) + yvals_err

def fit_leastsq(p0, datax, datay, function, bounds1):

    errfunc = lambda p, x, y: function(x,p) - y

    all_results = optimize.least_squares(errfunc, p0, args=(datax, datay), bounds=bounds1)
    pfit = all_results.x
    J = all_results.jac  # the Jacobian matrix at the solution
    pcov = np.linalg.inv(2*J.T.dot(J))  # inverse of the approximate Hessian of the chi-square, used to approximate the covariance

    s_sq = (errfunc(pfit, datax, datay)**2).sum()/(len(datay)-len(p0))
    pcov = pcov * s_sq

    error = [] 
    for i in range(len(pfit)):
        try:
            error.append(np.absolute(pcov[i][i])**0.5)
        except:
            error.append( 0.00 )
    pfit_leastsq = pfit
    perr_leastsq = np.array(error) 
    return pfit_leastsq, perr_leastsq 

#Set Bounds on Parameters 
bounds = ([0, 0, 0], [1, 40, 2.5])  # the first parameter is restricted to 0-1, the second to 0-40, and the third to 0-2.5

pfit, perr = fit_leastsq(pstart, xdata, ydata, ff, bounds)

print("\n# Fit parameters and parameter errors from lestsq method :")
print("pfit = ", pfit)
print("perr = ", perr)

The results I get are:

# Fit parameters and parameter errors from the least_squares method:
pfit =  [ 1.         39.97612883  1.98430673]
perr =  [0.07545234 0.07611141 0.01546065]

Note that pfit ± perr extends outside the range of the bounds (the first parameter is fitted at its upper bound of 1, so 1 + perr[0] lies outside). Perhaps there is a way to get a directional error? That is, for example, the error bar on the first parameter would only extend in the negative direction?

1 Answer


There are different ways of interpreting the error of such a fit. You are trying to think about it this way: if I have x_0 with error x_e, there is a 68% chance that the real value lies between x_0 - x_e and x_0 + x_e. The algorithm that finds your x_0 does not know about this, so you could see the error the following way: the error is estimated, in a linearized fashion, from the curvature of your chi-square function at the minimum, so it is really just a measure of the "quality of the fit". Moreover, because it is linearized, for large errors non-linear properties of your function might come into play that are not captured by this ansatz, nor by standard error propagation.
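
To make the curvature picture concrete, here is a minimal sketch that evaluates the chi-square at the best fit and at ±1σ along each parameter, holding the others fixed; it assumes ff, xdata, ydata, pfit and perr from the question are already in scope (chi2, p_lo and p_hi are illustrative helper names).

# Minimal sketch: chi-square at the minimum and at +/- 1 sigma along each parameter
# (assumes ff, xdata, ydata, pfit, perr from the question are in scope)
import numpy as np

def chi2(p):
    r = ff(xdata, p) - ydata
    return np.sum(r**2)

chi2_min = chi2(pfit)
for i in range(len(pfit)):
    p_lo = np.array(pfit, dtype=float); p_lo[i] -= perr[i]
    p_hi = np.array(pfit, dtype=float); p_hi[i] += perr[i]
    print(f"p{i}: chi2(min) = {chi2_min:.3f}, "
          f"chi2(-1 sigma) = {chi2(p_lo):.3f}, chi2(+1 sigma) = {chi2(p_hi):.3f}")

Far from the minimum the chi-square surface need not stay quadratic, which is exactly where the linearized error stops being meaningful.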

From a pure physics point of view, however, if you want error bars to represent a reasonable region of physically meaningful values, you should chop them at the bounds accordingly; one way to do that is sketched below.
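
Here is a minimal sketch of one way to do that chopping: clip the symmetric 1σ errors at the parameter bounds, which yields asymmetric, bound-respecting error bars and also the directional error asked about above. It assumes pfit, perr and bounds from the question are in scope.

# Sketch: clip the symmetric 1-sigma errors at the parameter bounds to get
# asymmetric, bound-respecting error bars (pfit, perr, bounds come from the question)
import numpy as np

lower = np.asarray(bounds[0], dtype=float)
upper = np.asarray(bounds[1], dtype=float)

err_minus = pfit - np.maximum(pfit - perr, lower)  # allowed downward excursion
err_plus = np.minimum(pfit + perr, upper) - pfit   # allowed upward excursion

for i in range(len(pfit)):
    print(f"p{i} = {pfit[i]:.4f}  +{err_plus[i]:.4f} / -{err_minus[i]:.4f}")

For the first parameter, which is fitted right at its upper bound, the upward error becomes 0 and only the downward error survives.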

mikuszefski