Math
We start with linear regression. In many statistical problems, we assume the variables have some underlying distribution with unknown parameters, and we estimate those parameters. In linear regression, we assume the dependent variables $y_i$ have a linear relationship with the independent variables $x_{ij}$:
$$y_i = x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + \sigma\varepsilon_i, \quad i = 1, \ldots, n,$$
where the $\varepsilon_i$ are independent standard normal random variables, the $\beta_j$ are $p$ unknown parameters, and $\sigma$ is also unknown. We can write this in matrix form:
$$Y = X\beta + \sigma\varepsilon,$$
where $Y$, $\beta$, and $\varepsilon$ are column vectors. To find the best $\beta$, we minimize the sum of squares
$$S = (Y - X\beta)^T (Y - X\beta).$$
I will just write out the solution, which is
$$\hat\beta = (X^T X)^{-1} X^T Y.$$
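In case the derivation is wanted, here is a sketch: setting the gradient of $S$ to zero gives the normal equations.

```latex
% Sketch: differentiate S = (Y - X beta)^T (Y - X beta) with respect to beta.
\frac{\partial S}{\partial \beta} = -2 X^T (Y - X\beta) = 0
\quad\Longrightarrow\quad
X^T X \hat\beta = X^T Y
\quad\Longrightarrow\quad
\hat\beta = (X^T X)^{-1} X^T Y.
```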
If we view $Y$ as the specific observed data, $\hat\beta$ is the estimate of $\beta$ under that observation. On the other hand, if we view $Y$ as a random vector, the estimator $\hat\beta$ becomes a random variable too. In this way, we can ask what the covariance of $\hat\beta$ is.
Because $Y$ has a multivariate normal distribution and $\hat\beta$ is a linear transformation of $Y$, $\hat\beta$ also has a multivariate normal distribution. The covariance matrix of $\hat\beta$ is
$$\operatorname{Cov}(\hat\beta) = (X^T X)^{-1} X^T \operatorname{Cov}(Y) \left( (X^T X)^{-1} X^T \right)^T = (X^T X)^{-1} \sigma^2.$$
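As a quick sanity check, here is a small NumPy simulation (the design matrix, seed, and all names are my own illustration) that draws many realizations of $Y$ and compares the empirical covariance of $\hat\beta$ with $(X^T X)^{-1} \sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 2, 0.5
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])  # an arbitrary design matrix
beta = np.array([1.0, 2.0])

# Draw many data sets Y = X beta + sigma * eps and fit each one.
n_trials = 20000
betas = np.empty((n_trials, p))
for k in range(n_trials):
    Y = X @ beta + sigma * rng.standard_normal(n)
    betas[k] = np.linalg.lstsq(X, Y, rcond=None)[0]

print(np.cov(betas.T))                    # empirical covariance of beta-hat
print(np.linalg.inv(X.T @ X) * sigma**2)  # theoretical (X^T X)^{-1} sigma^2
```

The two printed matrices should agree up to Monte Carlo noise.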
But here $\sigma$ is unknown, so we also need to estimate it. If we let
$$Q = (Y - X\hat\beta)^T (Y - X\hat\beta),$$
it can be proved that $Q/\sigma^2$ has the chi-square distribution with $n - p$ degrees of freedom (moreover, $Q$ is independent of $\hat\beta$). This makes
$$\hat\sigma^2 = \frac{Q}{n - p}$$
an unbiased estimator of $\sigma^2$. So the final estimator of $\operatorname{Cov}(\hat\beta)$ is
$$(X^T X)^{-1} \frac{Q}{n - p}.$$
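Putting the pieces together, a minimal NumPy sketch of the whole computation on one simulated data set (all names and values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
Y = X @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y          # (X^T X)^{-1} X^T Y
resid = Y - X @ beta_hat
Q = resid @ resid                     # residual sum of squares
sigma2_hat = Q / (n - p)              # unbiased estimator of sigma^2
cov_beta_hat = XtX_inv * sigma2_hat   # estimated covariance of beta-hat
print(beta_hat, cov_beta_hat)
```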
SciPy API
curve_fit is the most convenient. Its second return value pcov is exactly the estimate of the covariance of $\hat\beta$, that is, the final result $(X^T X)^{-1} Q/(n - p)$ above.
In leastsq, the second return value cov_x is $(X^T X)^{-1}$. From the expression for $S$, we see that $X^T X$ is the Hessian of $S$ (half of the Hessian, to be precise), which is why the documentation says cov_x is the inverse of the Hessian. To get the covariance of $\hat\beta$, you need to multiply cov_x by $Q/(n - p)$.
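To illustrate the relationship, here is a sketch fitting the same (made-up) line with both functions; if the above is right, pcov from curve_fit should match cov_x from leastsq scaled by $Q/(n - p)$:

```python
import numpy as np
from scipy.optimize import curve_fit, leastsq

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(x.size)

def model(x, a, b):
    return a + b * x

popt, pcov = curve_fit(model, x, y)

def residuals(params):
    return y - model(x, *params)

lsq_params, cov_x, infodict, mesg, ier = leastsq(
    residuals, x0=[0.0, 0.0], full_output=True)
Q = np.sum(residuals(lsq_params) ** 2)

print(pcov)                                    # covariance from curve_fit
print(cov_x * Q / (x.size - len(lsq_params)))  # cov_x scaled by Q / (n - p)
```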
Non-Linear Regression
In non-linear regression, the $y_i$ depend on the parameters non-linearly:
$$y_i = f(x_i, \beta_1, \ldots, \beta_p) + \sigma\varepsilon_i.$$
We can take the partial derivatives of $f$ with respect to the $\beta_j$, so that around the current parameter values the model becomes approximately linear, with the Jacobian playing the role of $X$. The calculation is then basically the same as in linear regression, except that the minimum has to be approached iteratively. In practice, the algorithm can be a more sophisticated one, such as the Levenberg–Marquardt algorithm, which is the default of curve_fit.
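For example, a minimal exponential-decay fit (the model, data, and starting values are my own illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

def model(x, a, b):
    return a * np.exp(-b * x)  # non-linear in the parameter b

x = np.linspace(0, 4, 60)
y = model(x, 2.5, 1.3) + 0.1 * rng.standard_normal(x.size)

# For unbounded problems curve_fit uses Levenberg-Marquardt by default,
# linearizing the model around the current estimate at each step.
popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(popt)  # estimated parameters
print(pcov)  # estimated covariance of the parameters
```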
More About Providing Sigma
This section is about the sigma and absolute_sigma parameters of curve_fit. For basic usage of curve_fit, when you have no prior knowledge about the covariance of $Y$, you can ignore this section.
Absolute Sigma
In the linear regression above, the variance of $y_i$ is $\sigma^2$, which is unknown. If you do know the variance, you can provide it to curve_fit through the sigma parameter and set absolute_sigma=True.
Suppose the matrix you provide through sigma is $\Sigma$, i.e.
$$Y \sim N(X\beta, \Sigma),$$
so $Y$ has a multivariate normal distribution with mean $X\beta$ and covariance $\Sigma$. We want to maximize the likelihood of $Y$. From the probability density function of $Y$, that is equivalent to minimizing
$$S = (Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$$
The solution is
$$\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Y.$$
And
$$\operatorname{Cov}(\hat\beta) = (X^T \Sigma^{-1} X)^{-1}.$$
The $\hat\beta$ and $\operatorname{Cov}(\hat\beta)$ above are the return values of curve_fit with absolute_sigma=True.
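A sketch of this case with a known noise level (the data and the constant standard deviation 0.3 are made up); if the above is right, pcov should agree with the closed form $(X^T \Sigma^{-1} X)^{-1}$:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 50)
std = np.full(x.size, 0.3)  # known standard deviations of the y's
y = 1.0 + 2.0 * x + std * rng.standard_normal(x.size)

def model(x, a, b):
    return a + b * x

popt, pcov = curve_fit(model, x, y, sigma=std, absolute_sigma=True)

# Compare with the closed form (X^T Sigma^{-1} X)^{-1}.
X = np.column_stack([np.ones_like(x), x])
Sigma_inv = np.diag(1.0 / std**2)
print(pcov)
print(np.linalg.inv(X.T @ Sigma_inv @ X))
```

(Here sigma is passed as a 1-D array of standard deviations, which curve_fit also accepts; a 2-D argument is treated as the covariance matrix $\Sigma$ itself.)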
Relative Sigma
In some cases you don't know the exact variance of the $y_i$, but you do know the relative relationship between different $y_i$; for example, the variance of $y_2$ is 4 times the variance of $y_1$. Then you can pass sigma and set absolute_sigma=False.
This time
$$Y \sim N(X\beta, \Sigma\sigma^2)$$
with a known matrix $\Sigma$ provided and an unknown number $\sigma$. The objective function to minimize is the same as in the absolute sigma case, since $\sigma$ is a constant, and thus the estimator $\hat\beta$ is the same. But the covariance
$$\operatorname{Cov}(\hat\beta) = (X^T \Sigma^{-1} X)^{-1} \sigma^2,$$
has the unknown $\sigma$ in it. To estimate $\sigma$, let
$$Q = (Y - X\hat\beta)^T \Sigma^{-1} (Y - X\hat\beta).$$
Again, $Q/\sigma^2$ has the chi-square distribution with $n - p$ degrees of freedom. The estimate of $\operatorname{Cov}(\hat\beta)$ is
$$(X^T \Sigma^{-1} X)^{-1} \frac{Q}{n - p}.$$
And this is the second return value of curve_fit with absolute_sigma=False.
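And the corresponding sketch for the relative case (made-up data where half the points have twice the standard deviation of the others); if the above is right, pcov should equal $(X^T \Sigma^{-1} X)^{-1}$ scaled by $Q/(n - p)$:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 50)
rel_std = np.where(np.arange(x.size) % 2, 2.0, 1.0)  # relative std devs, 1 or 2
y = 1.0 + 2.0 * x + 0.3 * rel_std * rng.standard_normal(x.size)

def model(x, a, b):
    return a + b * x

popt, pcov = curve_fit(model, x, y, sigma=rel_std, absolute_sigma=False)

# Reproduce pcov by hand: (X^T Sigma^{-1} X)^{-1} * Q / (n - p).
X = np.column_stack([np.ones_like(x), x])
Sigma_inv = np.diag(1.0 / rel_std**2)
resid = y - model(x, *popt)
Q = resid @ Sigma_inv @ resid
print(pcov)
print(np.linalg.inv(X.T @ Sigma_inv @ X) * Q / (x.size - 2))
```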