9

I use scipy.odr in order to make a fit with uncertainties on both x and y following this question Correct fitting with scipy curve_fit including errors in x?

After the fit I would like to compute the uncertainties on the parameters. Thus I look at the square root of the diagonal elements of the covariance matrix. I get :

>>> print(np.sqrt(np.diag(output.cov_beta)))
[ 0.17516591  0.33020487  0.27856021]

But in the Output there is also output.sd_beta which is, according to the doc on odr

Standard errors of the estimated parameters, of shape (p,).

But, it does not give me the same results :

>>> print(output.sd_beta)
[ 0.19705029  0.37145907  0.31336217]

EDIT

This is an example on a notebook : https://nbviewer.jupyter.org/github/gvallverdu/cookbook/blob/master/fit_odr.ipynb

With least square

stop reason: ['Sum of squares convergence']
        params: [ -1.94792946  11.03369235  -5.43265555]
          info: 1
       sd_beta: [ 0.26176284  0.49877962  0.35510071]
sqrt(diag(cov): [ 0.25066236  0.47762805  0.34004208]

With ODR

stop reason: ['Sum of squares convergence']
        params: [-1.93538595  6.141885   -3.80784384]
          info: 1
       sd_beta: [ 0.6941821   0.88909997  0.17292514]
sqrt(diag(cov): [ 0.01093697  0.01400794  0.00272447]
Community
  • 1
  • 1
Ger
  • 9,076
  • 10
  • 37
  • 48
  • Did ODR converge? What is `output.reason`? – ali_m Dec 07 '16 at 23:38
  • attribute `output.stopreason` is 'Iteration limit reached'. I did explicit odr with `fit_type=0`. The fit looks like good but maybe the number of iteration is not enough ! `output.info` is 4. – Ger Dec 07 '16 at 23:46
  • I don't think you can trust `cov_beta` and `sd_beta` if ODR hasn't converged on a solution. Try setting `maxiter` to something large, or check `output.stopreason` and call `output = odr.reset()` if ODR hasn't converged yet. – ali_m Dec 08 '16 at 00:21
  • Yes, without convergence you can talk nonsense. I run odr until it reaches convergence. I edit the question with the notebook example. With the least square approach `sd_beta` et `cov_beta` are closed each other. But with explicit ODR there are strong differences. – Ger Dec 08 '16 at 09:20
  • Interesting. It seems that sqrt(diag(sd_cov)) is a scalar multiple of sd_beta (in your example the scaling parameter is 0.01575519). – ali_m Dec 08 '16 at 20:28
  • Yes, I look at two other examples on more "real" data and again `sb_beta` and `sqrt(diag(sb_cov))` are proportional. But the coefficient is not always the same, I got 0.6981417 or 0.737385 – Ger Dec 08 '16 at 21:05

1 Answers1

12

The reason for the discrepancy is that sd_beta is scaled by the residual variance, whereas cov_beta isn't.

scipy.odr is an interface for the ODRPACK FORTRAN library, which is thinly wrapped in __odrpack.c. sd_beta and cov_beta are recovered by indexing into the work vector that's used internally by the FORTRAN routine. The indices of their first elements in work are variables named sd and vcv (see here).

From the ODRPACK documentation (p.85):

WORK(SDI) is the first element of a p × 1 array SD containing the standard deviations ̂σβK of the function parameters β, i.e., the square roots of the diagonal entries of the covariance matrix, where

WORK(SDI-1+K) = SD(K) = ̂V 1/2 β (K, K) = ̂σβK

for K = 1,... ,p.

WORK(VCVI) is the first element of a p × p array VCV containing the values of the covariance matrix of the parameters β prior to scaling by the residual variance, where

WORK(VCVI-1+I+(J-1)*(NP)) = VCV(I,J) = ̂σ⁻²V β(I, J)

for I = 1,... ,p and J = 1,... ,p.

In other words, np.sqrt(np.diag(output.cov_beta * output.res_var)) will give you the same result as output.sd_beta.

I've opened a bug report here.

ali_m
  • 71,714
  • 23
  • 223
  • 298
  • Looking at the doc of matlab, the standard error (stdx) seems to be equivalent to the sd_beta of scipy.odr : https://fr.mathworks.com/help/matlab/ref/lscov.html Matlab does multiply by the error. – Ger Dec 09 '16 at 13:43
  • The line `np.sqrt(np.diag(output.cov_beta * output.res_var))` indeed returns the same value as `output.sd_beta`, but what's not clear to me is: which of these two outputs is the proper standard deviation for the fitted parameters? (Asked here: https://stackoverflow.com/questions/44638882/estimate-the-standard-deviation-of-fitted-parameters-in-scipy-odr) – Gabriel Jun 21 '17 at 19:22
  • ali_m shouldn't it be the other way around? I.e.: "*`cov_beta` is scaled by the residual variance, whereas `sd_beta` isn't*". – Gabriel Jun 22 '17 at 17:08
  • 2
    @Gabriel No, `output.cov_beta` *ought* to be scaled by the residual variance, but it isn't by default (hence why you need to multiply it by `output.res_var`). – ali_m Jun 22 '17 at 18:31