
I have some data which I am fitting to a gamma distribution using scipy.stats. I can extract the shape, loc, and scale parameters, and they look reasonable given the data ranges I expect.

My question is: is there a way to also get the errors in the parameters, something like the output of curve_fit? NOTE: I don't use curve_fit directly because it does not work properly here and most of the time is unable to compute the parameters of the gamma distribution. scipy.stats.gamma.fit, on the other hand, works fine.

This is an example (not my actual data) of what I am doing:

from scipy.stats import gamma
shape = 12; loc = 0.71; scale = 0.0166
data = gamma.rvs(shape, loc=loc, scale=scale, size=1000)
params = gamma.fit(data) # params close to but not the same as (shape, loc, scale) 
# HOW TO ESTIMATE/GET ERRORS FOR EACH PARAM?

Thanks in advance

– iluvatar

1 Answer


Warning: the following illustrates the use of GenericLikelihoodModel on the example in the question. However, in the case of the gamma distribution the location parameter shifts the support of the distribution, which is ruled out by the general assumptions for maximum likelihood estimation. The more standard usage would fix the support with floc=0, so that only a two-parameter distribution remains; in that case, standard MLE theory applies.
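
For that fixed-support case, scipy's fit accepts keyword arguments that pin individual parameters. A minimal sketch (the data0 sample here is hypothetical, generated only for illustration):

from scipy.stats import gamma

# hypothetical positive-valued sample whose support starts at zero
data0 = gamma.rvs(12, loc=0, scale=0.0166, size=1000)

# floc=0 pins the location, so only shape and scale are estimated;
# the returned loc is exactly 0
shape_hat, loc_hat, scale_hat = gamma.fit(data0, floc=0)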

Statsmodels has a generic class for maximum likelihood estimation, GenericLikelihoodModel. It is not designed for this case directly, but it can be used with a little help: defining the attributes it expects and providing start_params.

from scipy.stats import gamma

from statsmodels.base.model import GenericLikelihoodModel

shape = 12; loc = 0.71; scale = 0.0166
data = gamma.rvs(shape, loc=loc, scale=scale, size=1000)
params = gamma.fit(data)  # close to, but not the same as, (shape, loc, scale)
# HOW TO ESTIMATE/GET ERRORS FOR EACH PARAM?

print(params)
print('\n')


class Gamma(GenericLikelihoodModel):

    nparams = 3  # shape, loc, scale

    def loglike(self, params):
        # total log-likelihood of the sample under the gamma density
        return gamma.logpdf(self.endog, *params).sum()


res = Gamma(data).fit(start_params=params)
# set the degrees of freedom by hand so that summary() reports them correctly
res.df_model = len(params)
res.df_resid = len(data) - len(params)
print(res.summary())

This prints the following:

(10.31888758604304, 0.71645502437403186, 0.018447479022445423)


Optimization terminated successfully.
         Current function value: -1.439996
         Iterations: 69
         Function evaluations: 119
                                Gamma Results                                 
==============================================================================
Dep. Variable:                      y   Log-Likelihood:                 1440.0
Model:                          Gamma   AIC:                            -2872.
Method:            Maximum Likelihood   BIC:                            -2852.
Date:                Sun, 12 Jul 2015                                         
Time:                        04:00:05                                         
No. Observations:                1000                                         
Df Residuals:                     997                                         
Df Model:                           3                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
par0          10.3187      2.242      4.603      0.000         5.925    14.712
par1           0.7165      0.019     37.957      0.000         0.679     0.753
par2           0.0184      0.002      8.183      0.000         0.014     0.023
==============================================================================
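
To get these numbers programmatically, rather than reading them off the summary table, the fitted results instance exposes them as attributes. A minimal sketch, assuming the res object from the code above:

print(res.params)  # point estimates, in the order (shape, loc, scale)
print(res.bse)     # the standard errors reported in the table above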

Other results based on the maximum likelihood estimates are then also available. For example, a z-test that the first parameter equals 10 can be performed either by specifying a restriction matrix or by using a string expression with the equality:

>>> res.t_test(([1, 0, 0], [10]))
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
c0            10.3187      2.242      0.142      0.887         5.925    14.712
==============================================================================

>>> res.t_test('par0=10')
<class 'statsmodels.stats.contrast.ContrastResults'>
                             Test for Constraints                             
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
c0            10.3187      2.242      0.142      0.887         5.925    14.712
==============================================================================
– Josef
  • Thanks a lot. I was using the bootstrap method described in http://scipy-central.org/item/36/2/error-estimates-for-fit-parameters-resulting-from-least-squares-fits-using-bootstrap-resampling, but the answer you gave seems more complete and allows me to apply more tests. Still, I would like to know why, with this procedure, I obtain (sometimes very) different results for the parameters after running the same script several times. Is this normal? For example, c0 changes between 8 and 20 (changing, of course, the std err). Thanks again – iluvatar Jul 12 '15 at 18:35
  • Note, I used the scipy fit results as starting parameters for the GenericLikelihoodModel. In general it might not have good default starting values, and in some cases the curvature of the objective function is "not nice". I don't know how well behaved the three-parameter gamma distribution is. Maybe "basinhopping" would successfully find a global minimum in the optimization. – Josef Jul 12 '15 at 19:03
  • Another general comment: my guess is that we would commonly fix `loc` at zero to model positive-valued data (see the floc=0 sketch at the top of the answer). Maximum likelihood often has problems estimating the support of a distribution. – Josef Jul 13 '15 at 09:06
  • Thanks for your comments. In my particular case loc is different from zero (actually it is a very important parameter in my theoretical framework), so I cannot set it to zero. Still, I find your answer appropriate and I am accepting it as such. – iluvatar Jul 14 '15 at 02:36
  • What if I need the parameter covariance matrix rather than just the standard errors? – Marco Mene Apr 12 '16 at 12:28
  • @MarcoMene The results instances returned by `fit` have a `cov_params` method that returns the covariance matrix of the parameter estimates, based on the Hessian in MLE (see the sketch after these comments). – Josef Apr 12 '16 at 14:27
  • Hi, could you please give me some help with my question? Thanks. http://stackoverflow.com/questions/44022955/why-are-the-parameters-of-weibull-not-unique-for-a-given-data – yanachen May 17 '17 at 11:04
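
Following up on the comments: a minimal sketch of retrieving the covariance matrix via cov_params and of requesting the basinhopping solver suggested above, assuming the Gamma class, data, and params from the answer (and a statsmodels version that supports the 'basinhopping' method):

# covariance matrix of the (shape, loc, scale) estimates,
# based on the Hessian of the log-likelihood at the optimum
cov = res.cov_params()
print(cov)

# a global optimizer can help when the default solver is sensitive
# to starting values
res_bh = Gamma(data).fit(start_params=params, method='basinhopping')
print(res_bh.params)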