1

I have:

from scipy import stats
data = stats.gamma.rvs(2, loc=1.5, scale=2, size=100000)

so I do a fit on that

fitted_params = scipy.stats.gamma.fit(data)

how do I calculate the AIC from that? AIC = 2*k - 2*ln(L) where k is the number of parameters fitted and L is the maximum log likelihood function

k = len(fitted_params)
aic = 2*k - 2*(logLik)

logLik would be ?

I found this snippet:

logLik = -np.sum( stats.norm.logpdf(data, loc=yPred, scale=sd) ) 

from Maximum Likelihood Estimate

so is my function going to be:

# calc SD of fitted distribution
sd = std(loc=fitted_params[1], scale=fitted_params[2])

# sample values from fitted dist same length as original data array
yPred = rvs(fitted_params[0], loc=fitted_params[1], scale=fitted_params[2], size=len(data), random_state=None)

# calc the log likelihood 
logLik = -np.sum( stats.gamma.logpdf(data, loc=yPred, scale=sd) ) 
Community
  • 1
  • 1
KillerSnail
  • 3,321
  • 11
  • 46
  • 64

1 Answers1

7

The likelihood is really the probability of observing the data given the parameters. Hence, if you have some parameter values, i.e. your fitted values, then the likelihood is probability of the data, where the density is parameterized with the fitted values.

Hence, what you're doing is almost correct. Since you sampled from a gamma distribution, you should also calculate the likelihood using that distribution. I.e. instead of

logLik = -np.sum( stats.norm.logpdf(data, loc=yPred, scale=sd) ) 

do

logLik = np.sum( stats.gamma.logpdf(data, fitted_params[0], loc=fitted_params[1], scale=fitted_params[2]) ) 

Then you just use your AIC equation to get that.

bvilhjal
  • 136
  • 4
  • 1
    Guess the parameters passed to logpdf function can be simplified as stats.gamma.logpdf(data, *fitted_params) – Kevin Zhu Aug 13 '19 at 01:59
  • as @KevinZhu said, see [this](https://stackoverflow.com/questions/50588602/calculating-loglikelihood-of-distributions-in-python) answer for updated log likelihood – Perl Del Rey Jun 16 '20 at 16:37