
I am trying to fit a Gamma CDF using scipy.stats.gamma, but I do not know what exactly the a parameter is or how the location and scale parameters are calculated. Different sources give different ways to calculate them, which is very frustrating. I am using the code below, which does not give the correct CDF. Thanks in advance.

from scipy.stats import gamma 
loc = (np.mean(jan))**2/np.var(jan)
scale = np.var(jan)/np.mean(jan)
Jancdf  = gamma.cdf(jan,a,loc = loc, scale = scale)
StupidWolf

1 Answer


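a is the shape parameter. The moment-based formulas in your code work only when loc = 0 (and there, mean**2/var estimates the shape a, not loc, while var/mean estimates the scale). To see what loc does, start with two simulated samples: the first has shape (a) = 10 and scale = 5, and the second, d1plus50, has the same shape and scale but loc = 50. The histograms show the shift, which is dictated by loc: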

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

# Two samples with the same shape and scale; the second is shifted by loc=50
d1 = gamma.rvs(a=10, scale=5, size=1000, random_state=99)
plt.hist(d1, bins=50, label='loc=0,shape=10,scale=5', density=True)
d1plus50 = gamma.rvs(a=10, loc=50, scale=5, size=1000, random_state=99)
plt.hist(d1plus50, bins=50, label='loc=50,shape=10,scale=5', density=True)
plt.legend(loc='upper right')

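For the loc = 0 case, the formulas from the question can be used as follows. This is only a sketch on simulated data: the names data, mom_shape and mom_scale are made up for illustration. The key point is that mean**2/var is passed as a (the shape), while loc is left at 0. The last lines check that loc is a pure shift of the distribution.

```python
import numpy as np
from scipy.stats import gamma

# Simulated data with known shape and scale and no shift (loc = 0)
data = gamma.rvs(a=10, scale=5, size=1000, random_state=99)

# Method-of-moments estimates, valid only when loc = 0, because then
# mean = shape * scale and variance = shape * scale**2
mom_shape = np.mean(data) ** 2 / np.var(data)
mom_scale = np.var(data) / np.mean(data)

# The shape estimate goes into `a`; loc is kept at 0
cdf_values = gamma.cdf(data, a=mom_shape, loc=0, scale=mom_scale)

# loc is a pure shift: cdf(x; a, loc, scale) equals cdf(x - loc; a, 0, scale)
x = np.linspace(60, 200, 5)
shifted = gamma.cdf(x, a=10, loc=50, scale=5)
unshifted = gamma.cdf(x - 50, a=10, scale=5)
print(np.allclose(shifted, unshifted))
```

If the data may be shifted, these two formulas are not enough, since there are then three parameters to estimate.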

So you have 3 parameters to estimate from the data. One way is to use gamma.fit; here it is applied to the sample simulated with loc=0:

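# Grid of x values used only for plotting the fitted density
xlin = np.linspace(0, 160, 50)

# gamma.fit returns the estimates in the order (shape, loc, scale)
fit_shape, fit_loc, fit_scale = gamma.fit(d1)
print([fit_shape, fit_loc, fit_scale])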

[11.135335235456457, -1.9431969603988053, 4.693776771991816]

plt.hist(d1, bins=50, label='loc=0,shape=10,scale=5', density=True)
plt.plot(xlin, gamma.pdf(xlin, a=fit_shape, loc=fit_loc, scale=fit_scale))

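Note that the fit above returns a loc of about -1.94 even though the sample was simulated with loc=0. If you know your data has no shift, one option is to hold the location fixed by passing floc=0 to gamma.fit, so that only shape and scale are estimated. A minimal sketch on the same simulated sample (the names fit_shape0, fit_loc0 and fit_scale0 are illustrative):

```python
from scipy.stats import gamma

# Same simulated sample as above: true shape=10, loc=0, scale=5
d1 = gamma.rvs(a=10, scale=5, size=1000, random_state=99)

# floc=0 holds the location fixed at 0, so only shape and scale are estimated
fit_shape0, fit_loc0, fit_scale0 = gamma.fit(d1, floc=0)
print([fit_shape0, fit_loc0, fit_scale0])
```

Whether fixing loc is appropriate depends on the data; for a variable that can have a genuine offset, the three-parameter fit is probably the safer default.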

Doing the same for the sample simulated with loc=50, the fit recovers a loc close to the true value, along with the shape and scale:

fit_shape, fit_loc, fit_scale=gamma.fit(d1plus50)
print([fit_shape, fit_loc, fit_scale])

[11.135287555530564, 48.05688649976989, 4.693789434095116]

plt.hist(d1plus50, bins=50, label='loc=50,shape=10,scale=5', density=True)
plt.plot(xlin, gamma.pdf(xlin, a=fit_shape, loc=fit_loc, scale=fit_scale))

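Since the question asks for the CDF rather than the density, here is a rough sketch of how the fitted parameters might be used for that. The array data is a simulated stand-in for your own data (for example jan), and comparing against the empirical CDF is just one informal way to check the fit, not something taken from the plots above.

```python
import numpy as np
from scipy.stats import gamma

# Stand-in for the data to be fitted (hypothetical; replace with your own array)
data = gamma.rvs(a=10, loc=50, scale=5, size=1000, random_state=99)

# Estimate all three parameters, then evaluate the fitted CDF at the data points
fit_shape, fit_loc, fit_scale = gamma.fit(data)
fitted_cdf = gamma.cdf(data, a=fit_shape, loc=fit_loc, scale=fit_scale)

# Empirical CDF for comparison: fraction of observations <= each data point
order = np.argsort(data)
empirical_cdf = np.empty(len(data))
empirical_cdf[order] = np.arange(1, len(data) + 1) / len(data)

# Largest gap between the fitted and empirical CDF; a small gap suggests a reasonable fit
max_gap = np.max(np.abs(fitted_cdf - empirical_cdf))
print(max_gap)
```

Here the CDF is evaluated at the data itself, whereas xlin in the plots is only an evenly spaced grid used to draw a smooth curve.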

StupidWolf
  • Thanks for the quick response. It has almost resolved my issue. I have one small doubt: I thought that the input array to gamma.pdf(xlin,a=fit_shape,loc = fit_loc, scale = fit_scale) should be d1 (the data to be fitted) instead of xlin, but that is not correct. Can you tell me a little more to clear everything up? – Vishal singh rajpoot Nov 09 '20 at 06:49
  • In your case, if you need the CDF, you would do `gamma.cdf(jan,a=fit_shape,loc = fit_loc, scale = fit_scale)`. In the example above, I have obtained an estimate and am just plotting it to show the fit. – StupidWolf Nov 09 '20 at 09:00
  • Thank you very much. It took some time for me to get it. I regret that. This was a big issue for me. Thanks again. – Vishal singh rajpoot Nov 10 '20 at 03:53
  • you're welcome :) yeah the scale, shape and loc thing for gamma is a bit confusing at times – StupidWolf Nov 10 '20 at 20:38