I'm trying to fit data to probability distribution (gamma function in my case).
With the method of moments I achieved some success:
mean, var = data.mean(), data.var()
α, β = mean ** 2 / var, var / mean
x = np.linspace(0, 100)
plt.plot(x, gamma.pdf(x, α, 0, β))
# Pandas is in use
data.plot(kind='hist', xlim=(0, 100), bins=500, normed=True, color='lightblue')
data.dropna().plot(kind='kde', xlim=(0, 100), style='r--')
However this is not satisfying so I decided to use scipy's method fit
:
args = gamma.fit(data)
x = np.linspace(0, 100)
plt.plot(x, gamma.pdf(x, *args))
data.plot(kind='hist', xlim=(0, 100), bins=500, normed=True, color='lightblue')
data.dropna().plot(kind='kde', xlim=(0, 100), style='r--')
But the only result I achieved was this distribution:
Could someone describe me what I'm doing wrong?
I thought that gamma.fit
should do at least as good as method of moments.
UPD:
gamma.fit
returns this: (0.00077655597754514266, -6.0499462017751218e-25, 3.6337557495676194)
data.describe()
shows that data is ok:
count 5546.000000
mean 45.601515
std 28.563211
min 0.000000
25% 35.000000
50% 42.000000
75% 52.000000
max 1488.000000
dtype: float64
Trying to filtering data with data[[data > 0]]
force location to 0
args = gamma.fit(list(data[[data > 0]]), floc=0)
args
(7.897834117836718, 0, 5.7749678991293987)
gives better results. Seems like it's working now. Thanks!