1

I'm trying to fit data to probability distribution (gamma function in my case).

With the method of moments I achieved some success:

mean, var = data.mean(), data.var()
α, β = mean ** 2 / var, var / mean

x = np.linspace(0, 100)

plt.plot(x, gamma.pdf(x, α, 0, β))
# Pandas is in use
data.plot(kind='hist', xlim=(0, 100), bins=500, normed=True, color='lightblue')
data.dropna().plot(kind='kde', xlim=(0, 100), style='r--')

Gamma test 01

However this is not satisfying so I decided to use scipy's method fit:

args = gamma.fit(data)

x = np.linspace(0, 100)

plt.plot(x, gamma.pdf(x, *args))
data.plot(kind='hist', xlim=(0, 100), bins=500, normed=True, color='lightblue')
data.dropna().plot(kind='kde', xlim=(0, 100), style='r--')

But the only result I achieved was this distribution:

Gamma test 02

Could someone describe me what I'm doing wrong? I thought that gamma.fit should do at least as good as method of moments.

UPD:

gamma.fit returns this: (0.00077655597754514266, -6.0499462017751218e-25, 3.6337557495676194)

data.describe() shows that data is ok:

count    5546.000000
mean       45.601515
std        28.563211
min         0.000000
25%        35.000000
50%        42.000000
75%        52.000000
max      1488.000000
dtype: float64

Trying to filtering data with data[[data > 0]] force location to 0

args = gamma.fit(list(data[[data > 0]]), floc=0)
args

(7.897834117836718, 0, 5.7749678991293987)

Gamma test 03

gives better results. Seems like it's working now. Thanks!

Zelta
  • 754
  • 5
  • 14

0 Answers0