import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

observed = [0.294, 0.2955, 0.235, 0.2536, 0.2423, 0.2844, 0.2099, 0.2355, 0.2946, 0.3388, 0.2202, 0.2523, 0.2209, 0.2707, 0.1885, 0.2414, 0.2846, 0.328, 0.2265, 0.2563, 0.2345, 0.2845, 0.1787, 0.2392, 0.2777, 0.3076, 0.2108, 0.2477, 0.234, 0.2696, 0.1839, 0.2344, 0.2872, 0.3224, 0.2152, 0.2593, 0.2295, 0.2702, 0.1876, 0.2331, 0.2809, 0.3316, 0.2099, 0.2814, 0.2174, 0.2516, 0.2029, 0.2282, 0.2697, 0.3424, 0.2259, 0.2626, 0.2187, 0.2502, 0.2161, 0.2194, 0.2628, 0.3296, 0.2323, 0.2557, 0.2215, 0.2383, 0.2166, 0.2315, 0.2757, 0.3163, 0.2311, 0.2479, 0.2199, 0.2418, 0.1938, 0.2394, 0.2718, 0.3297, 0.2346, 0.2523, 0.2262, 0.2481, 0.2118, 0.241, 0.271, 0.3525, 0.2323, 0.2513, 0.2313, 0.2476, 0.232, 0.2295, 0.2645, 0.3386, 0.2334, 0.2631, 0.226, 0.2603, 0.2334, 0.2375, 0.2744, 0.3491, 0.2052, 0.2473, 0.228, 0.2448, 0.2189, 0.2149]
a, b, loc, scale = stats.beta.fit(observed,floc=0,fscale=1)
ax = plt.subplot(111)
ax.hist(observed, alpha=0.75, color='green', bins=104, density=True)
ax.plot(np.linspace(0, 1, 100), stats.beta.pdf(np.linspace(0, 1, 100), a, b))
plt.show()

The fitted α and β look wrong (α=6.056697373013153, β=409078.57804704335), and the resulting plot is unreasonable too: the histogram and the fitted beta density differ in height on the y-axis.

The mean of the data is about 0.25, but the expected value of the beta distribution computed from these parameters is 6.05/(6.05+409078.57)=1.47891162469e-05. This seems counterintuitive.


norok2
abraxas
  • Hi! Welcome to StackOverflow! Right now it is not possible to reproduce what you observe because there is not enough code or information to run what you posted, which makes your error unnecessarily difficult to reproduce. I suggest editing your question to include a [minimal example](https://stackoverflow.com/help/mcve). – norok2 Feb 04 '19 at 08:43
  • Please include the `import` statements, a minimal trial dataset (input), and your expected output, and state your question clearly. Building the [mcve] is a really important step toward the solution. – jlandercy Feb 04 '19 at 08:59
  • @norok2 Sorry, I'm Chinese and English is not my forte, so please forgive me. I have now edited the question. – abraxas Feb 04 '19 at 09:44
  • No worries about the language, we are not all native speakers. Is `observed` your normed bins and modalities ranging (0, 1)? Because your histogram is far from a Beta distribution. – jlandercy Feb 04 '19 at 09:49
  • @jlandercy Dear jlandercy, I added a picture to the question, hoping it would help. What does "Is observed your normed bins and modalities ranging (0, 1)" mean? I don't quite understand. – abraxas Feb 04 '19 at 09:58
  • @Goyo Sorry, this is my first time using StackOverflow. I corrected the code. My question is: the fitted curve looks a little different from the histogram, so did I do something wrong? – abraxas Feb 04 '19 at 10:38
  • That is to be expected, especially if you use as many bins as the size of the sample in your histogram. Besides, forcing `loc` and `scale` to be `0` and `1` might not be the best thing to do, but you would know that better than I do. – Stop harming Monica Feb 04 '19 at 11:13

1 Answer


I think the code you posted does not quite match the results you report. The main point to consider is that a beta fit returns not only a and b, but also loc and scale.

If you perform the fit with fixed loc and scale, i.e. `scipy.stats.beta.fit(observed, floc=0, fscale=1)`, then the fitted parameters are a = 33.26401059422594 and b = 99.0180817184922.

On the other hand, if you perform the fit with variable loc and scale, i.e. `scipy.stats.beta.fit(observed)`, then you must also pass the fitted loc and scale to `scipy.stats.beta.pdf()`. With your data, the fitted values are a = 6.056697380819225, b = 409078.5780469263, loc = 0.15710752697400227, scale = 6373.831662619217.

According to its documentation: "The probability density above is defined in the 'standardized' form. To shift and/or scale the distribution use the loc and scale parameters. Specifically, `beta.pdf(x, a, b, loc, scale)` is identically equivalent to `beta.pdf(y, a, b) / scale` with `y = (x - loc) / scale`."

Hence, the theoretical mean must also include the location and scale transformations: it is loc + scale * a / (a + b) rather than a / (a + b).
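The identity quoted from the documentation can be checked numerically. The following is only a minimal sketch: it plugs in the unconstrained-fit parameters reported above instead of re-running the fit, and the evaluation grid `x` is an arbitrary choice made for illustration.

```python
import numpy as np
from scipy import stats

# Parameters reported above for the unconstrained fit (copied, not re-fitted).
a, b = 6.056697380819225, 409078.5780469263
loc, scale = 0.15710752697400227, 6373.831662619217

# Arbitrary grid inside the range covered by the data (an assumption).
x = np.linspace(0.16, 0.40, 50)
y = (x - loc) / scale

# Left-hand side: density with explicit loc and scale.
shifted = stats.beta.pdf(x, a, b, loc=loc, scale=scale)
# Right-hand side: standardized density evaluated at y, rescaled by hand.
standard = stats.beta.pdf(y, a, b) / scale

identity_holds = np.allclose(shifted, standard)
print(identity_holds)
```

When plotting the fitted curve, the same `loc` and `scale` have to be passed to `stats.beta.pdf`, otherwise the curve is drawn for the standardized distribution on (0, 1).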
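As a rough check of the mean, here is a small sketch that uses the parameter values reported above for both fits (copied by hand rather than re-fitted; the variable names are chosen only for this illustration). It compares the hand-computed mean with `scipy.stats.beta.mean`.

```python
from scipy import stats

# Fit with floc=0, fscale=1: the mean is simply a / (a + b).
a_fixed, b_fixed = 33.26401059422594, 99.0180817184922
mean_fixed = a_fixed / (a_fixed + b_fixed)

# Unconstrained fit: the mean must be shifted by loc and stretched by scale.
a, b = 6.056697380819225, 409078.5780469263
loc, scale = 0.15710752697400227, 6373.831662619217
mean_free = loc + scale * a / (a + b)

# SciPy applies the same transformation when loc and scale are passed.
mean_scipy = stats.beta.mean(a, b, loc=loc, scale=scale)

print(mean_fixed, mean_free, mean_scipy)
```

Both parameter sets give a mean close to the sample average of about 0.25 mentioned in the question, so the very large b is not a contradiction once loc and scale are taken into account.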

norok2
  • The histogram is far from the fitted curve in the picture. Did I make a mistake, or is the fitting degree bad? – abraxas Feb 05 '19 at 03:16
  • That is because the binning you chose does not help you visualize your data. With `104` bins, there will be several bins with count `0`, and these are rightfully included in the fit but much less so in your visual inspection. If you choose a bin count of, say, `16`, then you get a much better visual match. Anyway, the *fitting degree* could be estimated with a goodness-of-fit test, see https://stackoverflow.com/questions/24371051/how-to-perform-a-chi-squared-goodness-of-fit-test-using-scientific-libraries-in – norok2 Feb 05 '19 at 06:28
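The effect of the bin count described in the last comment can be checked without plotting. This is only a sketch: it reuses the `observed` data from the question, and `count_empty_bins` is a hypothetical helper introduced here for illustration.

```python
import numpy as np

observed = [0.294, 0.2955, 0.235, 0.2536, 0.2423, 0.2844, 0.2099, 0.2355, 0.2946, 0.3388, 0.2202, 0.2523, 0.2209, 0.2707, 0.1885, 0.2414, 0.2846, 0.328, 0.2265, 0.2563, 0.2345, 0.2845, 0.1787, 0.2392, 0.2777, 0.3076, 0.2108, 0.2477, 0.234, 0.2696, 0.1839, 0.2344, 0.2872, 0.3224, 0.2152, 0.2593, 0.2295, 0.2702, 0.1876, 0.2331, 0.2809, 0.3316, 0.2099, 0.2814, 0.2174, 0.2516, 0.2029, 0.2282, 0.2697, 0.3424, 0.2259, 0.2626, 0.2187, 0.2502, 0.2161, 0.2194, 0.2628, 0.3296, 0.2323, 0.2557, 0.2215, 0.2383, 0.2166, 0.2315, 0.2757, 0.3163, 0.2311, 0.2479, 0.2199, 0.2418, 0.1938, 0.2394, 0.2718, 0.3297, 0.2346, 0.2523, 0.2262, 0.2481, 0.2118, 0.241, 0.271, 0.3525, 0.2323, 0.2513, 0.2313, 0.2476, 0.232, 0.2295, 0.2645, 0.3386, 0.2334, 0.2631, 0.226, 0.2603, 0.2334, 0.2375, 0.2744, 0.3491, 0.2052, 0.2473, 0.228, 0.2448, 0.2189, 0.2149]


def count_empty_bins(data, bins):
    """Return how many histogram bins contain no observation (hypothetical helper)."""
    counts, _ = np.histogram(data, bins=bins)
    return int(np.sum(counts == 0))


# 104 bins for 104 points leaves many bins empty; 16 bins is far less sparse.
empty_104 = count_empty_bins(observed, 104)
empty_16 = count_empty_bins(observed, 16)
print(empty_104, empty_16)
```

Passing `bins=16` to `ax.hist` in the question's code should give the smoother histogram the comment refers to. This sketch does not measure goodness of fit; the linked question covers a chi-squared test for that.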