
I am fitting a beta distribution with `beta.fit(W)`. The values of W lie inside [0,1] but do not reach the boundaries. Do I need to force the [0,1] bounds with `beta.fit(W, loc=min(W), scale=max(W) - min(W))`, or may I assume that, as long as the data is within the [0,1] range, the fitting "will be fine"? Obviously, scaling the data should give different values of a and b. Which one is the "correct one"?

This question is related to: https://stats.stackexchange.com/questions/68983/beta-distribution-fitting-in-scipy

Unfortunately, no valid answer on what to do when the data is within the expected range is give...

I tried to fit data generated with known values of a and b and neither technique gave a good fit, although scaling seemed to help a bit.

Thanks

user3861925

1 Answer


When the floc and fscale parameters are not passed, fit tries to estimate the location and scale along with α and β. If you know that the data lie in a specific interval, you should make that additional information known to the fit function (by fixing the parameters yourself, e.g. floc=0 and fscale=1 for [0,1]) in order to improve the fit. You can also give initial guesses for α and β (as positional arguments after the data) and for the location and scale (via the loc and scale keyword arguments); SciPy's default guessing function seems to be quite sophisticated, though.
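As a minimal sketch of the difference, the snippet below draws synthetic data from a beta distribution and fits it twice: once with location and scale left free (starting from explicit initial guesses) and once with the support fixed to [0,1]. The sample size, the seed, and the values of `a_true` and `b_true` are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import beta

# Synthetic data with known shape parameters (illustrative values).
rng = np.random.default_rng(0)
a_true, b_true = 2.0, 5.0
W = rng.beta(a_true, b_true, size=2000)

# Free fit: all four parameters are estimated. The positional arguments are
# initial guesses for a and b; loc= and scale= are initial guesses as well,
# not constraints. Omitting them lets SciPy pick its own starting values.
a_free, b_free, loc_free, scale_free = beta.fit(W, 2.0, 5.0, loc=0.0, scale=1.0)

# Constrained fit: the support is known to be [0, 1], so loc and scale
# are held fixed and only a and b are estimated.
a_fix, b_fix, loc_fix, scale_fix = beta.fit(W, floc=0, fscale=1)

print("free :", a_free, b_free, loc_free, scale_free)
print("fixed:", a_fix, b_fix, loc_fix, scale_fix)
```

With the support fixed, the estimates of a and b are directly comparable to the true values; in the free fit they are tied to whatever loc and scale the optimizer settles on.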

Deriving floc and fscale from the limits of the sample set is not a good idea: the beta density is zero at the interval boundaries whenever α > 1 and β > 1, and the smallest and largest samples would land exactly on those boundaries. This creates large discrepancies between the data and all possible fits.
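The effect can be checked numerically. The sketch below (synthetic data, with illustrative values for `a_true`, `b_true`, the seed, and the sample size) rescales the sample by its own minimum and maximum and evaluates the log-likelihood at the true shape parameters. The two extreme samples map to exactly 0 and 1, where the log-density is minus infinity for α > 1 and β > 1.

```python
import numpy as np
from scipy.stats import beta

# Synthetic data with known shape parameters (illustrative values).
rng = np.random.default_rng(1)
a_true, b_true = 2.0, 5.0
W = rng.beta(a_true, b_true, size=500)

# Location and scale derived from the sample limits.
loc_mm = W.min()
scale_mm = W.max() - W.min()

# Log-likelihood of the true shape parameters on the min/max-rescaled support:
# the smallest sample maps to 0 and the largest to 1, both of zero density.
total_minmax = beta.logpdf(W, a_true, b_true, loc=loc_mm, scale=scale_mm).sum()

# Log-likelihood of the same parameters on the true support [0, 1].
total_unit = beta.logpdf(W, a_true, b_true).sum()

print("min/max support:", total_minmax)
print("[0, 1] support :", total_unit)
```

On the min/max-derived support, the true parameters therefore look infinitely unlikely, so a fit on that support cannot recover them.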

Roland W
  • I was under the impression that scale and loc are meant for rescaling variables that are not in the [0,1] interval, and that for data already within it scale is always 1 and loc = 0, i.e., nothing changes in the input data. What you are saying is that the data is rescaled either way. Hence, if the given samples all come from a certain [alpha,beta] but do not span the entire [0,1], the estimation will be intrinsically incorrect. Am I right on that? Otherwise, I fail to understand the scale values greater than 1 that the fit function supports, as beta is defined on [0,1] only... Thanks – user3861925 Mar 07 '16 at 16:01
  • The beta distribution globally transforms its arguments to [0,1] by applying the transformation `y = (x - loc)/scale`. This is not limited to data outside the [0,1] interval. – Roland W Mar 07 '16 at 16:45
  • so it simply does not always work accurately then... When I provide elements in the range, say, [0.05,0.5], I would expect a fit whose mean does not lie outside the given samples, say at 0.6. Obviously the algorithm does not take this into account, with or without scaling... thanks again – user3861925 Mar 07 '16 at 16:55
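For readers puzzled by scale values greater than 1, here is a small sketch of the `y = (x - loc)/scale` transformation mentioned in the comments. The values of `a`, `b`, `loc`, and `scale` are arbitrary and chosen only for illustration. With them the support becomes [loc, loc + scale], and the density equals the standard beta density evaluated at `y`, divided by `scale`.

```python
import numpy as np
from scipy.stats import beta

a, b = 2.0, 5.0
loc, scale = 10.0, 4.0  # arbitrary illustrative values

# Points inside the shifted and stretched support [10, 14].
x = np.linspace(10.5, 13.5, 7)
y = (x - loc) / scale  # the same points mapped back to [0, 1]

# Density with loc/scale versus the standard density at y, rescaled.
shifted = beta.pdf(x, a, b, loc=loc, scale=scale)
standard = beta.pdf(y, a, b) / scale

# Support endpoints of the shifted and scaled distribution.
lo, hi = beta.support(a, b, loc=loc, scale=scale)

print(np.allclose(shifted, standard))
print(lo, hi)
```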