I have been looking for a way to fit data to a beta binomial distribution and estimate alpha and beta, similar to the way the vglm package in VGAM library does. I have not been able to find how to do this in python. There is a scipy.stats.beta.fit() , but nothing for a beta binomial distribution. Is there a way to do this?
2 Answers
I have not seen estimation for beta-binomial in Python.
If you just want to estimate the parameters, then you can use scipy.optimize to minimize the log-likelihood function which you can write yourself or copy code after a internet search.
You can subclass rv_discrete
in order to use the framework of scipy.stats.distributions, but discrete distributions in scipy do not have a fit
method.
If you want to use statsmodels, then you could subclass GenericLikelihoodModel
http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModel.html which uses scipy.optimize but defines most of the things we need for Maximum Likelihood estimations.
However, you need to write the code for the log-likelihood function. This would provide the usual maximum likelihood results such as standard errors for the parameters and various tests.
If you need beta-binomial regression, then the mean variance parameterization as used in the R package gamlss would be more common, and can reuse the link
functions to constrain the parameters to be in the valid domain.
As a related example: This is the gist with the GenericLikelihoodModel prototype that lead to a pull request for Beta-Regression for statsmodels: http://gist.github.com/brentp/089c7d6d69d78d26437f

- 21,998
- 3
- 54
- 67
-
1scipy has scipy.stats.beta.fit(). What would be the difference between the alpha and beta parameters of this and something built for the beta binomial? – user3266890 Feb 09 '15 at 17:05
-
I never tried, so I'm basing it on comparing the wikipedia description of beta and beta-binomial. I think, if `n` is constant, then the alpha and beta estimates should be the same. If `n` fluctuates, then estimating beta-binomial would have a different weighting of the observations than estimating beta, where the weights will depend on the size of n. A proportion based on a larger n would have a smaller variance and would have more weight than a proportion based on a smaller n. – Josef Feb 10 '15 at 04:18
-
The beta distribution (implemented in scipy.stats.beta) has support in the range [0,1], while the beta-binomial distribution has support in the integers. They are two completely different distributions for two completely different types of data; you cannot fit beta-binomial data using a beta distribution. (Think of the beta-binomial distribution as a binomial distribution in which you don't precisely know the success probability.) – Danny Sep 29 '15 at 11:27
-
I mentioned beta-regression as an example of implementing a maximum likelihood model. However, if we divide (beta-)binomial count data by the exposure (number of trials), then we get a proportion that we could fit with a beta distribution assuming it is a good approximation to the underlying distribution of the proportions. – Josef Sep 29 '15 at 12:26
This python module provides that https://github.com/lfiaschi/fastbetabino

- 3,145
- 7
- 31
- 44