6

I have been looking for a way to fit data to a beta binomial distribution and estimate alpha and beta, similar to the way the vglm package in VGAM library does. I have not been able to find how to do this in python. There is a scipy.stats.beta.fit() , but nothing for a beta binomial distribution. Is there a way to do this?

user3266890
  • 465
  • 5
  • 15

2 Answers2

5

I have not seen estimation for beta-binomial in Python.

If you just want to estimate the parameters, then you can use scipy.optimize to minimize the log-likelihood function which you can write yourself or copy code after a internet search.

You can subclass rv_discrete in order to use the framework of scipy.stats.distributions, but discrete distributions in scipy do not have a fit method.

If you want to use statsmodels, then you could subclass GenericLikelihoodModel http://statsmodels.sourceforge.net/devel/dev/generated/statsmodels.base.model.GenericLikelihoodModel.html which uses scipy.optimize but defines most of the things we need for Maximum Likelihood estimations. However, you need to write the code for the log-likelihood function. This would provide the usual maximum likelihood results such as standard errors for the parameters and various tests.

If you need beta-binomial regression, then the mean variance parameterization as used in the R package gamlss would be more common, and can reuse the link functions to constrain the parameters to be in the valid domain.

As a related example: This is the gist with the GenericLikelihoodModel prototype that lead to a pull request for Beta-Regression for statsmodels: http://gist.github.com/brentp/089c7d6d69d78d26437f

Josef
  • 21,998
  • 3
  • 54
  • 67
  • 1
    scipy has scipy.stats.beta.fit(). What would be the difference between the alpha and beta parameters of this and something built for the beta binomial? – user3266890 Feb 09 '15 at 17:05
  • I never tried, so I'm basing it on comparing the wikipedia description of beta and beta-binomial. I think, if `n` is constant, then the alpha and beta estimates should be the same. If `n` fluctuates, then estimating beta-binomial would have a different weighting of the observations than estimating beta, where the weights will depend on the size of n. A proportion based on a larger n would have a smaller variance and would have more weight than a proportion based on a smaller n. – Josef Feb 10 '15 at 04:18
  • The beta distribution (implemented in scipy.stats.beta) has support in the range [0,1], while the beta-binomial distribution has support in the integers. They are two completely different distributions for two completely different types of data; you cannot fit beta-binomial data using a beta distribution. (Think of the beta-binomial distribution as a binomial distribution in which you don't precisely know the success probability.) – Danny Sep 29 '15 at 11:27
  • I mentioned beta-regression as an example of implementing a maximum likelihood model. However, if we divide (beta-)binomial count data by the exposure (number of trials), then we get a proportion that we could fit with a beta distribution assuming it is a good approximation to the underlying distribution of the proportions. – Josef Sep 29 '15 at 12:26
2

This python module provides that https://github.com/lfiaschi/fastbetabino

Luca Fiaschi
  • 3,145
  • 7
  • 31
  • 44