5

I am trying to fit my data to a beta-binomial distribution and estimate the alpha and beta shape parameters. For this distribution, the prior is taken from a beta distribution. Python does not have a fit function for beta-binomial but it does for beta. The python beta fitting and R beta binomial fitting is close but systematically off.

R:

library("VGAM")
x = c(222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508)
y = c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)
fit=vglm(cbind(y, x) ~ 1, betabinomialff, trace = TRUE)
Coef(fit)
   shape1    shape2 
  1.736093 26.870768

python:

import scipy.stats
import numpy as np
x = np.array([222,909,918,814,970,346,746,419,610,737,201,865,573,188,450,229,629,708,250,508], dtype=float)
y = np.array([2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17])
scipy.stats.beta.fit((y)/(x+y), floc=0, fscale=1)
    (1.5806623978910086, 24.031893492546242, 0, 1)

I have done this many times and it seems like python is systematically a little bit lower than the R results. I was wondering if this is an input error on my part or just a difference in the way they are calculated?

user3266890
  • 465
  • 5
  • 15

2 Answers2

6

Your problem is that fitting a beta-binomial model is just not the same as fitting a Beta model with the values equal to the ratios. I'm going to illustrate here with the bbmle package, which will fit similar models to VGAM (but with which I'm more familiar).

Preliminaries:

library("VGAM")  ## for dbetabinom.ab
x <- c(222,909,918,814,970,346,746,419,610,737,
       201,865,573,188,450,229,629,708,250,508)
y <- c(2,18,45,11,41,38,22,7,40,24,34,21,49,35,31,44,20,28,39,17)

library("bbmle")

Fit beta-binomial model:

mle2(y~dbetabinom.ab(size=x+y,shape1,shape2),
     data=data.frame(x,y),
     start=list(shape1=2,shape2=30))
## Coefficients:
##    shape1    shape2 
##  1.736046 26.871526 

This agrees more or less perfectly with the VGAM result you quote.

Now use the same framework to fit a Beta model instead:

mle2(y/(x+y) ~ dbeta(shape1,shape2),
     data=data.frame(x,y),
     start=list(shape1=2,shape2=30))
## Coefficients:
##    shape1    shape2 
## 1.582021 24.060570 

This fits your Python, beta-fit result. (I'm sure if you used VGAM to fit the Beta you'd get the same answer too.)

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • 1
    In case anyone is uncertain who wrote the `bbmle` package (and why Ben is more familiar with it), they have only to type `maintainer('bbmle')` in an R console session. – IRTFM Feb 10 '15 at 02:22
  • 1
    Then, I think I know what 'bb' stands for in `bbmle`! – Marat Talipov Feb 10 '15 at 03:44
  • I probably would have named it differently if I'd known it was going to be useful/popular in the long run ... – Ben Bolker Feb 10 '15 at 03:50
0

You can use the conjugate_prior package for python

See code for a coin-flip example:

from conjugate_prior import BetaBinomial
heads = 95
tails = 105
prior_model = BetaBinomial() #Uninformative prior
updated_model = prior_model.update(heads, tails)
credible_interval = updated_model.posterior(0.45, 0.55)
print ("There's {p:.2f}% chance that the coin is fair".format(p=credible_interval*100))
predictive = updated_model.predict(50, 50)
print ("The chance of flipping 50 Heads and 50 Tails in 100 trials is {p:.2f}%".format(p=predictive*100))

Code taken from here

Uri Goren
  • 13,386
  • 6
  • 58
  • 110