I would like to estimate the alpha
and beta
parameters from a beta distribution
when all I have is a set of probabilities. I cannot seem to find an example of this online. Below I try three approaches: the betareg
package, Ben Bolker's bbmle
package and optim
in R
.
The betareg
package returns phi
rather than alpha
and beta
. It also returns a warning
. I think the bbmle
package does what I want, although I seem to have to jerryrig a data.frame
to obtain the estimates. My optim
function is not returning good estimates and I do not know how to correct it.
I am hoping someone can verify that what I am doing with the bbmle
package is okay and maybe show me how to correct my optim
function. Below I create a data set of probabilities and show my R
code for all three approaches.
Create a data set of probabilities:
set.seed(1234)
# generate data
n.samples <- 1000
alpha <- 2
beta <- 2
my.probs <- rbeta(n.samples, alpha, beta)
mean(my.probs)
#[1] 0.4925867
Here is my R
code using the betareg
package. I am aware that µ = p/(p + q)
and φ = p + q
and I think p = alpha
and q = beta
. I am not sure how to obtain estimates of alpha
and beta
from an estimate of phi. But given I have two equations with two unknowns I can probably define p = φ - q
and substitute that into the equation for µ
.
# install.packages('betareg')
library(betareg)
my.model <- betareg(formula = my.probs ~ 1)
summary(my.model)
# Call:
# betareg(formula = my.probs ~ 1)
#
# Standardized weighted residuals 2:
# Min 1Q Median 3Q Max
# -3.8992 -0.6445 -0.0096 0.6284 3.1924
#
# Coefficients (mean model with logit link):
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.02684 0.02740 -0.98 0.327
#
# Phi coefficients (precision model with identity link):
# Estimate Std. Error z value Pr(>|z|)
# (phi) 4.1774 0.1686 24.77 <2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Type of estimator: ML (maximum likelihood)
# Log-likelihood: 138.9 on 2 Df
# Number of iterations: 10 (BFGS) + 2 (Fisher scoring)
# Warning message:
# In deparse(x$call, width.cutoff = floor(getOption("width") * 0.85)) :
# invalid 'cutoff' value for 'deparse', using default
exp(-0.02684) / (1 + exp(-0.02684))
#[1] 0.4932904
Here is my R
code for the bbmle
package. I define a placeholder data.frame
of x
and y
where each is always 1
. These seems to work, but I am not sure it is okay to do. I get warnings
.
alpha and beta estimates for beta binomial and beta distributions
# install.packages("bbmle")
library("bbmle")
x <- rep(1, length(my.probs))
y <- rep(1, length(my.probs))
my.model <- mle2(minuslogl = my.probs ~ dbeta(shape1, shape2), start = list(shape1 = 2, shape2 = 30), data = data.frame(x, y))
# There were 12 warnings (use warnings() to see them)
summary(my.model)
# Maximum likelihood estimation
#
# Call:
# mle2(minuslogl = my.probs ~ dbeta(shape1, shape2), start = list(shape1 = 2,
# shape2 = 30), data = data.frame(x, y))
#
# Coefficients:
# Estimate Std. Error z value Pr(z)
# shape1 2.060660 0.087702 23.496 < 2.2e-16 ***
# shape2 2.116721 0.090365 23.424 < 2.2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# -2 log L: -277.8204
Here is my optim
function. I admit I am kind of guessing here as I have no experience with beta
estimation. The estimates do not appear to be good and I get warnings
.
beta.reg = function(betas, my.probs){
b0 = betas[1]
alpha = betas[2]
beta = betas[3]
n <- length(my.probs)
llh <- rep(0, n)
for(i in 1:n){
r = my.probs[i] - b0
llh[i] = llh[i] + r^2
}
-sum(dbeta(llh, alpha, beta, log = TRUE))
}
alpha.int <- alpha+1
beta.int <- beta-1
my.model <- optim(c(mean(my.probs), alpha.int, beta.int),
beta.reg, my.probs = my.probs, method = "BFGS", hessian=TRUE)
# There were 41 warnings (use warnings() to see them)
my.model$par
#[1] 0.4901505 0.5561770 11.0689391
mean(my.probs)
my.model$value
#[1] -2170.605
covmat <- solve(my.model$hessian)
mySE <- diag(covmat)^0.5
mySE
#[1] 0.0007311783 0.0207619704 0.6066361917
Thank you for any help with this.