A related question is "using fitdist from fitdistplus with binomial distribution
". fitdistrplus::fitdist
is a function that takes univariate data and starting guess for parameters. To fit binomial and betabinomial data, while univariate, the size is also needed. If the size is fixed for every datum, then the aforementioned link has the fix needed. However if the sizes vary and a vector needs to be passed, I'm unsure how to get a properly functioning call.
opt_one
in the code below was the solution that was offered in the aforementioned linked post -- that is, the cluster size is known and fixed. For opt_one
, I incorrectly specify fix.arg=list(size=125)
(in essence making every element of N=125) and this is close enough and the code runs. However, the cluster sizes in N
actually vary. I try to specify this in opt_two
and get an error. Any thoughts would be appreciated.
library(fitdistrplus)
library(VGAM)
set.seed(123)
N <- 100 + rbinom(1000,25,0.9)
Y <- rbetabinom.ab(rep(1,length(N)), N, 1, 2)
head(cbind(Y,N))
opt_one <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=125),
start=list(shape1=1,shape2=1)
)
opt_one
Which gives:
> head(cbind(Y,N))
Y N
[1,] 67 123
[2,] 14 121
[3,] 15 123
[4,] 42 121
[5,] 86 120
[6,] 28 125
> opt_one <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=125),
+ start=list(shape1=1,shape2=1)
+ )
Warning messages:
1: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The dbetabinom.ab function should return a zero-length vector when input has length zero
2: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125), :
The pbetabinom.ab function should return a zero-length vector when input has length zero
> opt_one
Fitting of the distribution ' betabinom.ab ' by maximum likelihood
Parameters:
estimate Std. Error
shape1 0.9694054 0.04132912
shape2 2.1337839 0.10108720
Fixed parameters:
value
size 125
Not, bad, as the shape1
and shape2
parameters were 1 and 2, respectively, as specified when we created Y
. Here's option 2:
opt_two <-
fitdist(data=Y,
distr=pbetabinom.ab,
fix.arg=list(size=N),
start=list(shape1=1,shape2=1)
)
Which gives an error:
> opt_two <-
+ fitdist(data=Y,
+ distr=pbetabinom.ab,
+ fix.arg=list(size=N),
+ start=list(shape1=1,shape2=1)
+ )
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg, :
'fix.arg' must specify names which are arguments to 'distr'.
An Attempt after initial posting (thanks to Dean Follmann)
I know I can code my own betabinomial likelihood (opt_three
, presented below), but would really like to use the tools provided with having a fitdist
object -- that is, to have opt_two
working.
library(Rfast)
loglik <-function(parm){
A<-parm[1];B<-parm[2]
-sum( Lgamma(A+B) - Lgamma(A)- Lgamma(B) + Lgamma(Y+A) + Lgamma(N-Y+B) - Lgamma(N+A+B) )
}
opt_three <- optim(c(1,1),loglik, method = "L-BFGS-B", lower=c(0,0))
opt_three
Which gives:
> opt_three
$par
[1] 0.9525161 2.0262342
$value
[1] 61805.54
$counts
function gradient
7 7
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
Also related is Ben Bolker's answer using mle2. The fitdist solution remains at large.