1

A related question is "using fitdist from fitdistplus with binomial distribution ". fitdistrplus::fitdist is a function that takes univariate data and starting guess for parameters. To fit binomial and betabinomial data, while univariate, the size is also needed. If the size is fixed for every datum, then the aforementioned link has the fix needed. However if the sizes vary and a vector needs to be passed, I'm unsure how to get a properly functioning call.

opt_one in the code below was the solution that was offered in the aforementioned linked post -- that is, the cluster size is known and fixed. For opt_one, I incorrectly specify fix.arg=list(size=125) (in essence making every element of N=125) and this is close enough and the code runs. However, the cluster sizes in N actually vary. I try to specify this in opt_two and get an error. Any thoughts would be appreciated.

library(fitdistrplus)
library(VGAM)
set.seed(123)

N <- 100 + rbinom(1000,25,0.9)

Y <- rbetabinom.ab(rep(1,length(N)), N, 1, 2)

head(cbind(Y,N))

opt_one <-
  fitdist(data=Y,
          distr=pbetabinom.ab,
          fix.arg=list(size=125),
          start=list(shape1=1,shape2=1)
  )
opt_one

Which gives:

> head(cbind(Y,N))
      Y   N
[1,] 67 123
[2,] 14 121
[3,] 15 123
[4,] 42 121
[5,] 86 120
[6,] 28 125
> opt_one <-
+   fitdist(data=Y,
+           distr=pbetabinom.ab,
+           fix.arg=list(size=125),
+           start=list(shape1=1,shape2=1)
+   )
Warning messages:
1: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125),  :
  The dbetabinom.ab function should return a zero-length vector when input has length zero
2: In fitdist(data = Y, distr = pbetabinom.ab, fix.arg = list(size = 125),  :
  The pbetabinom.ab function should return a zero-length vector when input has length zero
> opt_one
Fitting of the distribution ' betabinom.ab ' by maximum likelihood 
Parameters:
        estimate Std. Error
shape1 0.9694054 0.04132912
shape2 2.1337839 0.10108720
Fixed parameters:
     value
size   125

Not, bad, as the shape1 and shape2 parameters were 1 and 2, respectively, as specified when we created Y. Here's option 2:

opt_two <-
  fitdist(data=Y,
          distr=pbetabinom.ab,
          fix.arg=list(size=N),
          start=list(shape1=1,shape2=1)
  )

Which gives an error:

> opt_two <-
+   fitdist(data=Y,
+           distr=pbetabinom.ab,
+           fix.arg=list(size=N),
+           start=list(shape1=1,shape2=1)
+   )
Error in checkparamlist(arg_startfix$start.arg, arg_startfix$fix.arg,  : 
  'fix.arg' must specify names which are arguments to 'distr'.

An Attempt after initial posting (thanks to Dean Follmann)

I know I can code my own betabinomial likelihood (opt_three, presented below), but would really like to use the tools provided with having a fitdist object -- that is, to have opt_two working.

library(Rfast)
loglik <-function(parm){  
  A<-parm[1];B<-parm[2]
  -sum( Lgamma(A+B) - Lgamma(A)- Lgamma(B) + Lgamma(Y+A) + Lgamma(N-Y+B) - Lgamma(N+A+B)  )
}

opt_three <- optim(c(1,1),loglik, method = "L-BFGS-B", lower=c(0,0))
opt_three

Which gives:

> opt_three
$par
[1] 0.9525161 2.0262342

$value
[1] 61805.54

$counts
function gradient 
       7        7 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Also related is Ben Bolker's answer using mle2. The fitdist solution remains at large.

swihart
  • 2,648
  • 2
  • 18
  • 42

1 Answers1

0

Look at example 4 of the ?fitdistrplus::fitdist() help page:

# (4) defining your own distribution functions, here for the Gumbel distribution
# for other distributions, see the CRAN task view 
# dedicated to probability distributions
#

dgumbel <- function(x, a, b) 1/b*exp((a-x)/b)*exp(-exp((a-x)/b))
pgumbel <- function(q, a, b) exp(-exp((a-q)/b))
qgumbel <- function(p, a, b) a-b*log(-log(p))

fitgumbel <- fitdist(serving, "gumbel", start=list(a=10, b=10))
summary(fitgumbel)
plot(fitgumbel)

and then -- feeling inspired and informed because you actually RTM -- make your own [dpq] functions with N specified:

dbbspecifiedsize <- function(x, a, b) dbetabinom.ab(x, size=N, shape1=a, shape2=b)
pbbspecifiedsize <- function(q, a, b) pbetabinom.ab(q, size=N, shape1=a, shape2=b)
qbbspecifiedsize <- function(p, a, b) qbetabinom.ab(p, size=N, shape1=a, shape2=b)

opt_four <-
  fitdist(data=Y,
          distr="bbspecifiedsize",
          start=list(a=1,b=1)
  )
opt_four

which gives:

> opt_four
Fitting of the distribution ' bbspecifiedsize ' by maximum likelihood 
Parameters:
   estimate Std. Error
a 0.9526875 0.04058396
b 2.0261339 0.09576709

which is quite similar to the estimates of opt_three and is a fitdist object.

swihart
  • 2,648
  • 2
  • 18
  • 42