6

Below are my codes:

library(fitdistrplus)

s <- c(11, 4, 2, 9, 3, 1, 2, 2, 3, 2, 2, 5, 8,3, 15, 3, 9, 22, 0, 4, 10, 1, 9, 10, 11,
 2, 8, 2, 6, 0, 15, 0 , 2, 11, 0, 6, 3, 5, 0, 7, 6, 0, 7, 1, 0, 6, 4, 1, 3, 5,
 2, 6, 0, 10, 6, 4, 1, 17, 0, 1, 0, 6, 6, 1, 5, 4, 8, 0, 1, 1, 5, 15, 14, 8, 1,
 3, 2, 9, 4, 4, 1, 2, 18, 0, 0, 10, 5, 0, 5, 0, 1, 2, 0, 5, 1, 1, 2, 3, 7)

o <- fitdist(s, "gamma", method = "mle")
summary(o)
plot(o)

and the error says:

Error in fitdist(s, "gamma", method = "mle") : the function mle failed to estimate the parameters, with the error code 100

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Chuan
  • 61
  • 1
  • 2
  • did the solution(s) below solve your problem? If so, please click on the checkmark to accept one or the other. If not, please let us know why not ... – Ben Bolker Sep 10 '18 at 18:54

2 Answers2

6

The Gamma distribution doesn't allow zero values (the likelihood will evaluate to zero, and the log-likelihood will be infinite, for a response of 0) unless the shape parameter is exactly 1.0 (i.e., an exponential distribution - see below) ... that's a statistical/mathematical problem, not a programming problem. You're going to have to find something sensible to do about the zero values. Depending on what makes sense for your application, you could (for example)

  • choose a different distribution to test (e.g. pick a censoring point and fit a censored Gamma, or fit a zero-inflated Gamma distribution, or ...)
  • exclude the zero values (fitdist(s[s>0], ...))
  • set the zero values to some sensible non-zero value (fitdist(replace(s,which(s==0),0.1),...)

which (if any) of these is best depends on your application.


@Sandipan Dey's first answer (leaving the zeros in the data set) appears to make sense, but in fact it gets stuck at the shape parameter equal to 1.

o <- fitdist(s, "exp", method = "mle")

gives the same answer as @Sandipan's code (except that it estimates rate=0.2161572, the inverse of the scale parameter=4.626262 that's estimated for the Gamma distribution - this is just a change in parameterization). If you choose to fit an exponential instead of a Gamma, that's fine - but you should do it on purpose, not by accident ...

To illustrate that the zeros-included fit may not be working as expected, I'll construct my own negative log-likelihood function and display the likelihood surface for each case.

mfun <- function(sh,sc,dd=s) {
  -sum(dgamma(dd,shape=sh,scale=sc,log=TRUE))
}

library(emdbook)  ## for curve3d() helper function

Zeros-included surface:

cc1 <- curve3d(mfun(x,y),
      ## set up "shape" limits" so we evaluate
      ## exactly shape=1.000 ...
        xlim=c(0.55,3.55),
        n=c(41,41),
        ylim=c(2,5),
        sys3d="none")


png("gammazero1.png")
with(cc1,image(x,y,z))
dev.off()

enter image description here

In this case the surface is only defined at shape=1 (i.e. an exponential distribution); the white regions represent infinite log-likelihoods. It's not that shape=1 is the best fit, it's that it's the only fit ...

Zeros-excluded surface:

cc2 <- curve3d(mfun(x,y,dd=s[s>0]),
               ## set up "shape" limits" so we evaluate
               ## exactly shape=1.000 ...
               xlim=c(0.55,3.55),
               n=c(41,41),
               ylim=c(2,5),
               sys3d="none")


png("gammazero2.png")
with(cc2,image(x,y,z))
with(cc2,contour(x,y,z,add=TRUE))
abline(v=1.0,lwd=2,lty=2)
dev.off()

enter image description here

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • @Bon Bolker by the statistical property of the gamma distribution it does not allow zero values, from https://tminka.github.io/papers/minka-gamma.pdf, the log-likelihood has a log term which is undefined for non-positive x, but i think `mledist` function which is used by `fitdist` that internally calls iterative optimization with `optim` must use log(0) = 0 or something like that, what do you think? – Sandipan Dey Aug 25 '18 at 21:30
  • I'm guessing from the fact that the shape parameter got stuck at 1.0 in your fit (which was your starting point) that optim found NA values for any other value of the shape parameter ... if your hypothesis is true then your `fitdist()` call should give the same results as if you excluded zero values from the data. – Ben Bolker Aug 25 '18 at 21:56
  • Great explanation – Sandipan Dey Aug 27 '18 at 06:02
  • Mr. Bolker I have one question about your recommendation. Excluding zero here from data set clearly works, but I am analyzing demand variable for a certain set of products and I need to show that some weeks the demand are at the lowest (aka zero). So what is your recommendation in this situation, can I just replace zeros with a low value like 0.001? – Anoushiravan R Oct 03 '21 at 20:40
  • 1
    My point is that it's complicated. See https://stats.stackexchange.com/questions/187824/how-to-model-non-negative-zero-inflated-continuous-data/187843#187843 – Ben Bolker Oct 03 '21 at 20:47
0

Just provide the initial values for the gamma distribution parameters (scale, shape) to be computed with mle using optim and also the lower bounds for the parameters, it should work.

o <- fitdist(s, "gamma", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)

#Fitting of the distribution ' gamma ' by maximum likelihood 
#Parameters : 
#      estimate Std. Error
#scale 4.626262         NA
#shape 1.000000         NA
#Loglikelihood:  -250.6432   AIC:  505.2864   BIC:  510.4766 

enter image description here

As per the comments by @Ben Bolker, we may want to exclude the zero points first:

o <- fitdist(s[s!=0], "gamma", method = "mle", lower=c(0,0), start=list(scale=1,shape=1))
summary(o)
#Fitting of the distribution ' gamma ' by maximum likelihood 
#Parameters : 
#      estimate Std. Error
#scale 3.401208         NA
#shape 1.622378         NA
#Loglikelihood:  -219.6761   AIC:  443.3523   BIC:  448.19

enter image description here

Sandipan Dey
  • 21,482
  • 2
  • 51
  • 63
  • 1
    again, I think it's worth pointing out that in the zeros-included case, this procedure is effectively fitting an **exponential distribution** (i.e. Gamma with shape parameter restricted to 1). – Ben Bolker Aug 26 '18 at 21:18
  • @Ben Bolker as you mentioned, it fits an exponential distribution (shape=1) which has lower log-likelihood, in the first case, the gamma distribution fit in the 2nd case has higher log-likelihood, hence it's a better fit. – Sandipan Dey Aug 26 '18 at 21:28
  • I'm downvoting this because the initial "solution" presented is potentially highly misleading. I would be happy to revert the downvote if the answer is edited to give a caveat at the beginning. – Ben Bolker Feb 18 '21 at 21:17