3

Following is a kind of data set I am working on it:

data <- c(0, 1, 0, 11, 2, 0, 3, 0, 0, 2, 1, 3, 1, 0, 1, 0, 0, 0, 2, 3, 
0, 0, 0, 8, 1, 1, 1, 0, 1, 1, 2, 7, 0, 0, 0, 5, 2, 3, 6, 1, 1, 
5, 2, 9, 0, 0, 1, 21, 16, 2, 9, 6, 25, 2, 1, 12, 16, 14, 15, 
15, 6, 1, 12, 12, 13, 5, 5, 6, 4, 7, 11, 8, 4, 5, 8, 3, 8, 4, 
7, 4, 7, 2, 5, 6, 4, 5, 1, 0, 8, 5, 6, 8, 9, 8, 9, 7, 7, 9, 8, 
9, 4, 4, 7, 13, 9, 13, 12, 10, 9, 8, 7, 11, 5, 5, 0, 1, 33, 4, 
22, 19, 22, 9, 5, 4, 17, 7, 7, 4, 5, 3, 0, 0, 9, 3, 0, 0, 36, 
40, 5, 4, 0, 11, 0, 7, 5, 25, 39, 26, 4, 20, 12, 4, 17, 3, 22, 
12, 14, 8, 9, 11, 7, 11, 10, 9, 16, 6, 24, 8, 5, 6, 14, 3, 9, 
4, 1, 20, 0, 1, 7, 9, 0, 12, 2, 29, 56, 16, 8, 28, 0, 19, 25, 
35, 87, 56, 66, 60, 58, 14, 10, 12, 13, 13, 34, 26, 18, 13, 22, 
13, 12, 15, 41, 11, 11, 11, 5, 6, 7, 8, 8, 17, 16, 12, 21, 38, 
34, 10, 77, 41, 7, 12, 1, 16, 20, 8, 5, 2, 20, 7, 16, 12, 6, 
10, 31, 12)

I have used the fitdistrplus package to fit distribution using maximum likelihood.
Similarly, gamlss package defines the pdf, CDF of zero Inflated Poisson distribution.

library(fitdistrplus)   
library(gamlss)  

Here is the mean and standard deviation of the data set.

mu=mean(data)
sigma=sd(data)

The mean of the data set is 10.75 and the standard deviation is 13.050. I have tried to fit the zero-inflated Poisson distribution using fitdist function as below.

fit_zip = fitdist(data, 'ZIP', start = list(mu = mu, sigma = sigma))

Once the stated function is compiled then the compiler throws the following error

Error in fitdist(data, "ZIP", start = list(mu = mu, sigma = sigma)) : 
  the function mle failed to estimate the parameters, 
                with the error code 100

It is obvious that there is a problem in the initialization of the starting value of mu and sigma. I am not sure on what basis starting values are initialized.
any help is appreciated.

Scransom
  • 3,175
  • 3
  • 31
  • 51
Nawa Raj
  • 111
  • 1
  • 7
  • 1) Study `help("zip")`. The parameter `sigma` isn't the standard deviation. 2) Recall that for the Poisson distribution E(x) == var(x). Then compare `mean(data[data != 0])` and `var(data[data != 0])`. Your data is unlikely to be from a zero-inflated Poisson distribution. 3) I don't think you can use `fitdist` with this distribution. First, `fitdist` makes the same mistake as you in assuming `sigma` is the standard deviation. That can be fixed. But then, when I use simulated data, I get errors because the `d/pZIP` functions don't behave like expected for incorrect input. – Roland Jun 01 '21 at 05:35

1 Answers1

0

You could use VGAM::*zipois() instead of gamlss.dist::*ZIP(). The starting parameters are a bit different, see ?dzipois, and don't necessarily have to be the mean of your input.

library('fitdistrplus')
library('VGAM')  
fitdist(x, 'zipois', start=list(lambda=1, pstr0=.1))
# $start.arg
# $start.arg$lambda
# [1] 1
# 
# $start.arg$pstr0
# [1] 0.1
# 
# 
# $fix.arg
# NULL
# 
# Fitting of the distribution ' zipois ' by maximum likelihood 
# Parameters:
#   estimate Std. Error
# lambda 12.2009631 0.23823650
# pstr0   0.1187056 0.02069464
# Warning messages:
#   1: In fitdist(x, "zipois", start = list(lambda = 1, pstr0 = 0.1)) :
#   The dzipois function should return a zero-length vector when input has length zero
# 2: In fitdist(x, "zipois", start = list(lambda = 1, pstr0 = 0.1)) :
#   The pzipois function should return a zero-length vector when input has length zero

I'm not sure about the warning, though.

You may compare this by using a different vector x1 (below) which won't fail with gamlss.dist.


Data:

x <- c(0, 1, 0, 11, 2, 0, 3, 0, 0, 2, 1, 3, 1, 0, 1, 0, 0, 0, 2, 3, 
0, 0, 0, 8, 1, 1, 1, 0, 1, 1, 2, 7, 0, 0, 0, 5, 2, 3, 6, 1, 1, 
5, 2, 9, 0, 0, 1, 21, 16, 2, 9, 6, 25, 2, 1, 12, 16, 14, 15, 
15, 6, 1, 12, 12, 13, 5, 5, 6, 4, 7, 11, 8, 4, 5, 8, 3, 8, 4, 
7, 4, 7, 2, 5, 6, 4, 5, 1, 0, 8, 5, 6, 8, 9, 8, 9, 7, 7, 9, 8, 
9, 4, 4, 7, 13, 9, 13, 12, 10, 9, 8, 7, 11, 5, 5, 0, 1, 33, 4, 
22, 19, 22, 9, 5, 4, 17, 7, 7, 4, 5, 3, 0, 0, 9, 3, 0, 0, 36, 
40, 5, 4, 0, 11, 0, 7, 5, 25, 39, 26, 4, 20, 12, 4, 17, 3, 22, 
12, 14, 8, 9, 11, 7, 11, 10, 9, 16, 6, 24, 8, 5, 6, 14, 3, 9, 
4, 1, 20, 0, 1, 7, 9, 0, 12, 2, 29, 56, 16, 8, 28, 0, 19, 25, 
35, 87, 56, 66, 60, 58, 14, 10, 12, 13, 13, 34, 26, 18, 13, 22, 
13, 12, 15, 41, 11, 11, 11, 5, 6, 7, 8, 8, 17, 16, 12, 21, 38, 
34, 10, 77, 41, 7, 12, 1, 16, 20, 8, 5, 2, 20, 7, 16, 12, 6, 
10, 31, 12)

x1 <- c(0, 63, 1, 4, 1, 44, 2, 2, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 3, 
0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
6, 1, 11, 1, 1, 0, 0, 0, 2)
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • 1
    Are you sure that this actually fits a zero-inflated distribution? I get exactly the same lambda and uncertainty with `print(fitdist(data, 'pois', start = list(lambda = 10), discrete = TRUE, method = "mle"))`. But of course, OP's data might not be zero-inflated. – Roland Jun 01 '21 at 05:40
  • @Roland You're right, thanks, `pstr0=` also needs to be optimized, currently it fits just a Poisson, I'll edit that. – jay.sf Jun 01 '21 at 05:47
  • Appears to work but is just a very bad fit with OP's data. Maybe add some plots? `hist(data, freq = FALSE, breaks = 0:90 - 0.5); curve(dzipois(x, lambda = 12.2, pstr0 = 0.12), add = TRUE, col = "blue", from = 0, to = 90, type = "p", n = 91)` – Roland Jun 01 '21 at 06:18
  • 1
    @Roland Sure, fit was not asked, though:) Also see plots in [other question](https://stackoverflow.com/a/67748368/6574038). – jay.sf Jun 01 '21 at 06:25