1

I've been trying to fit a zero truncated negative binomial to some count data. In some cases it works beautifully, for example:

library(ggplot2)
library(VGAM)
df <- data.frame(n = 1:7, freq = c(0.32, 0.286, 0.19, 0.10, 0.05, 0.02, 0.01))
fit = vglm(df$n ~ 1, posnegbinomial, weights=df$freq)
ggplot(df, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dnbinom(n, size=Coef(fit)[2], 
                             mu = Coef(fit)[1])/(1-dnbinom(0, size=Coef(fit)[2], 
                                                           mu = Coef(fit)[1]))),
                 colour = "blue")

1

However, in other cases it fails:

df2 <- data.frame(n = 1:4, freq = c(0.87, 0.11, 0.01, 0.0005))
fit2 = vglm(df2$n ~ 1, posnegbinomial, weights=df2$freq)
ggplot(df2, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dnbinom(n, size=Coef(fit2)[2], 
                             mu = Coef(fit2)[1])/(1-dnbinom(0, size=Coef(fit2)[2], 
                                                           mu = Coef(fit2)[1]))),
                 colour = "blue")

enter image description here

In such cases a Poisson fit works fine:

fit_p = vglm(df2$n ~ 1, pospoisson, weights=df2$freq)

ggplot(df2, aes(x = n, y = freq)) +
  geom_point(colour= "red") +
  geom_point(aes(x = n, 
                 y = dpois(n, Coef(fit_p)[1])/(1-dpois(0, Coef(fit_p)[1]))),
             colour = "blue")

enter image description here

Is there any way I can fix this?

Thanks!

OGW
  • 11
  • 2

1 Answers1

1

Are you sure you haven't mixed up something in your second model? I'm getting a much better fit. To avoid mistakes in repeating code, you may want to use a function for y, as I show you below:

library(VGAM)
fit1 <- vglm(df1$n ~ 1, posnegbinomial, weights=df1$freq)
fit2 <- vglm(df2$n ~ 1, posnegbinomial, weights=df2$freq)

## helper fun to get y
gy <- function(x) {
  cf <- Coef(x)
  dnbinom(seq(nrow(x@residuals)), size=cf[2], mu=cf[1]) / 
    (1-dnbinom(0, size=cf[2], mu=cf[1]))
}

## plot
op <- par(mfrow=c(1, 2))

plot(df1$n, df1$freq, main="fit1")
points(gy(fit1), col=2)

plot(df2$n, df2$freq, main="fit2")
points(gy(fit2), col=2)

par(op)

enter image description here


Data:

df1 <- data.frame(n=1:7, freq=c(0.32, 0.286, 0.19, 0.10, 0.05, 0.02, 0.01))
df2 <- data.frame(n=1:4, freq=c(0.87, 0.11, 0.01, 0.0005))
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Thanks for your response! You are correct, I had a bug in my code (fixed now in the main question). However, I'm still confused as to why the Poisson fit is near-perfect, while the NB fit is so comparatively poor. It was my understanding that NB should be able to approximate to a pure Poisson when appropriate (which in this case it appears to be!). – OGW Sep 21 '20 at 22:55
  • the difference is very striking, especially for my application which involves finding the chance that a value drawn from the fitted Poisson/NB is non-zero: for the Poisson fit this is 1-dpois(0, Coef(fit_p)[1]) = **0.22**, but for the NB it's 1-dnbinom(0, size=Coef(fit2)[2], mu = Coef(fit2)[1]) = **2.1e-12** – OGW Sep 21 '20 at 23:18
  • @OGW It seems to me that besides coding you also face statistical difficulties, for which this might be the wrong site. You could look into "Hurdle models" though. However, I'd advice you to ask the statistical part separately at [Cross Validated](https://stats.stackexchange.com/help/on-topic) where you are most likely to get competent answers from qualified guys. – jay.sf Sep 22 '20 at 08:58