I am experimenting with the mdvis
dataset from the COUNT
package of R
for a teaching purpose. I fitted a zero-inflated negative-binomial model using the zeroinfl
function from the pscl
and countreg
packages. However, the results of zeroinfl
from the pscl
package and from countreg
package differ a lot.
The models and the outputs are provided below.
zeroinfl
from pscl
:
mdvisit_zeroinf<-pscl::zeroinfl(numvisit ~ reform + badh + agegrp + educ + loginc | reform + badh + agegrp + educ + loginc, dist = "negbin", data = mdvis, control = zeroinfl.control(method = "BFGS", EM=F)
summary(mdvisit_zeroinf)
Call:
pscl::zeroinfl(formula = numvisit ~ reform + badh + agegrp + educ + loginc |
reform + badh + agegrp + educ + loginc, data = mdvis, dist = "negbin")
Pearson residuals:
Min 1Q Median 3Q Max
-0.9667 -0.8037 -0.3597 0.3154 9.4878
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.30316 0.56565 -0.536 0.5920
reform -0.09916 0.05543 -1.789 0.0736 .
badh 1.11555 0.07517 14.841 <2e-16 ***
agegrp2 0.11376 0.06990 1.628 0.1036
agegrp3 0.21126 0.08620 2.451 0.0143 *
educ2 0.07605 0.07209 1.055 0.2914
educ3 -0.03979 0.07295 -0.545 0.5854
loginc 0.13209 0.07499 1.761 0.0782 .
Log(theta) 0.04747 0.05815 0.816 0.4144
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -30.7110 772.2930 -0.040 0.968
reform 14.3274 763.7283 0.019 0.985
badh -1.9503 3.5392 -0.551 0.582
agegrp2 10.9236 113.6292 0.096 0.923
agegrp3 9.7723 113.6558 0.086 0.931
educ2 -0.6632 1.9878 -0.334 0.739
educ3 -0.8743 1.3501 -0.648 0.517
loginc 0.5437 1.9718 0.276 0.783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 1.0486
Number of iterations in BFGS optimization: 106
Log-likelihood: -4557 on 17 Df`
zeroinfl
from countreg
:
mdvisit_zeroinf2<-countreg::zeroinfl(numvisit ~ reform + badh + agegrp + educ + loginc | reform + badh + agegrp + educ + loginc, dist = "negbin", data = mdvis, control = zeroinfl.control(method = "BFGS", EM=F))
summary(mdvisit_zeroinf2)
Call:
countreg::zeroinfl(formula = numvisit ~ reform + badh + agegrp + educ + loginc |
reform + badh + agegrp + educ + loginc, data = mdvis, dist = "negbin",
control = zeroinfl.control(method = "BFGS", EM = F))
Pearson residuals:
Min 1Q Median 3Q Max
-0.9523 -0.8057 -0.3615 0.3106 9.6300
Count model coefficients (negbin with log link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.44473 0.53752 -0.827 0.40803
reform -0.12962 0.05106 -2.538 0.01113 *
badh 1.13558 0.07413 15.319 < 2e-16 ***
agegrp2 0.05851 0.06359 0.920 0.35756
agegrp3 0.18678 0.07176 2.603 0.00925 **
educ2 0.06398 0.06621 0.966 0.33385
educ3 -0.04825 0.07087 -0.681 0.49599
loginc 0.15338 0.07056 2.174 0.02973 *
Log(theta) 0.01232 0.04778 0.258 0.79652
Zero-inflation model coefficients (binomial with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) -119.137 95.259 -1.251 0.2111
reform 4.821 2.646 1.822 0.0684 .
badh -21.461 4614.989 -0.005 0.9963
agegrp2 9.261 28.341 0.327 0.7438
agegrp3 3.502 27.984 0.125 0.9004
educ2 -19.790 203.802 -0.097 0.9226
educ3 -7.894 4.758 -1.659 0.0971 .
loginc 13.010 10.463 1.243 0.2137
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Theta = 1.0124
Number of iterations in BFGS optimization: 197
Log-likelihood: -4554 on 17 Df
What could be the reason for such a difference? Results from Stata
closely resemble the results of zeroinfl
from the pscl
package (especially, the results for the count part of the model).
I fited the models exactly similarly by specifying the same options using zeroinfl.control()
. I also tried to search online to see if others have reported similar issues previously, and if answers are already available. I searched for answers also on stackoverflow
and CV
. But I couldn't get any.