
I am running a zero-inflated negative binomial regression model using the function zeroinfl from the pscl package.

I need to exclude NA's from the model in order to be able to plot the residuals against the dependent variable later in the analysis.

Therefore, I want to set na.action="na.exclude". I can do this without any problem for a non-zero-inflated negative binomial regression model (using glm.nb from the MASS package), e.g.

fm_nbin <- glm.nb(DV ~ factor(IDV) + contr1
               +contr2 + contr3, data=df, 
               subset=(df$var<500), na.action="na.exclude")
fm_nbin.res = resid(fm_nbin) 
plot(fm_nbin.res~df$var)  

works fine. However, when I do the same for a zero-inflated model, it does not work:

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1
               +contr2 + contr3 | factor(IDV) + contr1
               +contr2 + contr3, data=df, 
               subset=(df$var<500), na.action="na.exclude")
zinfl.res = resid(zinfl) 
plot(zinfl.res~df$var)

gives the error

Error in function (formula, data = NULL, subset = NULL, na.action = na.fail,  : 
  variable lengths differ (found for 'df$var')

Is there any other command I should use to exclude NA's from my regression?

Edit: This is the closest thing to an answer I could find. Can it be applied to my problem in some way? Also, can naresid be applied here?

Annerose N

2 Answers


As one finds by following the trail of documentation from zeroinfl to glm.fit: "The ‘factory-fresh’ default is na.omit." Notice that I have not put quotes around na.omit, because it is supposed to be a function; the modelling functions will also accept its name as a string, so it doesn't matter whether it is quoted. I admit that I don't really know how na.omit and na.exclude differ (something to do with residuals, I read), but I would definitely try the default setting first, since it generally delivers what I want from regression functions. So try just leaving the argument out:

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1
           +contr2 + contr3 | factor(IDV) + contr1
           +contr2 + contr3, data=df, 
           subset=(df$var<500) )
IRTFM
  • Unfortunately, just leaving na.exclude out does not work in my case. As found at ?na.exclude: "na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude". This gives different behaviour in functions making use of naresid and napredict: when na.exclude is used the residuals and predictions are padded to the correct length by inserting NAs for cases omitted by na.exclude." -- This is why it is crucial in my case to use na.exclude instead of na.omit. – Annerose N May 04 '13 at 19:41
  • If you offer `na.omit(df)` as the data argument, the function should not see any NA's. – IRTFM May 04 '13 at 23:30
  • Unfortunately, using `na.omit(df)` in the data argument just gives the same error as using `na.action="na.exclude"`. – Annerose N May 06 '13 at 09:40
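
The padding behaviour quoted from ?na.exclude can be illustrated with a small base-R sketch. It uses lm on made-up data (the data frame `d` and its columns are hypothetical), because lm's residuals are known to go through naresid. Whether zeroinfl does the same depends on the pscl version and is not shown here.

```r
# Toy data (hypothetical): the third response value is missing
d <- data.frame(y = c(1, 4, NA, 6, 10), x = 1:5)

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

# With na.omit the incomplete row is simply absent from the residuals
n_omit <- length(resid(fit_omit))

# With na.exclude the residuals are padded back to nrow(d),
# with NA in the position of the dropped row
res_exclude <- resid(fit_exclude)
n_exclude   <- length(res_exclude)

# Only the padded residuals have the same length as the original column,
# so only they can be plotted against it directly
plot(res_exclude ~ d$x)
```

Comparing `n_omit` and `n_exclude` with `nrow(d)` shows which residual vector lines up with the original data; a length mismatch of this kind is what produces a "variable lengths differ" error in plot.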

Since neither na.omit(df) nor na.action="na.exclude" seems to work in a zeroinfl regression model, I found another (indirect) way of excluding NA's from the regression.

First, since my original dataset contains far more variables than just the regressors and the outcome variable, I created a new dataset containing only the variables I use in the regression model, and restricted it to the observations that satisfy the condition on var:

df1 <- subset(df, var<500, select=c("DV", "IDV", "contr1", "contr2", "contr3"))
df1 <- na.omit(df1)

I then run the same code as above using the new dataset df1, which works perfectly:

zinfl <- zeroinfl(DV ~ factor(IDV) + contr1
           +contr2 + contr3 | factor(IDV) + contr1
           +contr2 + contr3, data=df1)
zinfl.res = resid(zinfl) 
plot(zinfl.res~df1$DV)
Annerose N
  • Unfortunately, one has to repeat this procedure of defining a new dataset for each new regression model, which can get cumbersome when running many different models. – Annerose N May 06 '13 at 09:51
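
One way to reduce that repetition might be a small helper that derives the complete-case dataset from the model formula. The following is only a sketch in base R: the function name `complete_model_data`, its `keep` argument, and the toy data are all hypothetical, and the zeroinfl call itself is left out so that the example runs without pscl.

```r
# Hypothetical helper: keep only the variables named in `formula`,
# apply an optional row filter, and drop rows with any NA among them.
complete_model_data <- function(formula, data, keep = rep(TRUE, nrow(data))) {
  vars <- all.vars(formula)        # also collects names from both sides of "|"
  keep <- keep & !is.na(keep)      # rows where the filter is NA are dropped
  data <- data[keep, vars, drop = FALSE]
  data[complete.cases(data), , drop = FALSE]
}

# Toy data (hypothetical values; column names mirror the question)
df <- data.frame(
  DV     = c(0, 2, NA, 1, 5, 0),
  IDV    = c("a", "b", "a", "b", "a", "b"),
  contr1 = c(1.2, NA, 0.7, 1.1, 0.3, 0.9),
  var    = c(100, 200, 300, 400, 600, 450)
)

f   <- DV ~ factor(IDV) + contr1 | factor(IDV) + contr1
df1 <- complete_model_data(f, df, keep = df$var < 500)

# Row names survive the subsetting, so columns of the original data frame
# can be matched back to the rows that were actually used in the model
var_used <- df[rownames(df1), "var"]
```

The same formula object `f` and the resulting `df1` could then be passed to zeroinfl, and the residuals plotted against `var_used`, which has one entry per row of `df1`.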