Negative binomial glm.nb : Different results with unbalanced variable and complete separation

Question

I am fitting a negative binomial model using glm.nb from the MASS package of R with as output Y depending on variable y1, x, and z . The problem is that I get different estimated coefficients using the same model but with the following equivalent formulas: Y ~ y1 + x * z or Y ~ x * z + y1. (reversing the order)

Below are my data and the results with the two formulas:

z <-c("T3", "T3", "T3", "T2", "T2", "T3", "T3", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T2", "T1", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T2", "T1", "T3", "T3", "T3", "T3", "T2", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T2", "T2", "T3", "T2", "T3", "T3", "T2", "T2")
x <- c("B-", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B-", "B+", "B+", "B-", "B-", NA )
Y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 13, 19, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 6, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0, 3)
y1 <- c(1, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 1, 0, 5, 0, 2, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 12, 1, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 4, 0, 0, 0, 2, 2, 0, 0, 1, 0, 0, 16, 0, 14, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 6, 0, 0, 1, 0, 0, 1, 13)

data1 <- data.frame(Y, y1, z, x)
glm.nb(Y ~  x * z + y1, data = data1)

glm.nb(Y ~  y1 + x * z, data = data1)

> glm.nb(Y ~  x * z + y1, data = data1)

Call:  glm.nb(formula = Y ~ x * z + y1, data = data1, init.theta = 0.2911311528, 
    link = log)

Coefficients:
(Intercept)           x1           z1           z2           y1        x1:z1        x1:z2  
  -13.27592     -0.06054    -24.03075     11.75385      0.38529      0.54648     -0.11488  

Degrees of Freedom: 112 Total (i.e. Null);  106 Residual
  (1 observation deleted due to missingness)
Null Deviance:      94.45 
Residual Deviance: 61.74    AIC: 199.8

> glm.nb(Y ~  y1 + x * z, data = data1)

Call:  glm.nb(formula = Y ~ y1 + x * z, data = data1, init.theta = 0.2911311528, 
    link = log)

Coefficients:
(Intercept)           y1           x1           z1           z2        x1:z1        x1:z2  
   -13.4950       0.3853      -0.7217     -24.4689      11.9729      -0.7759       0.5463  

Degrees of Freedom: 112 Total (i.e. Null);  106 Residual
  (1 observation deleted due to missingness)
Null Deviance:      94.45 
Residual Deviance: 61.74    AIC: 199.8

I get different values of coefficients between the two formulas. Knowing that the variable z has only two observations with the T1 modality, can this influence the estimation of my coefficients?

Is it possible to have identical results with the two formulas Y ~ y1 + x * z and Y ~ x * z + y1 even if there is a convergence problem?

Thank you for your answers and your help.

score 1 · Answer 1 · answered Oct 11 '22 at 21:14

This is admittedly a weird result — I would have expected some sort of warning. However, I think in the end you shouldn't worry about this difference, for reasons I will explain (tl;dr this is an optimistic fit to a small, noisy data set, and none of the interesting results differ between models).

all.equal(logLik(m1), logLik(m2)) says that the models have equally good fit (up to numerical tolerance)
all.equal(predict(m1), predict(m2)) says that the predictions are not identical — off by 2% on average. However:

bad <- which(abs(predict(m1)-predict(m2))>1e-3)
predict(m1)[bad]
##        19        53 
## -34.88559 -36.20873 
predict(m2)[bad]
         19        53 
## -35.99086 -39.19723

These predictions are on the log scale, so the predicted values are tiny, and the differences between the predictions are tiny on the count scale (e.g. exp(-35) is 6.3e-16 and exp(-36) is 2.3e-16).

The confidence intervals for all coefficients except the effect of y1 are ridiculously large (and all of the p-values are 1):

library(dotwhisker)
dwplot(list(m1,m2))

(If we set ci = NA to turn off the confidence intervals we could see the differences in the coefficients.)

This is an example of complete separation — not only are there only 2 observations in the T1 condition, but they're both zero ...

It would be wiser to omit the T1 condition entirely, it's just messing things up. In that case (m1A <- update(m1, data = subset(data, z != "T1")) and similarly for m2) you still have a significant effect of y1, and the two orders agree.

Negative binomial glm.nb : Different results with unbalanced variable and complete separation

1 Answers1