I am fitting a negative binomial model using glm.nb from the MASS package of R with as output Y depending on variable y1, x, and z .
The problem is that I get different estimated coefficients using the same model but with the following equivalent formulas: Y ~ y1 + x * z
or Y ~ x * z + y1
. (reversing the order)
Below are my data and the results with the two formulas:
z <-c("T3", "T3", "T3", "T2", "T2", "T3", "T3", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T2", "T1", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T2", "T1", "T3", "T3", "T3", "T3", "T2", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T2", "T3", "T2", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T3", "T2", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T3", "T3", "T2", "T3", "T2", "T3", "T2", "T2", "T2", "T2", "T3", "T3", "T3", "T2", "T2", "T3", "T2", "T3", "T3", "T2", "T2")
x <- c("B-", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B+", "B+", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B+", "B-", "B+", "B+", "B+", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B-", "B-", "B-", "B-", "B-", "B+", "B+", "B-", "B-", "B+", "B+", "B-", "B+", "B+", "B-", "B-", NA )
Y <- c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 13, 19, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 1, 0, 2, 0, 0, 6, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 10, 0, 0, 0, 0, 0, 0, 3)
y1 <- c(1, 3, 0, 1, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 2, 3, 0, 0, 1, 0, 5, 0, 2, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 1, 12, 1, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 4, 0, 0, 0, 2, 2, 0, 0, 1, 0, 0, 16, 0, 14, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 6, 0, 0, 1, 0, 0, 1, 13)
data1 <- data.frame(Y, y1, z, x)
glm.nb(Y ~ x * z + y1, data = data1)
glm.nb(Y ~ y1 + x * z, data = data1)
> glm.nb(Y ~ x * z + y1, data = data1)
Call: glm.nb(formula = Y ~ x * z + y1, data = data1, init.theta = 0.2911311528,
link = log)
Coefficients:
(Intercept) x1 z1 z2 y1 x1:z1 x1:z2
-13.27592 -0.06054 -24.03075 11.75385 0.38529 0.54648 -0.11488
Degrees of Freedom: 112 Total (i.e. Null); 106 Residual
(1 observation deleted due to missingness)
Null Deviance: 94.45
Residual Deviance: 61.74 AIC: 199.8
> glm.nb(Y ~ y1 + x * z, data = data1)
Call: glm.nb(formula = Y ~ y1 + x * z, data = data1, init.theta = 0.2911311528,
link = log)
Coefficients:
(Intercept) y1 x1 z1 z2 x1:z1 x1:z2
-13.4950 0.3853 -0.7217 -24.4689 11.9729 -0.7759 0.5463
Degrees of Freedom: 112 Total (i.e. Null); 106 Residual
(1 observation deleted due to missingness)
Null Deviance: 94.45
Residual Deviance: 61.74 AIC: 199.8
I get different values of coefficients between the two formulas. Knowing that the variable z has only two observations with the T1 modality, can this influence the estimation of my coefficients?
Is it possible to have identical results with the two formulas Y ~ y1 + x * z
and Y ~ x * z + y1
even if there is a convergence problem?
Thank you for your answers and your help.