
I have produced a logistic regression model in R using the logistf function from the logistf package due to quasi-complete separation. I get the error message:

Error in solve.default(object$var[2:(object$df + 1), 2:(object$df + 1)]) : system is computationally singular: reciprocal condition number = 3.39158e-17

The data is structured as shown below, though most of it has been cut here. Numbers represent levels (i.e., 1 = very low, 5 = very high), not counts. Variables OrdA to OrdH are ordered factors. The variable Binary is a factor.

OrdA OrdB OrdC OrdE OrdF OrdG OrdH Binary
1    3    4    1    1    2    1      1
2    3    4    5    1    3    1      1
1    3    2    5    2    4    1      0
1    1    1    1    3    1    2      0
3    2    2    2    1    1    1      0

I have read here that this can be caused by multicollinearity, but have tested this and it is not the problem.

library(car)  # provides vif(), which reports GVIF for factor predictors

VIFModel <- lm(Binary ~ OrdA + OrdB + OrdC + OrdD + OrdE +
                        OrdF + OrdG + OrdH, data = VIFdata)

vif(VIFModel)

                        GVIF Df   GVIF^(1/(2*Df))
OrdA                    6.09  3        1.35
OrdB                    3.50  2        1.37
OrdC                    7.09  3        1.38
OrdD                    6.07  2        1.57
OrdE                    5.48  4        1.23
OrdF                    3.05  2        1.32
OrdG                    5.41  4        1.23
OrdH                    3.03  2        1.31

The post also indicates that the problem can be caused by having "more variables than observations." However, I have 8 independent variables and 82 observations.

For context, each independent variable is ordinal with 5 levels, and 30% of the observations of the binary dependent variable are "successes." I'm not sure whether this could be related to the issue. How do I fix it?

X <- model.matrix(Binary ~ OrdA + OrdB + OrdC + OrdD + OrdE + OrdF + OrdG + OrdH,
                  data = Data3)
dim(X)
Matrix::rankMatrix(X)


[1] 82 24
[1] 23
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.820766e-14
Ben Bolker
Harry
  • I think we really need a [mcve] in order to help with this ... how did you test for multicollinearity? Did you test the original data set or the model matrix? Are the ordinal variables coded as ordered factors? – Ben Bolker Jul 25 '20 at 19:32
  • Hi, thank you very much for your reply. I have tried to add some of the detail which you asked for. I am not experienced with producing reproducible examples. Please let me know if I should provide more information and I will gladly do so. – Harry Jul 25 '20 at 21:07
  • Can you please post the results of `X <- model.matrix(Binary ~ ., your_data); dim(X); Matrix::rankMatrix(X)`? (This assumes that your data only contains the response and predictor variables; otherwise, fill in your actual linear model formula in `model.matrix()`.) If this doesn't clear it up, you'll need to read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Ben Bolker Jul 25 '20 at 21:25
  • @BenBolker I think I have posted what you asked for. – Harry Jul 25 '20 at 21:35
  • Can you also post `str(Data3)` or `summary(Data3)`? I'm having a hard time figuring out how you can be getting 24 columns in your model matrix ... – Ben Bolker Jul 25 '20 at 21:41

1 Answer


Short answer: your ordinal input variables are transformed to 24 predictor variables (number of columns of the model matrix), but the rank of your model matrix is only 23, so you do indeed have multicollinearity in your predictor variables. I don't know what vif is doing ...
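A column count larger than the rank means at least one column is a linear combination of the others. As a rough sketch of how to find a redundant column, base R's pivoted QR decomposition moves columns it judges collinear to the end of its pivot order. The data frame `toy` and the variables `f1`, `f2` and `x` below are made up for illustration and are not the asker's data; unordered factors are used for simplicity, though the same check applies to a model matrix built from ordered factors.

```r
# Hypothetical toy data: f2 is a coarser recoding of f1, so its dummy
# column is an exact linear combination of the intercept and f1's dummies.
set.seed(1)
toy <- data.frame(
  f1 = factor(rep(c("a", "b", "c"), each = 10)),
  x  = rnorm(30)
)
toy$f2 <- factor(ifelse(toy$f1 == "a", "lo", "hi"))

X <- model.matrix(~ f1 + f2 + x, data = toy)

q <- qr(X)          # pivoted QR, the same decomposition lm() relies on
n_col <- ncol(X)    # number of columns in the model matrix
rank_X <- q$rank    # numerical rank

# Columns pivoted past the rank are the ones judged redundant.
aliased <- if (rank_X < n_col) colnames(X)[q$pivot[(rank_X + 1):n_col]] else character(0)

print(c(columns = n_col, rank = rank_X))
print(aliased)
```

With real data, the same lines can be run on the `X` built from the actual formula; which column gets flagged depends on column order, so the result names one redundant column rather than every variable involved in the dependency.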

You can use svd(X) to help figure out which components are collinear ...
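A minimal sketch of the `svd()` approach, again on made-up data (`toy`, `f1`, `f2` and `x` are hypothetical names): right singular vectors whose singular values are numerically zero span the null space of `X`, and their non-zero entries mark the columns that take part in the linear dependency. The tolerance used here is an assumption, chosen to resemble the usual "largest dimension times machine epsilon times largest singular value" rule.

```r
# Hypothetical toy data with one exact linear dependency among the columns.
set.seed(1)
toy <- data.frame(
  f1 = factor(rep(c("a", "b", "c"), each = 10)),
  x  = rnorm(30)
)
toy$f2 <- factor(ifelse(toy$f1 == "a", "lo", "hi"))

X <- model.matrix(~ f1 + f2 + x, data = toy)

s <- svd(X)

# Assumed tolerance for treating a singular value as zero.
tol <- max(dim(X)) * max(s$d) * .Machine$double.eps
null_idx <- which(s$d < tol)

# Each retained column of V is one linear combination of X's columns
# that is (numerically) zero.
V <- s$v[, null_idx, drop = FALSE]
rownames(V) <- colnames(X)
print(round(V, 3))

# Columns with non-negligible weight in the first null vector.
involved <- rownames(V)[abs(V[, 1]) > 1e-8]
print(involved)
```

In this toy case the dependency involves the intercept and the dummy columns of both factors but not `x`, which suggests dropping or recoding one of the two factors rather than removing a single arbitrary column.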

Ben Bolker
  • Thank you so much for your help. Is it sensible for me to try to identify the offending variables and remove them? – Harry Jul 25 '20 at 21:42
  • Probably (hard to say for sure without more context/not knowing where the actual multicollinearity is ...) – Ben Bolker Jul 25 '20 at 21:43