
Similar to the question here: if one of the dummies of a categorical variable has a high VIF (multicollinearity), I would assume it should not be removed from the predictor list. But statsmodels' logistic regression then fails with a 'Singular matrix' error. What should be done when this happens? Possible solutions: 1. remove all the dummies of this categorical variable; 2. remove only the high-VIF dummy, which leaves the categorical variable missing one subcategory. Thanks!
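
For concreteness, here is a minimal sketch with synthetic data (the DataFrame, column names, and the specific dummy dropped in option 2 are all made up) of how the dummy encoding, the VIF check, and the two options look in statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data standing in for the real problem.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "cat": rng.choice(["a", "b", "c"], size=n),
    "x1": rng.normal(size=n),
})
df["y"] = (rng.random(n) < 0.5).astype(int)

# drop_first=True removes one reference level per categorical,
# which avoids the basic dummy-variable trap.
X = sm.add_constant(
    pd.get_dummies(df[["cat", "x1"]], columns=["cat"], drop_first=True, dtype=float)
)
y = df["y"]

# VIF of each column (the constant's VIF is not meaningful and can be ignored).
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)

# Option 1 from the question: drop every dummy of the offending categorical, e.g.
#   X_opt1 = X.drop(columns=[c for c in X.columns if c.startswith("cat_")])
# Option 2: drop only the high-VIF dummy, e.g.
#   X_opt2 = X.drop(columns=["cat_b"])

res = sm.Logit(y, X).fit()
print(res.summary())
```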

Bridget Huang
  • You could try a different optimizer, like `fit(method="nm", maxiter=5000)`, to check whether it's a problem during estimation or whether it also occurs at the MLE. If that works, then you could use the estimated parameters as start_params for "bfgs" or the default "newton" (see the sketch after these comments). – Josef Apr 06 '21 at 19:54
  • another option is to merge categories which makes sense in some cases, e.g. a race/ethnicity category might not have enough observations when using many levels, so a remainder level like "other" might be needed. – Josef Apr 06 '21 at 19:57
  • Also, GLM family Binomial is the same underlying model as Logit, but the default optimizer irls is more robust to multicollinearity. – Josef Apr 06 '21 at 19:59
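
Putting the comments together, here is a minimal sketch, assuming the same `y` and `X` (response and dummy-encoded design matrix) as in the sketch above; the method names come from the comments, everything else is illustrative:

```python
import statsmodels.api as sm

# 1) A derivative-free optimizer helps separate "the optimizer fails"
#    from "the MLE itself is ill-defined".
res_nm = sm.Logit(y, X).fit(method="nm", maxiter=5000)

# 2) If that converges, reuse its estimates as start_params for a
#    gradient-based method such as "bfgs" (or the default "newton").
res_bfgs = sm.Logit(y, X).fit(method="bfgs", start_params=res_nm.params)

# 3) Merging sparse levels into an "other" category is plain pandas
#    preprocessing done before building the dummies, e.g. (the threshold
#    is made up):
#    counts = df["cat"].value_counts()
#    df["cat"] = df["cat"].where(df["cat"].map(counts) >= 20, "other")

# 4) GLM with a Binomial family is the same underlying model as Logit,
#    but its default IRLS optimizer is typically more robust to
#    multicollinearity.
res_glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(res_glm.summary())
```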

0 Answers