I am trying to fit a logistic regression with statsmodels (I need the summary output), and I get this error:
LinAlgError: Singular matrix
My df is numeric and its features are correlated; I have already removed the non-numeric and constant features. Because of the correlated features, I tried both a regular (unpenalized) fit and one with an L1 penalty (L2 isn't available).
I checked the matrix rank and got this output:
print(len(df.columns)) -> 156
print(np.linalg.matrix_rank(df.values)) -> 151
How can I find out which features are causing the problem, and why?
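To make the question concrete, here is a minimal sketch on synthetic data (not my real df) of the rank check, plus a naive greedy scan that flags columns which do not raise the rank of the columns kept so far. The names `values` and `redundant_columns` are hypothetical, and I am not sure this is the right way to diagnose the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for df.values: 200 rows, 6 independent columns
base = rng.normal(size=(200, 6))
# Two extra columns that are exact linear combinations of existing ones
dup = base[:, 0] + base[:, 1]
scaled = 2.0 * base[:, 2]
values = np.column_stack([base, dup, scaled])

n_cols = values.shape[1]
rank = np.linalg.matrix_rank(values)
# rank < n_cols means some columns are linearly dependent
print(n_cols, rank)


def redundant_columns(a):
    """Greedy scan: indices of columns that do not raise the rank of those kept so far."""
    kept, redundant = [], []
    for j in range(a.shape[1]):
        if np.linalg.matrix_rank(a[:, kept + [j]]) > len(kept):
            kept.append(j)
        else:
            redundant.append(j)
    return redundant


print(redundant_columns(values))
```

Which column of a dependent group gets flagged depends on the column order, so this only shows one possible set of redundant features.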
My code:
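import statsmodels.api as sm
# y: binary target, X: feature matrix
logit = sm.Logit(y, X)
# alpha is the L1 penalty weight; alpha=0 applies no penalty
result = logit.fit_regularized(trim_mode='auto', alpha=0, maxiter=150)
print(result.summary())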
Update:
After removing highly correlated features I get:
len(df.columns) == np.linalg.matrix_rank(df.values)
but I still get the same error, even when I set a low correlation threshold.
I also tried changing the solver.
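The correlation filter mentioned in the update was along these lines. This is a minimal sketch on synthetic data, not the exact code I ran; `drop_correlated`, its `threshold` default, and the column names are hypothetical:

```python
import numpy as np


def drop_correlated(values, names, threshold=0.9):
    """Keep a column only if its |correlation| with every kept column is <= threshold."""
    corr = np.abs(np.corrcoef(values, rowvar=False))
    kept = []
    for j in range(values.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return values[:, kept], [names[j] for j in kept]


rng = np.random.default_rng(1)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + 0.01 * rng.normal(size=300)  # nearly a copy of a
data = np.column_stack([a, b, c])

filtered, kept_names = drop_correlated(data, ["a", "b", "c"], threshold=0.9)
print(kept_names)
# After filtering, the column count can be compared with the rank again
print(filtered.shape[1] == np.linalg.matrix_rank(filtered))
```

A pairwise filter like this only compares two columns at a time, so it cannot detect a column that is a combination of several others.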