I'm pretty sure it's a feature, not a bug, but I would like to know if there is a way to make sklearn and statsmodels match in their logit estimates. A very simple example:
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

np.random.seed(123)
n = 100
y = np.random.randint(0, 2, n)  # binary outcome in {0, 1}
x = np.random.random((n, 2))
# Constant term
x[:, 0] = 1.
The estimates with statsmodels:
sm_lgt = sm.Logit(y, x).fit()
Optimization terminated successfully.
Current function value: 0.675320
Iterations 4
print(sm_lgt.params)
[ 0.38442 -1.1429183]
And the estimates with sklearn:
sk_lgt = LogisticRegression(fit_intercept=False).fit(x, y)
print(sk_lgt.coef_)
[[ 0.16546794 -0.72637982]]
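To make the gap concrete, here is a quick check that evaluates the unpenalized average negative log-likelihood at each coefficient vector (neg_loglik is my own helper, not part of either library); if statsmodels is solving the plain maximum-likelihood problem, its estimates should score lower:

def neg_loglik(beta, x, y):
    # Average negative log-likelihood of the logit model, no penalty:
    # mean of log(1 + exp(x.beta)) - y * (x.beta) over observations
    z = np.dot(x, beta)
    return np.mean(np.log1p(np.exp(z)) - y * z)

print(neg_loglik(sm_lgt.params, x, y))        # should be near the 0.675320 above
print(neg_loglik(sk_lgt.coef_.ravel(), x, y))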
I think it has to do with the implementation in sklearn, which applies some sort of regularization by default. Is there an option to estimate a bare-bones logit as in statsmodels? sklearn is substantially faster and scales much more nicely, so I'd like to keep using it. Also, does sklearn provide inference (standard errors) or marginal effects?
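If the default L2 penalty is indeed the cause, I'd expect weakening it to close the gap. Here is a minimal sketch of what I mean, assuming C is the inverse regularization strength as the sklearn docs describe:

# A huge C should make the penalty negligible and approximate an
# unpenalized maximum-likelihood fit.
sk_unreg = LogisticRegression(C=1e9, fit_intercept=False).fit(x, y)
print(sk_unreg.coef_)  # I'd expect this to land close to sm_lgt.params

If that works, the remaining question is whether there is a cleaner way to switch the penalty off entirely.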