According to the SGD documentation:
For multi-class classification, a “one versus all” approach is used.
So I think SGDClassifier cannot perform multinomial logistic regression either.
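To make the distinction concrete, here is a minimal sketch using made-up class scores (they are hypothetical, not output from any scikit-learn model). In a one-versus-all setup each class score goes through its own sigmoid, while a multinomial model couples the scores through a softmax:

```python
import numpy as np

# Hypothetical linear scores (one per class) for a single sample.
scores = np.array([2.0, 1.0, -1.0])

# One-versus-all: each class has its own binary logistic model,
# so each score goes through an independent sigmoid.
ova = 1.0 / (1.0 + np.exp(-scores))

# Multinomial: one joint model, scores are coupled through a softmax.
exp_scores = np.exp(scores - scores.max())  # subtract max for numerical stability
softmax = exp_scores / exp_scores.sum()

print("one-vs-all sigmoids:", ova, "sum =", ova.sum())
print("softmax:", softmax, "sum =", softmax.sum())
```

The sigmoid outputs do not sum to 1 on their own, whereas the softmax outputs do, which is the practical difference between the two formulations.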
You can use statsmodels.discrete.discrete_model.MNLogit, whose fit_regularized method supports L1 regularization.
The example below is modified from this example:
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
iris = load_iris()
X = iris.data
y = iris.target
X = sm.add_constant(X, prepend=False) # An intercept is not included by default and should be added by the user.
X_train, X_test, y_train, y_test = train_test_split(X, y)
mlogit_mod = sm.MNLogit(y_train, X_train)
alpha = 1 * np.ones((mlogit_mod.K, mlogit_mod.J - 1)) # The regularization parameter alpha should be a scalar or have the same shape as results.params
alpha[-1, :] = 0 # Choose not to regularize the constant
mlogit_l1_res = mlogit_mod.fit_regularized(method='l1', alpha=alpha)
y_pred = np.argmax(mlogit_l1_res.predict(X_test), axis=1) # Pick the most probable class for each row
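To check the result, one option is to compute the test accuracy and count how many coefficients the L1 penalty drove to zero. The sketch below uses small hypothetical arrays as stand-ins; the helper names accuracy_from_probs and count_zeroed are my own, not part of statsmodels. With the code above you would pass mlogit_l1_res.predict(X_test) and mlogit_l1_res.params instead. It assumes the class labels are 0, 1, ..., J-1, as they are for iris.

```python
import numpy as np

def accuracy_from_probs(probs, y_true):
    """Fraction of rows whose most probable class matches the true label."""
    y_pred = np.argmax(probs, axis=1)
    return float(np.mean(y_pred == np.asarray(y_true)))

def count_zeroed(params, tol=1e-8):
    """Number of coefficients that are (numerically) zero."""
    return int(np.sum(np.abs(np.asarray(params)) < tol))

# Hypothetical stand-in for mlogit_l1_res.predict(X_test): one row per sample.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.4, 0.35, 0.25]])
y_true = np.array([0, 2, 1, 1])

# Hypothetical stand-in for mlogit_l1_res.params.
params = np.array([[0.0, 1.2],
                   [-0.8, 0.0],
                   [0.0, 0.0],
                   [0.5, -0.3]])

acc = accuracy_from_probs(probs, y_true)
n_zero = count_zeroed(params)
print("accuracy:", acc)
print("zeroed coefficients:", n_zero)
```

A larger alpha should generally zero out more coefficients, so comparing these two numbers across a few alpha values is a simple way to pick the penalty strength.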
Admittedly, this library's interface is not as easy to use as scikit-learn's, but it offers more advanced statistical functionality.