
I've been struggling to understand why I'm getting intercept_=0.0 with LogisticRegression from scikit-learn. The fitted Logistic Regression has the following parameters:

LogisticRegression(C=0.0588579519026603, class_weight='balanced', 
                   dual=False, fit_intercept=True, intercept_scaling=6.2196752179914165,
                   max_iter=100, multi_class='ovr', n_jobs=1, penalty='l1',
                   random_state=1498059397, solver='liblinear', tol=0.0001,
                   verbose=0, warm_start=False)

The dataset I'm using has the following characteristics:

  • shape (113441, 69)
  • 1 feature with unique values (-1, 0)
  • 68 features with unique values (1, 0)

I started by exploring the coef_ attribute of the Logistic Regression, which is the following:

array([[-0.11210483,  0.09227395,  0.23526487,  0.1740976 ,  0.       ,
    -0.3282085 , -0.41550312,  1.67325241,  0.        ,  0.        ,
    -0.06987265,  0.        , -0.03053099,  0.        ,  0.09354742,
     0.06188271, -0.24618392,  0.0368765 ,  0.        ,  0.        ,
    -0.31796638,  1.75208672, -0.1270747 ,  0.13805016,  0.        ,
     0.2136787 , -0.4032387 , -0.00261153,  0.        ,  0.17788052,
    -0.0167915 ,  0.34149755,  0.0233405 , -0.09623664, -0.12918872,
     0.        ,  0.47359295, -0.16455172, -0.03106686,  0.00525001,
     0.13036978,  0.        ,  0.        ,  0.01318782, -0.10392985,
     0.        , -0.91211158, -0.11622266, -0.18233443,  0.43319013,
    -0.06818055, -0.02732619,  0.        , -0.09166496,  0.03753666,
     0.03857431,  0.        , -0.02650828,  0.19030955,  0.70891911,
    -0.07383034, -1.29428322, -0.69191842,  0.        ,  0.43798269,
    -0.66869241,  0.        ,  0.44498888, -0.08931519]])

where we can see some zeros (expected due to L1 penalty, right?) along with intercept_=0.0.

I would like to add that I tried with class_weight=None and I get intercept_ != 0.0.

What could be the reason for this intercept_=0.0? Is the intercept regularized as well, and does it just happen to be set to zero (like any other coefficient in coef_)? Was it mere "luck"? Is it due to my dataset?

HLopes

1 Answer


From the docstring on the intercept_scaling parameter to LogisticRegression:

intercept_scaling : float, default 1.

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
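
To see that behaviour in isolation, here is a minimal sketch on synthetic data (not the asker's dataset; the exact numbers will differ): with liblinear the intercept is just the penalized weight of a synthetic constant column, so increasing intercept_scaling typically lets the effective intercept escape more of the L1 shrinkage, while a small scaling makes it cheap for the solver to zero it out.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy binary features, loosely mimicking the 0/1 features in the question.
    rng = np.random.RandomState(0)
    X = rng.binomial(1, 0.5, size=(1000, 5))
    y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0.7).astype(int)

    for scaling in (1.0, 10.0, 100.0):
        clf = LogisticRegression(penalty='l1', solver='liblinear', C=0.05,
                                 fit_intercept=True, intercept_scaling=scaling)
        clf.fit(X, y)
        # intercept_ = intercept_scaling * weight_of_synthetic_feature, and
        # that weight is L1-penalized like any other coefficient.
        print(scaling, clf.intercept_)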

Why is this normal practice? The intercept term is technically just the coefficient on a column vector of 1s that you append to your X/feature matrix.

For example, using simple linear regression, say you have a dataset of features X with 2 features and 10 samples. If you were to use scipy.linalg.lstsq to get the coefficients including the intercept, you'd first want to use something like statsmodels.tools.tools.add_constant to append a column of 1s to your features. If you didn't append the column of 1s, you'd only get 2 coefficients. If you did append, you'd get a third "coefficient" which is just your intercept.

The easy way to tie that back is to think of the predicted values. The intercept term multiplied by a column of 1s is just itself; that is, you're adding the intercept (times one) to the summed product of the other coefficients and features to get your n x 1 array of predicted values.
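
For instance, a toy illustration of that column-of-ones view (made-up numbers; numpy is used in place of statsmodels' add_constant for brevity):

    import numpy as np
    from scipy.linalg import lstsq

    rng = np.random.RandomState(0)
    X = rng.rand(10, 2)                               # 10 samples, 2 features
    y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.01, size=10)

    X_const = np.column_stack([X, np.ones(len(X))])   # append a column of 1s
    coef, *_ = lstsq(X_const, y)                      # last entry is the intercept

    # Predicted values: the intercept (times one) is added to each sample's
    # weighted feature sum.
    y_hat = X_const @ coef                            # same as X @ coef[:2] + coef[2]
    print(coef)                                       # roughly [2, -1, 3]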

Brad Solomon
  • I happened to take a look at scikit-learn Logistic Regression `liblinear` [here](https://github.com/scikit-learn/scikit-learn/blob/55c9443ca47eac25a3b878b7654744e59474f38f/sklearn/svm/base.py#L903) and found that `intercept_ = intercept_scaling * raw_coef_[:, -1]` . However, is this a good practice? I read that it is not recommended to regularize the intercept term, but I actually didn't find any strong reason. – HLopes Jul 11 '17 at 17:17
  • I want to take a closer look because your `intercept_scaling=6.2196752179914165` when the default should be 1. (I don't see any reason it should change when calling `.fit`.) Could you possibly post your csv data to google docs? Either way, though, `liblinear.train_wrap` does regularize the intercept vector. – Brad Solomon Jul 11 '17 at 17:33
  • I arrived at that `intercept_scaling` value due to hyperparameter search with [skopt](https://scikit-optimize.github.io/) over most of the Logit hyperparameters. I'll try to upload the dataset – HLopes Jul 11 '17 at 17:42
  • Related discussion: [Scikit-learn Ridge Regression with unregularized intercept term](https://stackoverflow.com/questions/26126224/scikit-learn-ridge-regression-with-unregularized-intercept-term) – Brad Solomon Jul 11 '17 at 17:47