Scikit-Learn Linear Regression how to get coefficient's respective features?

Question

I'm trying to perform feature selection by evaluating my regression's coefficient outputs and selecting the features with the highest-magnitude coefficients. The problem is that I don't know how to get the respective features, as only coefficients are returned from the coef_ attribute. The documentation says:

Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.
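To make the shapes in that passage concrete, here is a minimal sketch of the multi-target case. The data is synthetic and the names feature1, feature2, target_a and target_b are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data; column and target names are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 2)), columns=["feature1", "feature2"])
Y = pd.DataFrame({
    "target_a": X["feature1"] + 2.0 * X["feature2"],
    "target_b": -3.0 * X["feature1"],
})

model = LinearRegression().fit(X, Y)

# With a 2D y, coef_ has shape (n_targets, n_features):
# one row per target, one column per feature, in the order of X's columns.
coef_table = pd.DataFrame(model.coef_, index=Y.columns, columns=X.columns)
print(coef_table)
```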

I am passing A and B into regression.fit(A, B), where A is a 2-D array holding the tf-idf value of each feature in a document. Example format:

         "feature1"   "feature2"
"Doc1"    .44          .22
"Doc2"    .11          .6
"Doc3"    .22          .2

B are my target values for the data, which are just numbers 1-100 associated with each document:

"Doc1"    50
"Doc2"    11
"Doc3"    99

Using regression.coef_, I get a list of coefficients, but not their corresponding features! How can I get the features? I'm guessing I need to modify the structure of my B targets, but I don't know how.

score 33 · Answer 1 · answered Apr 29 '17 at 19:41

What I found to work was:

X = your independent variables

coefficients = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(logistic.coef_))], axis = 1)

The assumption you stated, that the order of regression.coef_ is the same as the column order of the training set, holds true in my experience. (It works with the underlying data and also checks out against the correlations between X and y.)
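One way to check the ordering yourself is to fit on data whose true coefficients are known. This is only a sketch: the data is synthetic, the target is noiseless, and the column names and weights are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with known coefficients (hypothetical names and weights).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)),
                 columns=["feature1", "feature2", "feature3"])
true_coefs = {"feature1": 2.0, "feature2": -5.0, "feature3": 0.5}
y = sum(w * X[name] for name, w in true_coefs.items())  # noiseless target

regression = LinearRegression().fit(X, y)

# coef_ has one entry per column of X, in column order,
# so the column names can be attached directly as the index.
coefficients = pd.Series(regression.coef_, index=X.columns)
print(coefficients)
```

If the recovered values line up with the weights used to build y, the order of coef_ matches the order of X.columns.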

answered Apr 29 '17 at 19:41

Kirsche

I think you can just do pd.DataFrame(zip(X.columns, logistic.coef_)) – DataOrc Sep 06 '17 at 14:26

score 15 · Answer 2 · answered Jan 03 '19 at 17:24

You can do that by creating a data frame:

cdf = pd.DataFrame(regression.coef_, X.columns, columns=['Coefficients'])
print(cdf)

answered Jan 03 '19 at 17:24

Pran Kumar Sarkar

regression.coef_ is now returned as a dataframe so to do this cdf = pd.concat([pd.DataFrame(X.columns),pd.DataFrame(np.transpose(regression.coef_))], axis = 1) – tim.newport Nov 04 '21 at 02:58

score 9 · Answer 3 · answered Jun 09 '17 at 09:08

coefficients = pd.DataFrame({"Feature":X.columns,"Coefficients":np.transpose(logistic.coef_)})

answered Jun 09 '17 at 09:08

Snowde

This does not work for me. *Exception: Data must be 1-dimensional* – ytu Jan 11 '18 at 10:33
@ytu try coefficients = pd.DataFrame({"Feature":X.columns,"Coefficients":np.transpose(logistic.coef_[0])}) – plumbus_bouquet Apr 05 '18 at 04:27
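The "Data must be 1-dimensional" error most likely comes from the shape of coef_: for a single target, LinearRegression gives a 1D array, while a binary LogisticRegression gives a 2D array with one row. A minimal sketch on synthetic data (names are hypothetical) that shows the difference and flattens the array before building the frame:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic data; column names are hypothetical.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 2)), columns=["feature1", "feature2"])
y_continuous = 3.0 * X["feature1"] - 1.0 * X["feature2"]
y_binary = (y_continuous > 0).astype(int)

linear = LinearRegression().fit(X, y_continuous)
logistic = LogisticRegression().fit(X, y_binary)

print(linear.coef_.shape)    # (2,)
print(logistic.coef_.shape)  # (1, 2)

# np.ravel flattens the single-row 2D array to 1D.
# For a multiclass model coef_ has several rows, so pick one row instead.
coefficients = pd.DataFrame({
    "Feature": X.columns,
    "Coefficients": np.ravel(logistic.coef_),
})
print(coefficients)
```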

score 8 · Answer 4 · answered Nov 15 '14 at 23:31

I suppose you are working on a feature selection task. regression.coef_ already gives the coefficients in the same order as the features, i.e. regression.coef_[0] corresponds to "feature1" and regression.coef_[1] corresponds to "feature2". This should be what you want.

I would also recommend the tree-based models from sklearn, which can be used for feature selection as well. To be specific, check out here.
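A minimal sketch of the tree-based alternative, assuming a numeric target like the one in the question. ExtraTreesRegressor is my choice for this example, the data is synthetic, and the column names are hypothetical. feature_importances_ follows the column order of X in the same way coef_ does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic data in which only feature1 drives the target (hypothetical names).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)),
                 columns=["feature1", "feature2", "feature3"])
y = 4.0 * X["feature1"] + 0.1 * rng.normal(size=200)

model = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

# One importance per column of X, in column order; the values sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Unlike regression coefficients, importances are non-negative and carry no sign, so they rank features but do not show the direction of the effect.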

answered Nov 15 '14 at 23:31

Jake0x32

This is true as long as regression.coef_ returns coefficient values in the same order. Thanks. – jeffrey Nov 16 '14 at 00:55
The ExtraTreesClassifier is actually very interesting, but it seems there is no way to retrieve the actual features which it picked after the model has been fit? – jeffrey Nov 16 '14 at 01:17
@jeffrey Yes, but I always select feature by `clf.feature_importances_ ` to retrieve the importance ranking of features. Well intuitively it is just like the coefficients of the Linear Model, isn't it? – Jake0x32 Nov 16 '14 at 01:41
Well, if you use a feature extraction step like CountVectorizer(), it has a method get_feature_names(). Then you can map get_feature_names() to .coef_ (I think they are in order, but I'm not sure). However, you cannot do this with the tree. – jeffrey Nov 16 '14 at 01:56

score 4 · Answer 5 · answered Apr 25 '20 at 13:22

Coefficients and features in zip

print(list(zip(X_train.columns.tolist(),logreg.coef_[0])))

Coefficients and features in DataFrame

pd.DataFrame({"Feature":X_train.columns.tolist(),"Coefficients":logreg.coef_[0]})

answered Apr 25 '20 at 13:22

Ankit Kumar Rajpoot

score 3 · Answer 6 · answered Dec 29 '21 at 13:49

This is the easiest and most intuitive way:

pd.DataFrame(logisticRegr.coef_, columns=x_train.columns)

or the same but transposing index and columns

pd.DataFrame(logisticRegr.coef_, columns=x_train.columns).T

answered Dec 29 '21 at 13:49

Pablo Vilas

score 1 · Answer 7 · answered Sep 20 '18 at 03:13

Suppose your training data X is stored in a DataFrame named df_X. You can then zip the column names and coefficients into a dictionary and feed it into a pandas DataFrame to get the mapping:

pd.DataFrame(dict(zip(df_X.columns,model.coef_[0])),index=[0]).T

answered Sep 20 '18 at 03:13

clieforce

score 0 · Answer 8 · edited Aug 18 '20 at 12:48

Try putting them in a series with the data columns names as index:

coeffs = pd.Series(model.coef_[0], index=X.columns.values)
coeffs.sort_values(ascending = False)
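Since the question asks for the highest-magnitude coefficients, sorting by absolute value may be closer to the goal than sorting the signed values. A minimal sketch, with invented numbers standing in for model.coef_[0]:

```python
import pandas as pd

# Hypothetical coefficients standing in for model.coef_[0].
coeffs = pd.Series([0.3, -2.5, 1.2],
                   index=["feature1", "feature2", "feature3"])

# Order by absolute value while keeping the original signs.
order = coeffs.abs().sort_values(ascending=False).index
by_magnitude = coeffs.reindex(order)

# Keep the two features with the largest-magnitude coefficients.
top_features = by_magnitude.index[:2].tolist()
print(by_magnitude)
print(top_features)
```

Coefficient magnitudes are only comparable across features when the features share a scale, so this ranking assumes the inputs were standardized or are otherwise on similar scales.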

edited Aug 18 '20 at 12:48

Brian Tompsett - 汤莱恩

answered Aug 18 '20 at 12:16

Hanan Tabak

