3

I used scikit-learn's SelectKBest to select the best features, around 500 out of 900, as follows, where d is the DataFrame of all the features.

from sklearn.feature_selection import SelectKBest, chi2, f_classif
X_new = SelectKBest(chi2, k=491).fit_transform(d, label_vs)

When I print X_new, it gives me numbers only, but I need the names of the selected features to use them later on.

I tried things like X_new.dtype.names but didn't get anything back, and I tried converting X_new into a DataFrame, but the only column names I got were

1, 2, 3, 4... 

So is there a way to find out the names of the selected features?

Venkatachalam
Talal Ghannam

2 Answers

3

Here is how you could do it, using get_support():

chY = SelectKBest(chi2, k=491)
X_new = chY.fit_transform(d, label_vs)
column_names = [name for name, selected in zip(d.columns, chY.get_support()) if selected]

Following @AI_Learning's answer, you could also get the column names more directly with:

column_names = d.columns[chY.get_support()]
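Since the question mentions converting X_new into a data frame and losing the column names, here is a minimal sketch of how the mask from get_support() could be used to rebuild a labelled DataFrame. The data, the column names (f_a to f_f) and k=3 are made up for illustration; only SelectKBest, chi2 and get_support() come from the answer above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in for the asker's data: 100 rows, 6 features.
# chi2 requires non-negative feature values, hence the integer counts.
rng = np.random.default_rng(0)
d = pd.DataFrame(
    rng.integers(0, 10, size=(100, 6)),
    columns=["f_a", "f_b", "f_c", "f_d", "f_e", "f_f"],
)
label_vs = rng.integers(0, 2, size=100)

chY = SelectKBest(chi2, k=3)
X_new = chY.fit_transform(d, label_vs)  # plain NumPy array, no column names

# Boolean mask over the original columns, True where a feature was kept.
column_names = d.columns[chY.get_support()]

# Reattach the names (and the original index) to the selected columns.
X_new_df = pd.DataFrame(X_new, columns=column_names, index=d.index)
print(X_new_df.head())
```

Equivalently, d.loc[:, chY.get_support()] should give the same columns without going through X_new at all.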
Jibin Mathew
  • thanks. I have already finished with the project and I used a different route. however this is also very helpful for the future. thanks. – Talal Ghannam Mar 19 '19 at 22:46
2

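You can use the .get_support() method of the fitted selector to get the feature names from your initial dataframe. Note that the selector has to be fitted before get_support() is called.

feature_selector = SelectKBest(chi2, k=491)
feature_selector.fit(d, label_vs)
d.columns[feature_selector.get_support()]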

Working example:

from sklearn.datasets import load_digits
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
X, y = load_digits(return_X_y=True)
# Name the 64 pixel columns so the selected ones can be identified later
df = pd.DataFrame(X, columns=['feature %s' % i for i in range(X.shape[1])])

feature_selector = SelectKBest(chi2, k=20)

X_new = feature_selector.fit_transform(df, y)
X_new.shape

# Boolean mask from get_support() picks out the names of the kept columns
df.columns[feature_selector.get_support()]

Output:

Index(['feature 5', 'feature 6', 'feature 13', 'feature 19', 'feature 20', 'feature 21', 'feature 26', 'feature 28', 'feature 30', 'feature 33', 'feature 34', 'feature 41', 'feature 42', 'feature 43', 'feature 44', 'feature 46', 'feature 54', 'feature 58', 'feature 61', 'feature 62'], dtype='object')
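If you also want to know how strongly each selected feature scored, the fitted selector exposes a scores_ attribute that can be paired with the same mask. This is only a sketch built on the digits example; the variable names scores and selected_scores are my own, and some constant pixel columns may produce NaN scores (possibly with a runtime warning), which do not end up among the selected features here.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_digits(return_X_y=True)
df = pd.DataFrame(X, columns=[f"feature {i}" for i in range(X.shape[1])])

feature_selector = SelectKBest(chi2, k=20).fit(df, y)
mask = feature_selector.get_support()  # True for the 20 kept columns

# One chi2 score per original column, labelled with the column name.
scores = pd.Series(feature_selector.scores_, index=df.columns)

# Keep only the selected features and rank them from best to worst.
selected_scores = scores[mask].sort_values(ascending=False)
print(selected_scores.head())
```

The index of selected_scores holds the same names as df.columns[feature_selector.get_support()], just ordered by score instead of by column position.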

Venkatachalam
  • 1
    thanks. I have already finished with the project and I used a different route. however this is also very helpful for the future. thanks. – Talal Ghannam Mar 19 '19 at 22:47