Python - How to determine the feature / column names returned by Chi Squared test

Question

I am currently working on feature selection, and as part of this I need to apply a chi-squared test to the features in a pandas DataFrame and determine which are the top 'n' features.

From articles available online, I understand that 'n' is set by the 'k' parameter of SelectKBest, which can be imported from sklearn.feature_selection.

But how do I find out the feature / column names or indices of the top 'n' features selected by the chi-squared test?

For clarity, here is an example (thanks to Chris Albon for the easy example on his site) taken from this link: https://chrisalbon.com/machine-learning/chi-squared_for_feature_selection.html

# Load libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2

# Load iris data
iris = load_iris()

# Create features and target
X = iris.data
y = iris.target

# Convert to categorical data by converting data to integers
X = X.astype(int)

# Select two features with highest chi-squared statistics
chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)
type(X_kbest)

# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_kbest.shape[1])

As can be seen from the code, the input data is passed as a NumPy array. Assume the four columns are named Col_A, Col_B, Col_C, Col_D, and that the test has chosen the 3rd and 4th columns as the two best features. This can be seen by printing the value of "X_kbest":

print(X_kbest)

[[1 0]
 [1 0]
 [1 0]
 ..., 
 [5 2]
 [5 2]
 [5 1]]

But I need my output to be a list containing only the selected feature names (in this case, Col_C and Col_D), or the feature names along with the data.
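
One possible way to get such a list, sketched under assumptions rather than offered as the definitive approach: fit the selector on a pandas DataFrame and use its get_support() method, which returns a boolean mask over the input columns. The column names Col_A to Col_D are the hypothetical names from the question (the real iris feature names differ), and the variables df, mask, selected_names and df_kbest are introduced here for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()

# Hypothetical column names taken from the question, not the real iris names
df = pd.DataFrame(iris.data.astype(int),
                  columns=["Col_A", "Col_B", "Col_C", "Col_D"])
y = iris.target

chi2_selector = SelectKBest(chi2, k=2)
chi2_selector.fit(df, y)

# get_support() returns a boolean mask: True for each selected column
mask = chi2_selector.get_support()

# List containing only the selected feature names
selected_names = df.columns[mask].tolist()

# Selected features with their names still attached to the data
df_kbest = df.loc[:, mask]

print(selected_names)
print(df_kbest.head())
```

If only the column positions are needed, get_support(indices=True) returns integer indices instead of a mask.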

Specifically, see the [second answer there](https://stackoverflow.com/a/43765224). (Nov 19 '17 at 20:11)
