
I have a question about using PCA to select important features.

So, let's assume there is a dataset with 15 features, and I want to find the 5 most important features out of the whole set. Here is what I'm doing:

from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# standardize the features before running PCA
x = scale(df)

pca = PCA(n_components=5)
df_t = pca.fit_transform(x)

pca.explained_variance_ratio_

So in the end I have an array with 5 explained variance ratios. But which features do they correspond to? The first five in the dataset? Or the five most important ones? And if so, how can I find out the names of those features?
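
For reference, a minimal sketch (assuming df is a pandas DataFrame with named columns and pca is the fitted object from the code above) of how the per-component loadings in pca.components_ can be traced back to the original feature names:

import pandas as pd

# pca.components_ has shape (5, 15): one row per principal component,
# one column per original feature, in the same order as df.columns.
loadings = pd.DataFrame(
    pca.components_,
    columns=df.columns,
    index=["PC%d" % (i + 1) for i in range(pca.n_components_)],
)

# For each component, show the original features with the largest
# absolute loadings, i.e. the ones that contribute most to it.
for pc, row in loadings.iterrows():
    top = row.abs().sort_values(ascending=False).head(3)
    print(pc, list(top.index))

Each explained variance ratio then describes one of these components (a mix of all 15 columns), not one original column.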

Keithx
  • I'm not an expert, but feature reduction and feature selection are different things. From what I know about PCA, it's not a tool to select features but to create new ones from those you have, trying to keep the maximum variance by combining the ones that are correlated (so your 5 are somehow the 15; see the sketch after these comments). Again, I could be wrong; http://stats.stackexchange.com/ is probably better suited to questions like this. – polku Oct 24 '16 at 11:34
  • The values in explained_variance_ratio_ refer not to original features, but to principal components. Have a look at this answer: http://stackoverflow.com/questions/15369006/finding-the-dimension-with-highest-variance-using-scikit-learn-pca – vpekar Oct 24 '16 at 11:34
  • Thanks, all clear! – Keithx Oct 24 '16 at 15:21
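
If the goal really is to keep 5 of the original 15 columns rather than build new components (the distinction polku draws above), something like scikit-learn's SelectKBest is closer to feature selection than PCA. A rough sketch, assuming there is also a classification target column y, which the question does not show:

from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import scale

# Hypothetical: y is a classification target; it is not part of the
# original question, so this only illustrates feature selection.
selector = SelectKBest(score_func=f_classif, k=5)
selector.fit(scale(df), y)

# Boolean mask over the original columns -> names of the 5 kept features.
selected = df.columns[selector.get_support()]
print(list(selected))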

0 Answers