
I've been reading about PCA in sklearn, specifically the relationship between features and components. I am particularly interested in identifying feature importance with respect to a couple of principal components. However, I found a few posts that say different things.

For instance, the three answers in this post discuss eigenvectors and loadings. In particular, it is mentioned that pca.components_.T * np.sqrt(pca.explained_variance_) gives the component loadings of the features. Why is the square root used here? And why the product?
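
To make the comparison concrete, here is a minimal sketch of what I understand that computation to be (the dataset and variable names are just my own example):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example data, standardized so PCA works on the correlation structure
X = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2).fit(X)

# Principal axes (eigenvectors), shape (n_components, n_features)
eigenvectors = pca.components_

# What the linked answers call "loadings": eigenvectors scaled by the
# square root of the explained variance, shape (n_features, n_components)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
```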

However, this answer indicates that abs(pca.components_) gives you the feature importance in each component. This seems to contradict what is indicated above, yes? This blog post also states that pca.components_ is the component loading of each feature.
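
Again for concreteness, this is how I read that suggestion, continuing from the sketch above (the feature names come from my example dataset):

```python
# Rank features within each component by the absolute value of pca.components_
feature_names = load_iris().feature_names
for i, component in enumerate(pca.components_):
    order = np.argsort(np.abs(component))[::-1]
    print(f"PC{i + 1}:", [feature_names[j] for j in order])
```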

Additionally, I fail to understand how this answers the question: "I think what you call the "loadings" is the result of the projection for each sample into the vector space spanned by the components. Those can be obtained by calling pca.transform(X_train) after calling pca.fit(X_train)." But this does not seem correct: loadings pertain to the coefficients of the features on the principal components, not to the samples. Agree?
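
If it helps, this is the distinction I have in mind, continuing from the sketch above:

```python
# pca.transform(X) gives the scores: one row per sample, one column per component
scores = pca.transform(X)
print(scores.shape)    # (150, 2) – per-sample projections, not loadings

# The loadings computed above have one row per feature instead
print(loadings.shape)  # (4, 2) – one coefficient per (feature, component) pair
```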

Would really appreciate some clarification here.
