I've been reading about PCA in sklearn, specifically the relationship between features and components. I am particularly interested in identifying feature importance with respect to a couple of principal components. However, I have found several posts that say different things.
For instance, the three answers in this post discuss eigenvectors and loadings. In particular, it is mentioned that `pca.components_.T * np.sqrt(pca.explained_variance_)` gives the component loadings of the features. Why is the square root used here? And why the product?
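To make the first question concrete, here is a minimal sketch of what I am comparing; the iris data and `n_components=2` are just placeholders I picked for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                    # shape (150, 4), illustrative data only
pca = PCA(n_components=2).fit(X)

# pca.components_ has shape (n_components, n_features): the eigenvectors, one per row.
# The quantity from that post: eigenvectors scaled column-wise by sqrt(explained_variance_).
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(loadings.shape)                   # (4, 2): one row per feature, one column per component
```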
However, this answer indicates that `abs(pca.components_)` gives you the feature importance in each component. That seems to contradict what is said above, doesn't it? This blog post also states that `pca.components_` is the component loading of each feature.
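Here is that second suggestion as I understand it, on the same toy setup (again, the dataset is only an illustrative choice on my side):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(load_iris().data)

# The other answer's version of "feature importance": just the absolute weights.
importance = np.abs(pca.components_)        # shape (2, 4): |weight| of each feature in each component
print(np.argsort(importance[0])[::-1])      # features ranked by |weight| in the first component
```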
Additionally, I fail to understand how this answer addresses the question: "I think what you call the "loadings" is the result of the projection for each sample into the vector space spanned by the components. Those can be obtained by calling pca.transform(X_train) after calling pca.fit(X_train)."
But that does not seem correct to me: loadings are the coefficients of each feature on the principal components, not of the samples. Agree?
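When I try it, `pca.transform()` returns one row per sample, not per feature, which is why I think the quoted answer describes the scores rather than the loadings, but maybe I am misreading it:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                    # illustrative data only
pca = PCA(n_components=2).fit(X)

scores = pca.transform(X)                                          # shape (150, 2): one row per *sample*
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)    # shape (4, 2):   one row per *feature*
print(scores.shape, loadings.shape)
```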
Would really appreciate some clarification here.