Identify Principal component from Biplot in R

Question

I'm running a principal component analysis. Once I have the results, how do I identify the first couple of principal predictors? The biplot is too cluttered to read the predictor names.

Which part of the PCA results should I look at? More specifically, how do I determine the most important predictors, i.e. those that explain, let's say, 80% of the variance in the data? We know, e.g., that the first 5 components reach that level, but each principal component is just a combination of the predictors. How do I identify those "important" predictors?
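The share of variance carried by each component, and the number of components needed to reach a threshold such as 80%, can be computed from the component standard deviations returned by prcomp(). A minimal sketch, assuming the built-in USArrests data as a stand-in for the asker's data and an 80% threshold chosen purely for illustration (summary() on a prcomp object reports the same "Cumulative Proportion" row, rounded):

```r
# USArrests ships with R's datasets package; used here only as a stand-in
data("USArrests")

# PCA on standardized variables
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance carried by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_explained <- cumsum(var_explained)
print(round(cum_explained, 3))

# Smallest number of components whose cumulative share reaches the
# illustrative 80% threshold
threshold <- 0.80
n_comp <- which(cum_explained >= threshold)[1]
print(n_comp)
```

Note that this only tells you how many components to keep; each retained component is still a mix of all the original predictors, which is what the loadings below address.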

Please provide a reproducible example when you're asking a question. The code you used to run the PCA is more important than the biplot it generated. Also, please define what you mean by 'first couple of principal predictors'. — Adam Quek, Apr 26 '17 at 03:23
@Adam Quek, this is more about how to determine the most important predictors, the ones that could explain, let's say, 80% of the variance in the data. We know, e.g., that the first 5 components do this, while each principal component is just a combination of predictors. How do I identify those "important" predictors? Is that clear? — Demo, Apr 26 '17 at 03:28

score 1 · Answer 1 · edited May 23 '17 at 12:26

See this answer: "Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?"

The information is stored within your PCA results. If you used prcomp(), then $rotation is what you are after; if you used princomp(), then $loadings holds the key. E.g.:

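# USArrests is in R's built-in datasets package, so no extra packages are needed
data("USArrests")

# PCA on standardized (scaled) variables
pca_1 <- prcomp(USArrests, scale. = TRUE)
# Loadings: one row per variable, one column per principal component
load_1 <- with(pca_1, unclass(rotation))
# Absolute loadings, with each column rescaled to sum to 1
aload_1 <- abs(load_1)
sweep(aload_1, 2, colSums(aload_1), "/")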
#               PC1       PC2       PC3        PC4
#Murder   0.2761363 0.2540139 0.1890303 0.40186493
#Assault  0.3005008 0.1141873 0.1485443 0.46016113
#UrbanPop 0.1433452 0.5301651 0.2094067 0.08286886
#Rape     0.2800177 0.1016337 0.4530187 0.05510509


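# Same analysis with princomp(); cor = TRUE uses the correlation matrix,
# which is equivalent to scaling the variables
pca_2 <- princomp(USArrests, cor = TRUE)
load_2 <- with(pca_2, unclass(loadings))
aload_2 <- abs(load_2)
sweep(aload_2, 2, colSums(aload_2), "/")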

#            Comp.1    Comp.2    Comp.3     Comp.4
#Murder   0.2761363 0.2540139 0.1890303 0.40186493
#Assault  0.3005008 0.1141873 0.1485443 0.46016113
#UrbanPop 0.1433452 0.5301651 0.2094067 0.08286886
#Rape     0.2800177 0.1016337 0.4530187 0.05510509

As you can see, both functions give the same result: Murder, Assault, and Rape each account for roughly 28-30% of the absolute loadings on PC1, whereas UrbanPop accounts for only ~14% of PC1, yet is the major contributor to PC2 (~53%).
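To go from per-component contributions to a single ranking of predictors, as the question asks, one option is to weight each retained component's contributions by its share of the explained variance. This is a heuristic sketch, not part of the linked answer: the 80% cut-off, the variance weighting, and names such as score_abs and score_sq are illustrative assumptions. It also computes squared loadings as an alternative contribution measure; because each column of rotation has unit length, the squared loadings already sum to 1 per component.

```r
data("USArrests")
pca <- prcomp(USArrests, scale. = TRUE)

# Number of PCs needed to reach an illustrative 80% of total variance
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
k <- which(cumsum(var_explained) >= 0.80)[1]

# Per-PC contributions as in the answer above: |loading| / column sum
aload <- abs(pca$rotation)
contrib_abs <- sweep(aload, 2, colSums(aload), "/")

# Alternative measure: squared loadings (each column already sums to 1)
contrib_sq <- pca$rotation^2

# Heuristic weights: each retained PC's share of the variance
# explained by the first k PCs (weights sum to 1)
w <- var_explained[1:k] / sum(var_explained[1:k])

# Weighted overall score per predictor, for each contribution measure
score_abs <- drop(contrib_abs[, 1:k, drop = FALSE] %*% w)
score_sq <- drop(contrib_sq[, 1:k, drop = FALSE] %*% w)

print(sort(score_abs, decreasing = TRUE))
print(sort(score_sq, decreasing = TRUE))
```

Whether a single ranking is meaningful depends on the data: when the retained components load on very different variables, reading the per-component table separately is usually more informative than collapsing it into one score.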

@j-con I have a relatively large dataset comprising several psychological scores for 200 subjects in long format, giving 20000 entries in total. I would like to reduce its dimensionality using PCA; how would you suggest I go about it? Thanks. — lf_araujo, Apr 30 '17 at 23:59
This is a really great step-by-step tutorial: https://media.readthedocs.org/pdf/little-book-of-r-for-multivariate-analysis/latest/little-book-of-r-for-multivariate-analysis.pdf Let me know if you get stuck. — J.Con, May 01 '17 at 00:04
