I'm doing a principal component analysis. Once I have the result, how do I identify the first couple of principal predictors? The plot is messy, and it's hard to read the predictor names:
[Image: PCA biplot]

Which part of the PCA results should I look at? The question is really how to determine the most important predictors, those that explain, let's say, 80% of the variance in the data. Suppose we know that the first 5 components do this; each principal component is just a combination of predictors. How do I identify those "important" predictors?

Demo
  • Please provide a reproducible example when you're asking a question. The code you used to run the PCA is more important than the biplot it generated. Also, please define what you mean by 'first couple of principal predictors'. – Adam Quek Apr 26 '17 at 03:23
  • @Adam Quek, this is more about how to determine the most important predictors, those that explain, let's say, 80% of the variance in the data. Suppose we know that the first 5 components do this; each principal component is just a combination of predictors. How do I identify those "important" predictors? Is that clear? – Demo Apr 26 '17 at 03:28

1 Answer

See this answer: "Principal Components Analysis - how to get the contribution (%) of each parameter to a Prin.Comp.?"

The information is stored within your PCA results. If you used prcomp(), then $rotation is what you are after; if you used princomp(), look at $loadings. For example:

# USArrests ships with base R (package 'datasets'), so no extra packages are needed
data("USArrests")

# prcomp(): the loadings are stored in $rotation
pca_1 <- prcomp(USArrests, scale. = TRUE)
load_1 <- unclass(pca_1$rotation)
aload_1 <- abs(load_1)
# Divide each column by its sum so the absolute loadings of every component add up to 1
sweep(aload_1, 2, colSums(aload_1), "/")
#               PC1       PC2       PC3        PC4
#Murder   0.2761363 0.2540139 0.1890303 0.40186493
#Assault  0.3005008 0.1141873 0.1485443 0.46016113
#UrbanPop 0.1433452 0.5301651 0.2094067 0.08286886
#Rape     0.2800177 0.1016337 0.4530187 0.05510509

# princomp(): the loadings are stored in $loadings
pca_2 <- princomp(USArrests, cor = TRUE)
load_2 <- unclass(pca_2$loadings)
aload_2 <- abs(load_2)
sweep(aload_2, 2, colSums(aload_2), "/")
#            Comp.1    Comp.2    Comp.3     Comp.4
#Murder   0.2761363 0.2540139 0.1890303 0.40186493
#Assault  0.3005008 0.1141873 0.1485443 0.46016113
#UrbanPop 0.1433452 0.5301651 0.2094067 0.08286886
#Rape     0.2800177 0.1016337 0.4530187 0.05510509

As you can see, Murder, Assault, and Rape each account for roughly 28-30% of the absolute loadings on PC1, whereas UrbanPop accounts for only ~14% on PC1 yet is the major contributor to PC2 (~53%).
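The question also asks which predictors matter across the components that together explain about 80% of the variance. One possible heuristic, offered here as an assumption rather than a standard method, is to find the smallest number of components that reaches the threshold and then weight each predictor's squared loadings by the variance those components explain. The names `var_explained`, `k`, and `importance` below are illustrative only.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components whose cumulative share reaches 80%
# (the 80% threshold is the figure from the question, not a rule)
k <- which(cum_var >= 0.80)[1]

# Each column of $rotation is a unit vector, so the squared loadings
# within a component sum to 1 and can be read as shares
sq_load <- pca$rotation^2

# Assumed heuristic: weight each component's shares by the variance
# it explains, then sum over the first k components
importance <- drop(sq_load[, 1:k, drop = FALSE] %*% var_explained[1:k])

# Predictors ranked from most to least important under this heuristic
sort(importance, decreasing = TRUE)
```

The resulting scores add up to the cumulative variance explained by the first `k` components, so each value can be read as a predictor's share of that total. Squared loadings are used here instead of absolute loadings because they sum to 1 within each component without rescaling.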

J.Con
  • @j-con I have a relatively large dataset comprising several psychological scores for 200 subjects in long format, giving me 20000 entries in the dataset. I would like to reduce the dimensionality of this dataset using PCA. How would you suggest I go about it? Thanks. – lf_araujo Apr 30 '17 at 23:59
  • This is a really great step-by-step tutorial: https://media.readthedocs.org/pdf/little-book-of-r-for-multivariate-analysis/latest/little-book-of-r-for-multivariate-analysis.pdf Let me know if you get stuck. – J.Con May 01 '17 at 00:04