
I have a PCA with more than 150 variables, and when I plot the loadings the biplot becomes an unreadable mess. Is there a way to plot only selected loadings? As an example: with iris I end up with 4 loadings. How can I plot only one of them (let's say Sepal.Width)?

library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
autoplot(pca_res, data = iris, colour = 'Species', loadings = TRUE, loadings.label = TRUE)

PCA example with iris and 4 loadings


Adam Quek
Werc

1 Answer

Hi Werc, welcome to SO.

A small disclaimer: this is not a proper solution to the missing feature but more of a hack. A proper solution, in my opinion, would involve editing the source code of the ggfortify package and opening a pull request (or opening a feature request on GitHub).

However, here is a little "hack" that helps for now by editing the ggplot object that autoplot() returns:

library(ggfortify)
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
p0 <- autoplot(pca_res, data = iris, colour = 'Species', loadings = TRUE, loadings.label = TRUE)
p0 # default plot

# check which layers are relevant:
p0$layers # layers 2 (segment) and 3 (text)

# edit ggplot object geom_segment layer:
p0$layers[[2]]$data<-p0$layers[[2]]$data["Sepal.Width",]

# edit ggplot object geom_text layer:
p0$layers[[3]]$data<-p0$layers[[3]]$data["Sepal.Width",]


p0 # new Plot 

This gives you the requested output: only "Sepal.Width" is drawn as a loading on your PCA plot.
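With 150+ variables you will probably want to keep several loadings at once. Here is a minimal sketch that wraps the same hack in a function. Note that `keep_loadings` is a hypothetical helper name, not part of ggfortify, and it assumes (as in the code above) that the loading layers carry their own data frame whose row names are the variable names. That is an internal detail of ggfortify and may change between versions.

```r
library(ggfortify)

# Hypothetical helper (not part of ggfortify): keep only the named loadings.
keep_loadings <- function(p, vars) {
  for (i in seq_along(p$layers)) {
    d <- p$layers[[i]]$data
    # Only touch layers that have their own data frame containing all
    # requested variables as row names (assumed: the segment and text layers).
    if (is.data.frame(d) && all(vars %in% rownames(d))) {
      p$layers[[i]]$data <- d[vars, , drop = FALSE]
    }
  }
  p
}

df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)
p0 <- autoplot(pca_res, data = iris, colour = "Species",
               loadings = TRUE, loadings.label = TRUE)

# Keep two of the four loadings
p1 <- keep_loadings(p0, c("Sepal.Width", "Petal.Length"))
p1
```

Matching on row names instead of hard-coding layers 2 and 3 means the point layer is left alone, and the function simply does nothing if the layer layout differs from what is assumed here.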


user12256545
  • Thank you very much for this hack, it was exactly what I was looking for. Even if it is time-consuming (especially with a lot of variables), it lets you selectively show the important loadings. For anyone else looking for an easier solution, I found out that biplot(pca, showLoadings = TRUE, ntopLoadings = X) lets you plot only the first X loadings (ordered by contribution). – Werc Jul 25 '22 at 07:27
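If you want to stay with prcomp() and the hack from the answer, the variables to keep can also be picked programmatically. The following is only a rough sketch: it ranks variables by the length of their loading vector on PC1/PC2, which is one plausible reading of "ordered by contribution" and not necessarily the ordering that the biplot() call in the comment uses. `n_top` and `top_vars` are illustrative names.

```r
df <- iris[1:4]
pca_res <- prcomp(df, scale. = TRUE)

n_top <- 2  # hypothetical choice; pick whatever keeps the plot readable

# Loadings on the two plotted components
rot <- pca_res$rotation[, c("PC1", "PC2")]

# Length of each loading arrow in the PC1/PC2 plane
arrow_len <- sqrt(rowSums(rot^2))

# Names of the n_top variables with the longest arrows
top_vars <- names(sort(arrow_len, decreasing = TRUE))[seq_len(n_top)]
top_vars
```

The resulting character vector can then be used in place of "Sepal.Width" when subsetting the layer data by row name.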