
Following the question "Plotting only selected loadings in R", I have been able to plot only selected loadings successfully. However, I was wondering whether it is possible to give the arrows different colors. For example, using the Iris dataset I have found the highest contributor to PC1 (Petal.Length) and the highest contributor to PC2 (Sepal.Width). Is it possible to draw the loading vector for Petal.Length, which corresponds to PC1, in a different color from the loading vector for Sepal.Width, which corresponds to PC2?

library(tibble)
library(ggplot2)
library(ggfortify)
library(tidyverse)

pca <- prcomp(iris[, 1:4], scale.=TRUE)

# finding the loading with the highest contribution to PC1 and PC2
pca_loadings <- pca$rotation
pca_loadings <- pca_loadings %>%
  as_tibble(rownames="measurement")
pca_topmeasurement <- pca_loadings %>%
  select(measurement, PC1, PC2) %>%
  pivot_longer(matches("PC"), names_to = "PC", values_to = "loading") %>%
  group_by(PC) %>%
  arrange(desc(abs(loading))) %>%
  slice(1) %>%
  pull(measurement) %>%
  unique()

# plotting the biplot
pca_bi <- autoplot(pca, x=1, y=2, data=iris, colour='Species', loadings=TRUE, loadings.label=TRUE) +
  ggtitle("biplot")
pca_bi$layers[[2]]$data<-pca_bi$layers[[2]]$data[pca_topmeasurement, ] # keep only the selected rows in the geom_segment (arrow) layer
pca_bi$layers[[3]]$data<-pca_bi$layers[[3]]$data[pca_topmeasurement, ] # keep only the selected rows in the geom_text (label) layer

You don't need to do any of that layer hacking. autoplot forwards extra arguments to ggfortify's biplot function, whose loadings.colour and loadings.label.colour arguments accept a vector with one colour per loading; a loading whose colour is NA is simply not drawn. Therefore you can do the whole thing in a single call:

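# Colours follow the row order of pca$rotation (Sepal.Length, Sepal.Width,
# Petal.Length, Petal.Width); NA hides that arrow and its label
autoplot(pca, x = 1, y = 2, data = iris, colour = "Species",
         loadings = TRUE, loadings.label = TRUE,
         loadings.colour = c(NA, "red", "blue", NA),
         loadings.label.colour = c(NA, "red", "blue", NA),
         main = "biplot")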
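If you would rather not hard-code the positions, the colour vector can be derived from the loadings themselves. The sketch below is an illustration, not part of the original answer: it uses base R `which.max` on the absolute loadings instead of the tidyverse pipeline from the question, and the names `top` and `cols` are hypothetical. If the same variable topped both PCs, the PC2 colour would overwrite the PC1 colour.

```r
library(ggplot2)
library(ggfortify)

pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Row index of the largest absolute loading on PC1 and on PC2
top <- apply(abs(pca$rotation[, c("PC1", "PC2")]), 2, which.max)

# One colour per loading (row order of pca$rotation); NA hides that arrow
cols <- rep(NA_character_, nrow(pca$rotation))
cols[top["PC1"]] <- "blue"  # top contributor to PC1
cols[top["PC2"]] <- "red"   # top contributor to PC2

p <- autoplot(pca, x = 1, y = 2, data = iris, colour = "Species",
              loadings = TRUE, loadings.label = TRUE,
              loadings.colour = cols, loadings.label.colour = cols,
              main = "biplot")
```

For the Iris data this selects Petal.Length for PC1 and Sepal.Width for PC2, matching the variables named in the question.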

Allan Cameron
  • That is perfect for the Iris dataset, but how will this work for a dataset of 182 observations? I have found the names of the observations I need plotted. – wantingtoimprove Nov 02 '22 at 22:52
  • @wantingtoimprove you can do the same thing. Just create a vector of NA the same length as the number of loadings, then set the ones you want to show as the colors you want – Allan Cameron Nov 02 '22 at 22:57
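A minimal sketch of the approach in the last comment, assuming the names you have found are the variables (columns) whose loading vectors should be drawn, since prcomp produces one loading per variable rather than per observation. It uses Iris for illustration; `wanted` and `cols` are hypothetical names, and the same code works unchanged for any number of variables.

```r
library(ggplot2)
library(ggfortify)

pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Hypothetical selection: variable name -> colour for its arrow
wanted <- c(Petal.Length = "blue", Sepal.Width = "red")

# Start with NA for every loading, named by variable, then fill in the selected ones
cols <- setNames(rep(NA_character_, nrow(pca$rotation)), rownames(pca$rotation))
cols[names(wanted)] <- wanted

p <- autoplot(pca, data = iris, colour = "Species",
              loadings = TRUE, loadings.label = TRUE,
              loadings.colour = unname(cols),
              loadings.label.colour = unname(cols))
```

Indexing by name means the selection does not depend on knowing each variable's row position in `pca$rotation`.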