I am trying to create a biplot from the iris data set using the ggplot2 package. I used the code below to generate it:

library(ggplot2)
library(devtools)


# Load iris dataset
data(iris)


# Run PCA and extract scores and loadings
iris_pca <- prcomp(iris[-5], scale. = TRUE)

scores <- as.data.frame(iris_pca$x) 
scores$Species <- iris$Species

loadings <- iris_pca$rotation

# Create biplot
biplot <- ggplot(data = scores, aes(x = PC1, y = PC2)) +
          # Scores on primary scales
          geom_point(aes(color = Species)) +
          # Loadings on secondary scales
          geom_segment(aes(x = 0, y = 0, xend = loadings[1,1], yend = loadings[1,2]), 
                       arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)) +
          geom_segment(aes(x = 0, y = 0, xend = loadings[2,1], yend = loadings[2,2]), 
                       arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)) +
          geom_segment(aes(x = 0, y = 0, xend = loadings[3,1], yend = loadings[3,2]), 
                       arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)) +
          geom_segment(aes(x = 0, y = 0, xend = loadings[4,1], yend = loadings[4,2]), 
                       arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)) +
          # Primary scales
          scale_x_continuous(limits = c(-3, 3), name = "PC1") +
          scale_y_continuous(limits = c(-3, 3), name = "PC2") +
          # Secondary scales
          scale_x_continuous(sec.axis = sec_axis(~ . / 1.2, name = "Loadings on PC1")) +
          scale_y_continuous(sec.axis = sec_axis(~ . / 1.2, name = "Loadings on PC2")) +
          # Theme
          theme_bw()

biplot

The above code results in a biplot as shown below:

[Image: biplot]

How can I give the secondary axes a different scale (limits = c(-0.8, 0.8)) so that only the arrows are zoomed in, without affecting the primary scale or the scores (points)? Is there a way to achieve this? Thank you for your help.

Regards, Farhan

Farhan
  • [This answer](https://stackoverflow.com/a/51844068/2530121) is probably the easiest to understand. – L Tyrone Apr 29 '23 at 09:39
  • The link provided by @LeroyTyrone does not address my specific requirement. Most of the answers there show a different secondary y-axis combined with the same primary x-axis. I also checked the answer accepted by the original poster, but the code didn't work for me. In my case, I want to plot the arrows against secondary x- and y-axes that both differ from the primary axes. Essentially, I want to plot the arrows as a second plot with a completely transparent background, overlaid on the first point chart. – Farhan Apr 30 '23 at 05:30

1 Answer

A secondary scale can't be specified independently of the primary scale: it is always derived from the primary scale via the transformation passed to sec_axis(). Consequently, the primary and the secondary axis have to be specified in a single scale_xxx_continuous() call; a second call for the same aesthetic replaces the first. Moreover, the transformation passed to sec_axis() only affects the breaks, the limits and the labels of the secondary axis. It does not touch the data. You have to take care of that yourself by transforming the data with the inverse of the transformation applied on the scale: here the axis divides by scale, so the loadings are multiplied by scale. Finally, I simplified your code a bit by using a single geom_segment() to add the arrows.

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.2.3

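# Reuses iris_pca and scores from the question
loadings <- as.data.frame(iris_pca$rotation)
# Keep the variable names as a column (not used in the plot itself)
loadings$Variable <- rownames(loadings)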

scale <- 2
# Create biplot
ggplot(data = scores, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = Species)) +
  geom_segment(
    data = loadings, aes(
      x = 0, y = 0,
      xend = PC1 * scale, yend = PC2 * scale
    ),
    arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)
  ) +
  # Primary scales
  scale_x_continuous(
    limits = c(-3, 3), name = "PC1",
    sec.axis = sec_axis(~ . / scale, name = "Loadings on PC1")
  ) +
  scale_y_continuous(
    limits = c(-3, 3), name = "PC2",
    sec.axis = sec_axis(~ . / scale, name = "Loadings on PC2")
  ) +
  # Theme
  theme_bw()
#> Warning: Removed 1 rows containing missing values (`geom_point()`).
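
If you want the secondary axes to cover a specific range, one option is to derive the scale factor from the two sets of limits instead of picking it by hand. The following is only a sketch: `primary_limit` and `loading_limit` are names I made up for illustration, and it rebuilds the objects from the question so it can run on its own. I use a loading limit of 1 here because loadings are entries of unit-length vectors and so never exceed 1 in absolute value. With the requested 0.8 the factor would be 3 / 0.8 = 3.75, and any arrow whose absolute loading is above 0.8 would fall outside the primary limits and be dropped with a warning.

```r
library(ggplot2)

# Recreate the objects from the question so the sketch is self-contained
iris_pca <- prcomp(iris[-5], scale. = TRUE)
scores <- as.data.frame(iris_pca$x)
scores$Species <- iris$Species
loadings <- as.data.frame(iris_pca$rotation)

# Assumed target ranges (hypothetical names):
# primary axes span [-primary_limit, primary_limit],
# secondary axes span [-loading_limit, loading_limit]
primary_limit <- 3
loading_limit <- 1

# Factor that maps the loading range onto the primary range
scale <- primary_limit / loading_limit

p <- ggplot(data = scores, aes(x = PC1, y = PC2)) +
  geom_point(aes(color = Species)) +
  # Arrows are drawn in primary-axis units, hence the multiplication
  geom_segment(
    data = loadings,
    aes(x = 0, y = 0, xend = PC1 * scale, yend = PC2 * scale),
    arrow = arrow(length = unit(0.3, "cm"), type = "closed", angle = 25)
  ) +
  # The secondary axes undo the multiplication, so they read in loading units
  scale_x_continuous(
    limits = c(-primary_limit, primary_limit), name = "PC1",
    sec.axis = sec_axis(~ . / scale, name = "Loadings on PC1")
  ) +
  scale_y_continuous(
    limits = c(-primary_limit, primary_limit), name = "PC2",
    sec.axis = sec_axis(~ . / scale, name = "Loadings on PC2")
  ) +
  theme_bw()

p
```

Changing `loading_limit` rescales only the arrows and the secondary axes; the points and the primary axes stay as they are.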

stefan