I am investigating the correlation between sensory data and chemical measurements using PLS regression from the pls package. Ultimately, I want to display the results in a correlation loading plot like the example below. So far I have managed to build the plot from the X and Y correlation loadings, but I haven't figured out how to project the observations onto it.

![correlation plot with observations](https://www.google.no/url?sa=i&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwjL8buTiZXeAhVDBiwKHSPKDWgQjhx6BAgBEAM&url=https%3A%2F%2Fwww.xlstat.com%2Fen%2Fsolutions%2Ffeatures%2Fpartial-least-squares-regression&psig=AOvVaw3NOfDzBjMZFcn-TXm16KaX&ust=1540126569608484)

As an example, I use the oliveoil data set from the pls package. I computed the correlation loadings (using the method described here) and created a correlation plot with ggplot2. (This can be done more simply with the plsdepot package, but I like the versatility of ggplot2.)

```r
library(pls)
data("oliveoil")
oil <- plsr(sensory ~ chemical, scale = TRUE, data = oliveoil)

scores <- oil$scores
sc1 <- scores[,1]
sc2 <- scores[,2]
scores <- as.data.frame(cbind(sc1, sc2))
cl_plsr <- cor(model.matrix(oil), scores)
df_cor <- as.data.frame(cl_plsr)
df_depend_cor <- as.data.frame(cor(oliveoil$sensory, scores))
plot_loading_correlation  <-  rbind(df_cor, df_depend_cor)
plot_loading_correlation1 <- setNames(plot_loading_correlation, c("comp1", "comp2"))

# Function to draw a circle
circleFun <- function(center = c(0,0),diameter = 1, npoints = 100){
  r = diameter / 2
  tt <- seq(0,2*pi,length.out = npoints)
  xx <- center[1] + r * cos(tt)
  yy <- center[2] + r * sin(tt)
  return(data.frame(x = xx, y = yy))
}

dat_plsr <- circleFun(c(0,0),2,npoints = 100)

library(ggplot2)
library(ggrepel)

p <- ggplot(data=plot_loading_correlation1, aes(comp1, comp2))+
  theme_bw() +
  geom_hline(aes(yintercept = 0), size=.2, linetype = 3)+
  geom_vline(aes(xintercept = 0), size=.2, linetype = 3)+
  geom_text_repel(aes(label = rownames(plot_loading_correlation1), 
              colour = c("black","black","black","black","black", 
                         "red","red","red","red","red","red")))+
  scale_color_manual(values=c("blue","red"))+
  scale_x_continuous(breaks = seq(-1,2.5, by=0.5))+
  scale_y_continuous(breaks = seq(-1.5,2.5, by=0.5))+
  coord_fixed(ylim=c(-1, 1), xlim=c(-1, 1)) + xlab("PC 1") + ylab("PC 2")+
  geom_path(data=dat_plsr ,
        aes(x,y), colour = "darkgrey")+
  theme(legend.title=element_blank())+
  theme(axis.ticks = element_line(colour = "black"))+
  theme(axis.title = element_text(colour = "black"))+
  theme(axis.text = element_text(color="black"))+
  theme(legend.position='none')+
  theme(panel.grid.minor = element_blank()) +
  theme(panel.grid.major = element_blank()) +
  geom_point(data = plot_loading_correlation1,
         aes(x=comp1, y=comp2), 
         colour = c("blue","blue","blue","blue","blue",
                    "red","red","red","red","red","red"), 
         shape = c(21,21,21,21,21,22,22,22,22,22,22),
         fill = c("blue","blue","blue","blue","blue",
                  "red","red","red","red","red","red"),
         size = 2.2)
p
```

Correlation loading plot from ggplot2 without the observations
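
For reference, the quantity that `cor(model.matrix(oil), scores)` computes can be written out as below. This is only a sketch: the symbols $x_j$ (a centred variable), $t_a$ (the score vector of component $a$) and $A$ (the number of plotted components) are introduced here for illustration, and the bound assumes the score vectors are centred and mutually orthogonal.

```latex
r_{ja} \;=\; \operatorname{cor}(x_j, t_a)
       \;=\; \frac{x_j^{\top} t_a}{\lVert x_j \rVert \, \lVert t_a \rVert},
\qquad
\sum_{a=1}^{A} r_{ja}^{2} \;\le\; 1 .
```

Under those assumptions every variable lands inside the unit circle when $A = 2$, which is why the loadings fit the plot. The raw scores are not correlations, so nothing bounds them to the range from -1 to 1.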

How can I project the individual observations onto that plot, as in the example above? Should the scores be rescaled so that they fit the correlation loadings scale (from -1 to 1)? And is that scientifically acceptable?

Drosof
  • I think you can always normalize the scores without losing their variation. The plot is used more for grouping and exploration than for actual measurement of the scores, so it should be acceptable. – TheRimalaya Oct 23 '18 at 10:12
  • Thanks @TheRimalaya. I think adding the observations adds a useful layer of information to the plot, especially if you want to relate individual samples to their measured chemical levels or perceived sensory characteristics. I am quite new to PLSR, and the example illustrated above was generated by a program other than R, so I don't know the standard method for fitting the observations to the correlation plot, but I have seen several examples like this one in research papers. I have tried normalizing the scores using `scale(scores)`, but the results still fall outside the range -1 to 1. – Drosof Oct 23 '18 at 12:25
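
One possible way to do the normalization discussed in the comments is sketched below. It is not an established pls routine: `rescale_to_unit_circle` is a hypothetical helper, and `demo_scores` is synthetic data standing in for the first two score columns. The sketch also shows why `scale()` does not help here: it standardises each column to unit variance, which does not bound the values. Dividing both columns by one common factor, the largest distance of any observation from the origin, keeps the relative configuration of the samples and places the farthest one on the unit circle.

```r
# Hypothetical helper (not part of pls): shrink a two-column score matrix
# uniformly so that the farthest observation lies on the unit circle.
rescale_to_unit_circle <- function(scores) {
  scores <- as.matrix(scores)
  radius <- sqrt(rowSums(scores^2))  # distance of each observation from the origin
  scores / max(radius)               # one common factor for both components
}

# Synthetic stand-in for the first two score columns (assumption: in the
# real analysis this would be oil$scores[, 1:2] from the fitted model).
set.seed(1)
demo_scores <- cbind(comp1 = rnorm(16, sd = 2), comp2 = rnorm(16, sd = 1))

# scale() gives each column mean 0 and standard deviation 1;
# individual values can still exceed 1 in absolute value.
standardised <- scale(demo_scores)
range(standardised)

# Uniform rescaling keeps every observation inside the unit circle.
obs <- as.data.frame(rescale_to_unit_circle(demo_scores))
max(sqrt(rowSums(obs^2)))
```

With the model from the question, something like `obs <- setNames(as.data.frame(rescale_to_unit_circle(oil$scores[, 1:2])), c("comp1", "comp2"))` followed by `p + geom_point(data = obs, colour = "darkgreen")` should overlay the samples, since the new layer inherits the `comp1`/`comp2` mapping from `p`. After this rescaling the observation coordinates are no longer correlations, so it seems safest to read only their relative positions and directions, as in a biplot.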

0 Answers