Why do PLS regression coefficients with R [pls] differ with those from other R packages?

Question

Out of curiosity, I am trying to figure out why the PLS regression coefficients obtained with pls differ from the coefficients obtained with plsRglm, ropls, or plsdepot which all provide the same results.

Here is some code to start with. I have tried to play with the scale, center, and method arguments of the plsr function... but no success so far.

library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)

data(Cornell)

pls.plsr <- plsr(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  ncomp = 3, 
  scale = TRUE, 
  center = TRUE
)

plsRglm.plsr <- plsR(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  nt = 3, 
  scaleX = TRUE
)

ropls.plsr <- opls(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"], 
  scaleC = "standard"
)

plsdepot.plsr <- plsreg1(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"],
  comps = 3
)

## extract PLS regression coefficients for the PLS model with three components
coef(pls.plsr) # a
coef(plsRglm.plsr, type = "original") # b
coef(plsRglm.plsr, type = "scaled") # c
coef(ropls.plsr) # c
plsdepot.plsr$std.coefs # c
plsdepot.plsr$reg.coefs # b

Ventrilocus · Accepted Answer · 2021-04-22T17:14:42.533

Firstly, just for re-formatting purposes, we write:

library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)

data(Cornell)
pls.plsr <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                 data = Cornell, 
                 ncomp = 3, scale = T, center = T)
plsRglm.plsr <- plsR(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                    data = Cornell, 
                    nt = 3, scaleX = TRUE)
ropls.plsr <- opls(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                   Cornell[, "Y"], scaleC = "standard")
plsdepot.plsr <- plsreg1(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                         Cornell[, "Y"], comps = 3)

That done, you may extract the coefficients in the original scale:

### ORIGINAL SCALE -  plsRglm, plsdepot
coef(plsRglm.plsr, type = "original")
plsdepot.plsr$reg.coefs

Or you can have them scaled:

### SCALED - plsRglm, ropls, plsdepot
coef(plsRglm.plsr, type = "scaled")
coef(ropls.plsr)
plsdepot.plsr$std.coefs

Therefore all methods now give rise to the same coefficients... Except for pls::plsr. Why? You may ask. The key is in the command. When you run:

coef(pls.plsr) # , , 3 comps

You see ", , 3". That is characteristic of a tensor object. What is this? Coefficients should be simply a vector. The reason is that coef is a generic function and it is not working properly for pls::plsr models. To see what it is actually extracting:

pls.plsr$coefficients
matrix(pls.plsr$coefficients, ncol = 3) # or in matrix form. coef simply extracts the third column (it should not)

But you can see the same fit for all models if you examine the equivalent object in each R-package as in:

matrix(pls.plsr$projection, ncol = 3)    
plsRglm.plsr$wwetoile
plsdepot.plsr$mod.wgs
ropls.plsr@weightStarMN

Therefore, for pls::plsr you were simply not extracting the coefficients.

Great answer, thank you! Although it seems the addition of intercepts in formulas has no effect on PLS models with these packages (try with and without, you can always extract an intercept and results are the same). — user2165907, Apr 22 '21 at 14:37
You are right! My apologies. I will correct the answer. Thanks! — Ventrilocus, Apr 22 '21 at 17:09

Why do PLS regression coefficients with R [pls] differ with those from other R packages?

1 Answers1