
I am doing PLS regression with the mixOmics package but am struggling with the prediction part. If my model with three components is pls.res, predict(pls.res, newdata) returns a list with the elements predict, variates and B.hat. predict is an array with dimensions 100x1x3. The documentation says that the last dimension corresponds to the model dimensions. Since my response is a one-dimensional vector, I guess that the model dimensions correspond to the components. But how do I get the most accurate prediction of Y from the PLS model?

As an example, suppose I want to predict wt from the mtcars dataset using PLS regression:

library("mixOmics")
df <- mtcars
pls.res <- pls(df[,1:5], df$wt, mode = "regression")
pls.pred <- predict(pls.res, df[,1:5])
head(pls.pred)

This produces the following (truncated) output:

$predict
, , dim1
                           Y
Mazda RX4           2.857348
Mazda RX4 Wag       2.857348
...

, , dim2
                           Y
Mazda RX4           2.847449
Mazda RX4 Wag       2.847449
...

$variates
                          dim1        dim2
Mazda RX4           -0.8392959 -0.02104679
Mazda RX4 Wag       -0.8392959 -0.02104679
...

$B.hat
, , dim1
              Y
mpg  -0.2161400
cyl   0.1949251
...

, , dim2
                 Y
mpg  -0.4171832787
cyl   0.0002618905
...

$call
predict.mixo_pls(object = pls.res, newdata = df[, 1:5])

and I don't understand the difference between the two dimensions (in this case) of $predict.

taffel
  • Welcome to SO! Please provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), in your case it would already help to know which function you exactly use and maybe a `summary` or `str` of your output – starja Aug 05 '20 at 21:22
  • Thank you, and thank you for the comment, I will add an example! – taffel Aug 05 '20 at 23:23
  • As far as I understand it, PLS tries to find linear combinations of X and Y that maximise the covariance. You can use different numbers of linear combinations, or components. `pls` by default uses 2 components, so the first prediction is only based on the first component, the second prediction is based on the first two components (this is how I understand it). Please have a look yourself regarding PLS, e.g. here: https://personal.utdallas.edu/~herve/Abdi-PLS-pretty.pdf, http://users.cecs.anu.edu.au/~kee/pls.pdf, https://web.stanford.edu/~hastie/ElemStatLearn//printings/ESLII_print12.pdf – starja Aug 06 '20 at 19:36
  • Thank you, @starja, that was exactly what I was unsure about (that the second prediction was based on two components)! – taffel Aug 10 '20 at 15:17
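
The interpretation from the comments (slice k of the third dimension is the prediction built from the first k components) can be checked with a short sketch on the mtcars example from the question. The helper `rmse` and the variable names `pred.1comp`, `pred.last` and `n.comp` are made up for illustration, and the sketch assumes that `$predict` is laid out as samples x response columns x components, as the printed output suggests.

```r
library(mixOmics)

X <- mtcars[, 1:5]   # same predictors as in the question
y <- mtcars$wt       # same response as in the question

# ncomp = 2 is the default; it is spelled out here for clarity
pls.res  <- pls(X, y, ncomp = 2, mode = "regression")
pls.pred <- predict(pls.res, newdata = X)

# Assumed layout: samples x response columns x number of components used
dim(pls.pred$predict)

# Number of slices along the third dimension (one per component count)
n.comp <- dim(pls.pred$predict)[3]

# Prediction that uses component 1 only
pred.1comp <- pls.pred$predict[, 1, 1]

# Prediction that uses all fitted components together
pred.last <- pls.pred$predict[, 1, n.comp]

# Hypothetical helper: root mean squared error on the training data
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse.1comp <- rmse(y, pred.1comp)
rmse.last  <- rmse(y, pred.last)
```

Under that reading, the last slice is the prediction of the full fitted model. Note that the error computed here is in-sample, so it says little about which number of components predicts best on new data; cross-validation, for example with mixOmics' `perf()`, would be the more appropriate way to choose it.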
