
I performed a partial least squares regression using Python's sklearn.cross_decomposition.PLSRegression.

Is there a way to retrieve the fraction of explained variance for X, i.e. R2(X), for each PLS component? I'm looking for something similar to the explvar() function from the R pls package. However, I'd also appreciate any suggestions on how to compute it myself.

There is a similar question with an answer that explains how to get the explained variance of Y. I guess "variance in Y" was what was asked for in that case. That's why I opened a new question - hope that's O.K.


2 Answers


I managed to find a solution for the problem. The following gives the fraction of variance in X explained by each latent vector after PLS regression:

import numpy as np
from sklearn import cross_decomposition

# X is a numpy ndarray with samples in rows and predictor variables in columns
# y is one-dimensional ndarray containing the response variable

total_variance_in_x = np.var(X, axis=0)

pls1 = cross_decomposition.PLSRegression(n_components=5)
pls1.fit(X, y)

# variance in transformed X data for each latent vector:
variance_in_x = np.var(pls1.x_scores_, axis=0)

# normalize variance by total variance:
fractions_of_explained_variance = variance_in_x / total_variance_in_x
  • +1 Thank you for this solution. When X is multivariate, you would have to do sum("total_variance_in_x"). Also, if you calculate (1 - "fractions_of_explained_variance"), you should get an array with the cumulative explained variance. –  Sep 12 '19 at 11:11

I'm not sure about this, so please chime in if someone can contribute something ...

Following these sources, among others:

https://ro-che.info/articles/2017-12-11-pca-explained-variance

https://www.ibm.com/docs/de/spss-statistics/24.0.0?topic=reduction-total-variance-explained

# variance of the X scores for each extracted component:
variance_in_x = np.var(pls1.x_scores_, axis=0)

# normalize by the summed score variance:
fractions_of_explained_variance = variance_in_x / np.sum(variance_in_x)