This question is actually a duplicate of an earlier one, which, however, remains unanswered at the time of writing.

Why is explained_variance_ratio_ from TruncatedSVD not in descending order, as it is from PCA? In my experience, the first element of the array is always the lowest; the value then jumps up at the second element and descends from there. In other words, why is explained_variance_ratio_[0] < explained_variance_ratio_[1] (> explained_variance_ratio_[2] > explained_variance_ratio_[3] ...)? Does this mean that the second "component", not the first, actually explains the most variance?

Code to reproduce behavior:

import numpy as np
from sklearn.decomposition import TruncatedSVD

n_components = 50
X_test = np.random.rand(50, 100)  # 50 samples, 100 features, not centered

model = TruncatedSVD(n_components=n_components, algorithm='randomized')
model.fit_transform(X_test)
model.explained_variance_ratio_  # first entry comes out smaller than the second
desertnaut
Emily Finn
    PCA is accomplished (depending on the package used) by performing SVD on mean-centered data. Maybe the discussion below will help. https://stats.stackexchange.com/questions/189822/how-does-centering-make-a-difference-in-pca-for-svd-and-eigen-decomposition – mathew gunther Jan 28 '19 at 23:20

1 Answer

If you standardize the data first (StandardScaler centers each feature to zero mean and scales it to unit variance), then I think the explained variance ratios will be in descending order:

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

n_components = 50
X_test = np.random.rand(50, 100)

# Center each feature to zero mean and scale it to unit variance
scaler = StandardScaler()
X_test = scaler.fit_transform(X_test)

model = TruncatedSVD(n_components=n_components, algorithm='randomized')
model.fit_transform(X_test)
model.explained_variance_ratio_
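
A likely reason this works: TruncatedSVD does not center the data before decomposing it, while explained_variance_ratio_ is based on the variance of each transformed column. For uncentered data, the first component mostly points along the mean of the data, so the projections onto it are large but nearly constant, and their variance is small. Below is a minimal sketch of that idea, assuming seeded uniform random data shaped like the data in the question. The seed, the variable names, and the is_descending helper are my own and only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Seeded data with the same shape and distribution as in the question
# (the seed is an arbitrary choice made for reproducibility).
rng = np.random.default_rng(0)
X = rng.random((50, 100))
n_components = 50


def is_descending(values, tol=1e-12):
    """Hypothetical helper: True if no entry exceeds its predecessor by more than tol."""
    return bool(np.all(np.diff(values) <= tol))


# 1) TruncatedSVD on the raw, uncentered data, as in the question.
svd_raw = TruncatedSVD(n_components=n_components, algorithm="randomized",
                       random_state=0).fit(X)
raw_ratio = svd_raw.explained_variance_ratio_

# 2) TruncatedSVD after subtracting the column means only (no scaling).
X_centered = X - X.mean(axis=0)
svd_centered = TruncatedSVD(n_components=n_components, algorithm="randomized",
                            random_state=0).fit(X_centered)
centered_ratio = svd_centered.explained_variance_ratio_

# 3) PCA for reference; it centers the data internally.
pca = PCA(n_components=n_components, svd_solver="full").fit(X)

print("raw: ratio[0] < ratio[1]?      ", raw_ratio[0] < raw_ratio[1])
print("raw: singular values descending?", is_descending(svd_raw.singular_values_))
print("centered: ratios descending?    ", is_descending(centered_ratio))
print("PCA: ratios descending?         ", is_descending(pca.explained_variance_ratio_))
print("max |centered - PCA| ratio diff:",
      np.max(np.abs(centered_ratio - pca.explained_variance_ratio_)))
```

If this reasoning holds, centering alone (without scaling to unit variance) is enough to get descending ratios. The singular values are in descending order even for the raw data, so the first component is still the dominant one by singular value; it just has little variance around its own mean.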