Get U, Sigma, V* matrix from Truncated SVD in scikit-learn

Question

I am using truncated SVD from the scikit-learn package.

In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV*, where U and V have orthonormal columns and Σ is a non-negative diagonal matrix.

I need to get the U, Σ and V* matrices.

Looking at the source code, I found that V* is stored in the self.components_ attribute after calling fit_transform.

Is it possible to get U and Σ matrices?

My code:

import sklearn.decomposition as skd
import numpy as np

matrix = np.random.random((20,20))
trsvd = skd.TruncatedSVD(n_components=15)
transformed = trsvd.fit_transform(matrix)
VT = trsvd.components_

maxymoo · Accepted Answer · 2017-04-18T00:50:12.367

61

Looking into the source via the link you provided, TruncatedSVD is basically a wrapper around sklearn.utils.extmath.randomized_svd; you can call it directly like this:

from sklearn.utils.extmath import randomized_svd

U, Sigma, VT = randomized_svd(X, 
                              n_components=15,
                              n_iter=5,
                              random_state=None)
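
As a hedged sketch of how the three returned factors relate to the input (the random matrix `X`, its shape, and the fixed seed are illustrative assumptions, not part of the answer), the following checks the shapes and the orthonormality of the factors:

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Illustrative input (assumption): a random 20 x 20 matrix, as in the question.
rng = np.random.default_rng(0)
X = rng.random((20, 20))

# U has shape (20, 15), Sigma is a 1-D array of 15 singular values,
# and VT has shape (15, 20).
U, Sigma, VT = randomized_svd(X, n_components=15, n_iter=5, random_state=0)

# Rank-15 approximation of X built from the three factors.
X_approx = U @ np.diag(Sigma) @ VT

# The columns of U and the rows of VT should be orthonormal.
u_orthonormal = np.allclose(U.T @ U, np.eye(15), atol=1e-8)
vt_orthonormal = np.allclose(VT @ VT.T, np.eye(15), atol=1e-8)
```

Note that Sigma comes back as a vector, so np.diag is needed whenever the diagonal matrix itself is required.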

edited Apr 18 '17 at 00:50

answered Jul 21 '15 at 01:35

maxymoo


score 11 · Answer 2 · edited Jun 27 '17 at 06:58


One can use scipy.sparse.linalg.svds (for dense matrices you can use scipy.linalg.svd).

import numpy as np
from scipy.sparse.linalg import svds

matrix = np.random.random((20, 20))
num_components = 2
u, s, vt = svds(matrix, k=num_components)  # the third factor is V transposed
X = u.dot(np.diag(s))  # U * Sigma, the analogue of TruncatedSVD's output (column order may differ)
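
One caveat worth sketching: svds does not necessarily return the singular values largest-first, unlike numpy.linalg.svd. A minimal way to reorder the triplets, assuming an illustrative random matrix and seed, is:

```python
import numpy as np
from scipy.sparse.linalg import svds

# Illustrative input (assumption): a random dense 20 x 20 matrix.
rng = np.random.default_rng(0)
matrix = rng.random((20, 20))

u, s, vt = svds(matrix, k=2)

# Reorder the triplets so the largest singular value comes first.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Reference: the two largest singular values from a full dense SVD.
s_ref = np.linalg.svd(matrix, compute_uv=False)[:2]
```

After reordering, s should agree with s_ref up to the solver's tolerance.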

If you're working with really big sparse matrices (perhaps you're working with natural text), even scipy.sparse.linalg.svds might exhaust your computer's RAM. In such cases, consider the sparsesvd package, which uses SVDLIBC and is what gensim uses under the hood.

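import numpy as np
from scipy.sparse import csc_matrix
from sparsesvd import sparsesvd

k = 2  # number of singular triplets to keep
X = csc_matrix(np.random.random((30, 30)))  # sparsesvd expects a sparse CSC matrix
ut, s, vt = sparsesvd(X, k)  # ut is U transposed, vt is V transposed
projected = X @ vt.T  # U * Sigma, the analogue of TruncatedSVD.transform(X)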

edited Jun 27 '17 at 06:58

henrywallace


answered Jul 21 '15 at 01:33

Vektor88


2

This is true but for the regular numpy.linalg.svd method you can't pass the number of components as a parameter so you have to extract the top K yourself. Minor inconvenience. – Felipe Nov 29 '15 at 03:12
X = u.dot(np.diag(s)) . This will not recreate X as 'v' is missing – Regi Mathew Apr 19 '21 at 01:58

score 9 · Answer 3 · answered Jan 24 '19 at 04:25

Just as a note:

svd.transform(X)

and

svd.fit_transform(X)

generate U * Sigma.

svd.singular_values_

generates Sigma in vector form.

svd.components_

generates VT. Maybe we can use

svd.transform(X).dot(np.linalg.inv(np.diag(svd.singular_values_)))

to get U because U * Sigma * Sigma ^ -1 = U * I = U.
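
A minimal sketch of this recovery, assuming an illustrative random matrix `X`, a small `n_components`, and a fixed seed (none of which come from the answer). Dividing by the singular values column-wise gives the same result as multiplying by the inverse of the diagonal matrix, without building or inverting it:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Illustrative input (assumption): a small random 20 x 10 matrix.
rng = np.random.default_rng(0)
X = rng.random((20, 10))

svd = TruncatedSVD(n_components=5, random_state=0)
X_transformed = svd.fit_transform(X)  # approximately U * Sigma
Sigma = svd.singular_values_          # 1-D array of length 5
VT = svd.components_                  # shape (5, 10)

# Broadcasting divides column j by Sigma[j], which is the same as
# right-multiplying by inv(diag(Sigma)).
U = X_transformed / Sigma
U_via_inverse = X_transformed.dot(np.linalg.inv(np.diag(Sigma)))
```

With the default randomized solver the recovered U is only as accurate as the approximation itself; on a matrix this small it should be close to exact.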

score 8 · Answer 4 · answered Apr 09 '19 at 03:32

From the source code, we can see that fit_transform returns X_transformed, which is U * Sigma (here Sigma is a vector of singular values). So we can get

svd = TruncatedSVD(k)
X_transformed = svd.fit_transform(X)

U = X_transformed / svd.singular_values_
Sigma_matrix = np.diag(svd.singular_values_)
VT = svd.components_

Remark

Truncated SVD is an approximation. X ≈ X' = UΣV*. We have X'V = UΣ. But what about XV? An interesting fact is XV = X'V. This can be proved by comparing the full SVD form of X and the truncated SVD form of X'. Note XV is just transform(X), so we can also get U by

U = svd.transform(X) / svd.singular_values_
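
The identity XV = X'V in the remark can be checked numerically. The sketch below uses plain NumPy with an exact SVD rather than TruncatedSVD, and the matrix size, `k`, and seed are illustrative assumptions:

```python
import numpy as np

# Illustrative input (assumption): a random 8 x 6 matrix, truncated to rank 3.
rng = np.random.default_rng(0)
X = rng.random((8, 6))
k = 3

# Thin SVD, then keep the leading k singular triplets.
U_full, s_full, Vh_full = np.linalg.svd(X, full_matrices=False)
U, s, Vh = U_full[:, :k], s_full[:k], Vh_full[:k, :]

X_trunc = U @ np.diag(s) @ Vh  # X', the rank-k approximation
V = Vh.T                       # shape (6, k), orthonormal columns

# The discarded right singular vectors are orthogonal to the columns of V,
# so their contribution vanishes: X V = X' V = U Sigma.
xv_equal = np.allclose(X @ V, X_trunc @ V)
xv_is_u_sigma = np.allclose(X @ V, U * s)
```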

score 0 · Answer 5 · answered Apr 25 '22 at 21:06

If your matrices are not large, this can be computed directly with np.linalg.svd: NumPy returns the singular values sorted in descending order, so simply take the first k singular values from Σ, the first k columns of U, and the first k rows of Vh. (Use full_matrices=False to get the thin SVD if one of your dimensions is huge.)

import numpy as np

m = np.random.random((5,5))
u, s, vh = np.linalg.svd(m)
u2, s2, vh2 = u[:,:2], s[:2], vh[:2,:]
m2 = u2 @ np.diag(s2) @ vh2  # rank-2 approx

If your matrices are large, then the randomized algorithms provided by sklearn.decomposition.TruncatedSVD will compute truncated SVD more efficiently.
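
For the small-matrix case, a quick sanity check on the truncation is that its Frobenius-norm error equals the square root of the sum of the squared discarded singular values. A minimal sketch, with the matrix size, `k`, and seed as illustrative assumptions:

```python
import numpy as np

# Illustrative input (assumption): a random 6 x 4 matrix, truncated to rank 2.
rng = np.random.default_rng(0)
m = rng.random((6, 4))
k = 2

u, s, vh = np.linalg.svd(m, full_matrices=False)
m_k = u[:, :k] @ np.diag(s[:k]) @ vh[:k, :]  # rank-k approximation

# Error of the truncation versus the energy in the discarded singular values.
err = np.linalg.norm(m - m_k, ord="fro")
expected = np.sqrt(np.sum(s[k:] ** 2))
```

If err and expected disagree, the wrong columns or rows were most likely sliced.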

score -2 · Answer 6 · answered Jun 10 '19 at 17:52


I know this is an older question, but the correct version is:

U = svd.fit_transform(X)
Sigma = svd.singular_values_
VT = svd.components_

However, one thing to keep in mind is that U and VT are truncated, so without the rest of the values it is not possible to recreate X.

answered Jun 10 '19 at 17:52

Pawan nandakishore


3

U is definitely not `svd.fit_transform(X) `. This is wrong. – DukeLover Sep 14 '19 at 15:28

score -5 · Answer 7 · answered Jun 09 '16 at 16:40


Let us suppose X is the input matrix on which we want to perform Truncated SVD. The commands below help to find U, Sigma and VT:

    from sklearn.decomposition import TruncatedSVD

    SVD = TruncatedSVD(n_components=r) 
    U = SVD.fit_transform(X)
    Sigma = SVD.explained_variance_ratio_
    VT = SVD.components_
    #r corresponds to the rank of the matrix

To understand the above terms, please refer to http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

answered Jun 09 '16 at 16:40

Manika Agarwal


5

I believe this answer is not correct: `SVD.fit_transform(X) = U*np.diag(Sigma) != U` and `SVD.explained_variance_ratio_ = np.var(X_transformed, axis=0) / np.var(X, axis=0).sum() != Sigma` – rth Jul 05 '16 at 12:15
This answer is not correct, as mentioned by rth as well. – JRun Aug 16 '16 at 17:58
