Singular value decomposition of a matrix M of size (M, N) means factoring it as

M = U Σ V*

where Σ (Sigma) is diagonal and V* is the conjugate transpose of V.

How can I obtain all three matrices using the scikit-learn and numpy packages?

I think I can obtain Sigma with a PCA model:

import numpy as np
from sklearn.decomposition import PCA

model = PCA(N, copy=True, random_state=0)
model.fit(X)

singular_values = model.singular_values_
Sigma = np.diag(singular_values)

What about the other two matrices?

cs95
Dims

2 Answers

You can get these matrices using numpy.linalg.svd as follows:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
U, S, Vh = np.linalg.svd(a, full_matrices=True)

S is a 1D array holding the diagonal entries of Sigma. U is the left factor of the decomposition, and Vh is the conjugate transpose of V: numpy returns the right factor already transposed.
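To check that the three pieces really factor the matrix, you can rebuild it from them. This is only a minimal sketch: the names Vh, Sigma and a_rebuilt are my own, and the third return value is called Vh here because numpy returns the conjugate transpose of V rather than V itself.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
U, S, Vh = np.linalg.svd(a, full_matrices=True)

# Build the full (M, N) Sigma matrix from the 1D array of singular values.
# Writing into a zero matrix also works when `a` is not square.
Sigma = np.zeros(a.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

# U @ Sigma @ Vh should give back `a` up to floating-point rounding.
a_rebuilt = U @ Sigma @ Vh
reconstruction_ok = np.allclose(a, a_rebuilt)
print(reconstruction_ok)
```

If you actually need V rather than its transpose, Vh.conj().T gives it.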

By the way, note that PCA centers the data before applying SVD, whereas numpy.linalg.svd is applied directly to the matrix itself (see lines 409-410 here).
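The effect of the centering can be seen by comparing a fitted PCA model with an SVD of the centered data. The following is a rough sketch on made-up random data, not a statement about how sklearn is implemented internally: X, U_pca and the boolean names are hypothetical, and svd_solver="full" is chosen so that an exact (non-randomized) solver is used.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 20 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

model = PCA(n_components=4, svd_solver="full")
model.fit(X)

# Center the columns by hand, then decompose directly.
X_centered = X - X.mean(axis=0)
U, S, Vh = np.linalg.svd(X_centered, full_matrices=False)

# Singular values agree; the rows of components_ agree with Vh up to sign.
same_singular_values = np.allclose(model.singular_values_, S)
same_components = np.allclose(np.abs(model.components_), np.abs(Vh))

# U can be recovered from the model: the projected data equals U * S.
U_pca = model.transform(X) / model.singular_values_
factorization_ok = np.allclose(
    (U_pca * model.singular_values_) @ model.components_, X_centered
)
print(same_singular_values, same_components, factorization_ok)
```

So with PCA, Sigma comes from singular_values_, the transposed V from components_, and U from the transformed data divided by the singular values, all for the centered matrix rather than X itself.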

Miriam Farber
  • Is this equivalent to PCA when data is noisy? Can't `np.linalg.svd` just throw an exception while `PCA` estimator will still run? – Dims Aug 24 '17 at 17:16
  • I don't see why it should be equivalent. PCA code in GitHub also uses numpy.linalg.svd (this is shown in the link I provided in the answer). The only difference is that in the PCA code they use full_matrices=False and they center the data prior to the decomposition, but it is still the same function. – Miriam Farber Aug 24 '17 at 17:26

I can't comment on Miriam's answer because I don't have enough reputation, but from looking at Miriam's link, sklearn actually calls scipy's linalg.svd, which doesn't seem to be the same as np.linalg.svd (discussion here).

So it may be better to use U, S, V = scipy.linalg.svd(a, full_matrices=True)
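A small sketch of what that could look like, using a made-up 2x3 matrix (not the one from the other answer) so that the non-square case is covered too. The variable names are my own; scipy.linalg.diagsvd is used to expand the 1D singular values into the full Sigma matrix.

```python
import numpy as np
from scipy import linalg

# Hypothetical rectangular example with M=2 rows and N=3 columns.
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)

# Like numpy, scipy returns the right factor already transposed (Vh).
U, s, Vh = linalg.svd(a, full_matrices=True)

# Expand the singular values into an (M, N) matrix.
Sigma = linalg.diagsvd(s, *a.shape)

rebuilt_ok = np.allclose(U @ Sigma @ Vh, a)

# On this small example both libraries give the same singular values.
s_np = np.linalg.svd(a, compute_uv=False)
same_values = np.allclose(s, s_np)
print(rebuilt_ok, same_values)
```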

nnaj20