
I have a data matrix A of dimension m-by-n, where m is the number of data vectors and n is the dimension of each data vector (so they are arranged by rows).

If I do

[U,S,V] = svds(A, k);

the U matrix is m-by-k, but I was expecting an n-by-k matrix (to be used to project any 1-by-n original vector to a 1-by-k one). What am I doing wrong? Should I arrange the data by column instead (i.e., use A' instead of A)?
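
For concreteness, a minimal sketch of the sizes involved (the values m = 100, n = 20, k = 5 are just illustrative, not my actual data):

m = 100; n = 20; k = 5;
A = rand(m, n);          % data arranged by rows: m samples, n features
[U, S, V] = svds(A, k);
size(U)                  % m-by-k (100 x 5)
size(S)                  % k-by-k (5 x 5)
size(V)                  % n-by-k (20 x 5) -- the n-by-k matrix I expected for projection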

user2614596
  • Nothing is wrong; that's what the SVD function returns. The documentation says: `[U,S,V] = svds(A,...) computes the singular vectors as well. If A is M-by-N and K singular values are computed, then U is M-by-K with orthonormal columns, S is K-by-K diagonal, and V is N-by-K with orthonormal columns.` So the function is doing what it's supposed to do. Please provide more context on your problem. – rayryeng Feb 03 '17 at 15:50
  • My aim is dimensionality reduction, so I want to take the first k singular vectors to transform my 1-by-n vectors into reduced 1-by-k versions. If U has dimension m-by-k (i.e., one dimension is related to the number of data points I have, not their dimensionality), how can I use it to transform an arbitrary 1-by-n vector into a 1-by-k one? – user2614596 Feb 03 '17 at 15:57
  • You simply have to multiply `U` by `S`. `B = U * S` gives you the dimensionality-reduced set, where each row of `B` is the dimensionality-reduced representation of the corresponding example in `A`: http://stats.stackexchange.com/questions/107533/how-to-use-svd-for-dimensionality-reduction-to-reduce-the-number-of-columns-fea – rayryeng Feb 03 '17 at 15:58
  • And what is the projection matrix for any arbitrary new 1-by-n vector? – user2614596 Feb 03 '17 at 16:00
  • That would be the columns of `V`. Given that your data is standardized, `U * S = X * V`, where `X` is the dataset with the mean of each feature subtracted (i.e. `X = bsxfun(@minus, A, mean(A, 1));`). Therefore, when you apply the `svd` on `X`, the projection matrix to reduce your dimensionality down to `k` is the first `k` columns of `V`. http://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca. Take any vector from your dataset in `X`, or any row that is `1 x n`, and multiply it by the matrix `V` of size `n x k` to reduce it down to `1 x k`. – rayryeng Feb 03 '17 at 16:07
  • The link you provided says: "To reduce the dimensionality of the data from p to k ..." – user2614596 Feb 03 '17 at 16:09
  • They are both correct: `U * S = X * V`, given that the data is standardized. Taking `U * S`, or multiplying `X` by the first `k` columns of `V`, gives the same result. Note that `X = U * S * V^{T}`, and because the columns of `V` are orthonormal, `V^{T} * V = I`, so right-multiplying both sides by `V` gives `X * V = U * S`, hence the relationship. You can write a very small MATLAB script to test this: `A = rand(100,4); X = bsxfun(@minus, A, mean(A, 1)); [U,S,V] = svds(X, 2); B1 = U*S; B2 = X*V; e = norm(B1(:) - B2(:)) < 1e-10;`. `e` should be 1, denoting their equivalence. – rayryeng Feb 03 '17 at 16:15
  • Therefore, to store the parameters needed to apply the projection to new data, I should only save V (since I used svds, V has exactly k columns) and the mean vector of A (needed to center the new data before projecting). Is that correct? – user2614596 Feb 03 '17 at 16:20
  • That is correct! Make sure your data is mean-subtracted, then take whichever vector you want and use the first `k` columns of `V` to do the projection. If you have **new** data, you **must** keep the means of each feature, because you need to subtract them from each new sample before projecting; that is what the `svd`-based reduction assumes. Therefore, given a new point, keep the mean feature vector and the first `k` columns of `V` (a consolidated sketch follows these comments). This post I wrote a long time ago may help: http://stackoverflow.com/questions/39706561/how-to-use-eigenvectors-obtained-through-pca-to-reproject-my-data/ – rayryeng Feb 03 '17 at 16:21
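
A consolidated MATLAB sketch of the workflow described in the comments above (the variable names mu, xnew, and xred are illustrative, not taken from the thread):

A  = rand(100, 4);                 % m-by-n data, one sample per row
mu = mean(A, 1);                   % 1-by-n feature means -- keep these for new data
X  = bsxfun(@minus, A, mu);        % mean-subtracted data
k  = 2;
[U, S, V] = svds(X, k);            % U: m-by-k, S: k-by-k, V: n-by-k

B1 = U * S;                        % reduced training data, m-by-k
B2 = X * V;                        % same result, since V' * V = I gives X * V = U * S
e  = norm(B1(:) - B2(:)) < 1e-10   % should be 1 (logical true)

xnew = rand(1, 4);                 % a new, previously unseen 1-by-n sample
xred = (xnew - mu) * V;            % its 1-by-k reduced representation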
