
I am running a partial SVD of a large (120k x 600k) and sparse (0.1 density of non-zero values) matrix on a 3.5 GHz/3.9 GHz (6 cores / 12 threads) server with 128 GB of RAM, using SVDLIBC.

Is it possible to speed up the process a little bit using multithreading so as to take full advantage of my server configuration?

I have no experience with multithreading, so I am asking for friendly advice and/or pointers to manuals/tutorials.

[EDIT] I am open to alternatives too (Matlab/Octave, R, etc.)

Pierre

2 Answers


In Matlab, for sparse matrices, you have `svds`. This implementation benefits from multithreaded computation (1).

tashuhka
  • Thanks for the advice. Do you know if this feature is available in Octave? Otherwise I'll have to get a Matlab license. – Pierre Feb 12 '14 at 15:45

See irlba: Fast partial SVD by implicitly-restarted Lanczos bidiagonalization in R. It calculates only the first user-specified number of dimensions. I had good experience with it in the past. However, I was using a commercial build of R that was compiled to take advantage of multithreading, so I can't vouch for how much of the speed improvement was due to multithreading.
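Below is a minimal usage sketch (not from the answer above), assuming the `irlba` and `Matrix` packages are installed; the real 120k x 600k matrix is replaced by a small random stand-in built with `rsparsematrix`, and `nv` sets how many singular triplets to compute.

```r
## Minimal partial-SVD sketch with irlba; the matrix below is a toy stand-in,
## not the real 120k x 600k data.
library(Matrix)
library(irlba)

set.seed(1)
A <- rsparsematrix(12000, 60000, density = 0.001)  # sparse dgCMatrix

k <- 50               # number of singular triplets to extract
s <- irlba(A, nv = k) # partial SVD via implicitly restarted Lanczos

head(s$d)  # k largest singular values (decreasing)
dim(s$u)   # 12000 x k left singular vectors
dim(s$v)   # 60000 x k right singular vectors
```

Whether the underlying linear algebra uses more than one core depends on how your R build and its BLAS were compiled, as the answer notes.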

  • Thanks for the advice, I'm going to check it out. BTW, I'm interested in the first 10k singular values. Do you think it'll work, given that it mentions 'a few approximate singular values and singular vectors of large matrices'? – Pierre Feb 12 '14 at 15:49
  • I never tried to extract that many before – the most I got out were in the 50-500 range. What I found useful in the past is to see how much time it takes to extract 10, 20, 40, 80, 160 dims and from there extrapolate how much time it takes to extract your desired number of singular values (see the timing sketch after these comments). 10k would be overkill though, but what domain is your problem in? Text processing? Recommender systems? You can also try `random projection`. See: http://stackoverflow.com/questions/4951286/svd-for-sparse-matrix-in-r/16308171#16308171 – Feb 12 '14 at 15:54
  • I'm using SVD for LSA. With small collections, 50-500 is a good range for dimensionality reduction. However, with large collections, the problem is still open. As an experiment I ran SVD on a smaller collection (84k x 49k matrix), keeping only the first 7k singular values. According to Cattell's scree test, the 'best' dimensionality would be in the 2000-2500 range... Anyway, thanks for the link to 'random projection'! – Pierre Feb 13 '14 at 17:55
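Following up on the timing-extrapolation suggestion in the comments, here is a rough sketch of that idea; it again assumes `irlba` and `Matrix` are installed and uses a small random stand-in matrix, so only the growth trend (not the absolute times) carries over to the real data.

```r
## Rough timing sweep: extract increasing numbers of singular values, watch
## how the elapsed time grows, then extrapolate toward the target k.
library(Matrix)
library(irlba)

set.seed(1)
A <- rsparsematrix(12000, 60000, density = 0.001)  # toy stand-in matrix

for (k in c(10, 20, 40, 80, 160)) {
  elapsed <- system.time(irlba(A, nv = k))["elapsed"]
  cat(sprintf("k = %3d: %6.1f s\n", k, elapsed))
}
```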