
I'm looking for a fast SVD library in C, C++ or Java. Ultimately I'm using Java, but I'm very comfortable using JNA to wrap C++, e.g. http://github.com/hughperkins/jeigen

The library needs to handle sparse matrices. To keep this objective, so that the question doesn't get marked as too subjective, let's say:

I looked around at a few libraries and found:

  • MATLAB: super fast, about 10 seconds, but it's not really a 'library' as such. Average squared projection error: 0.93
  • redsvd: super fast, about 1 second to run for 6 features, but the average squared projection error is 0.97, which is very high
  • Eigen's SVD: very slow, and only for dense matrices
  • svdlibc: ran for 28 minutes before I stopped it; I guess it's calculating the full S, rather than just the first 6 features or so

Basically, I'm looking for a library that gives about the same speed and average squared projection error as MATLAB, or at least something comparable.
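
For reference, here is a minimal Java sketch of how an average squared projection error could be computed for a rank-k basis. The question does not define the metric, so this assumes the usual PCA-style definition: project each row onto k orthonormal basis vectors (for example the first k right singular vectors) and average the squared reconstruction error over the rows. Whether the 0.93 and 0.97 figures above are normalized by the average squared row norm, or computed on mean-centered data, is not stated, so the ratio in `main` is an assumption too. The class and method names are hypothetical.

```java
/** Hypothetical helper: PCA-style projection error for a given basis. */
final class ProjectionError {

    /**
     * x is m rows by n features; basis is n by k and is ASSUMED to have
     * orthonormal columns (this is not checked).
     * Returns (1/m) * sum over rows of ||row - basis * basis^T * row||^2.
     */
    static double averageSquaredError(double[][] x, double[][] basis) {
        int k = basis[0].length;
        double total = 0.0;
        for (double[] row : x) {
            int n = row.length;
            // Coefficients of the row in the k-dimensional subspace.
            double[] coeffs = new double[k];
            for (int j = 0; j < k; j++) {
                for (int i = 0; i < n; i++) {
                    coeffs[j] += basis[i][j] * row[i];
                }
            }
            // Reconstruct the row and accumulate the squared residual.
            for (int i = 0; i < n; i++) {
                double approx = 0.0;
                for (int j = 0; j < k; j++) {
                    approx += basis[i][j] * coeffs[j];
                }
                double diff = row[i] - approx;
                total += diff * diff;
            }
        }
        return total / x.length;
    }

    /** Returns (1/m) * sum over rows of ||row||^2, used here for normalization. */
    static double averageSquaredNorm(double[][] x) {
        double total = 0.0;
        for (double[] row : x) {
            for (double v : row) {
                total += v * v;
            }
        }
        return total / x.length;
    }

    public static void main(String[] args) {
        double[][] x = {{3.0, 4.0}};
        double[][] basis = {{1.0}, {0.0}}; // keep only the first coordinate
        double error = averageSquaredError(x, basis);
        System.out.println(error / averageSquaredNorm(x)); // normalized error (assumed definition)
    }
}
```

The dense arrays keep the sketch short; for a large sparse matrix the same sums would be accumulated over the nonzero entries only.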

Hugh Perkins
  • What algorithm are you looking for? [Randomized PCA](http://scikit-learn.org/stable/modules/decomposition.html) (not C++/Java, but I think it's the right algorithm) for 20 newsgroups x 10k features, 6 PCs --> `7.0 sec pca explained_variance_ratio_ .79 .062 .044 .039 .031 .03`. – denis Nov 26 '12 at 10:57
  • I'm surprised Eigen's SVD is so slow. I have never used Eigen for this. Has it improved since you asked this question? – Z boson May 29 '15 at 09:45
  • BTW, [when are you going to implement the sparse solvers in JEigen](https://stackoverflow.com/questions/17046585/cholmod-in-java/30526005#30526005)? – Z boson May 29 '15 at 09:49
  • @Z boson, seems like you've already implemented this. Want to send a pull request? – Hugh Perkins May 30 '15 at 09:22

1 Answer


In my experience, svdlibc is the best library of those options. I've dug a bit through its code before and I don't believe it's calculating the full S matrix (i.e., it is a true "thin SVD"). If you can control the matrix representation on disk, svdlibc performs much faster when using the sparse binary input format, due to the significantly lower I/O overhead.
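
As a rough illustration of the sparse binary format, here is a Java sketch that serializes a column-major sparse matrix. It assumes the layout as I understand it from SVDLIBC's documentation: a header of `numRows`, `numCols` and the total nonzero count, then for each column its nonzero count followed by (row index, value) pairs, with 4-byte integers, 4-byte floats, and network (big-endian) byte order. Please check this against the SVDLIBC docs before relying on it. `SparseBinaryWriter` and its methods are hypothetical names, not part of SVDLIBC or SVDLIBJ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: writes a sparse matrix in the assumed SVDLIBC sparse binary layout. */
final class SparseBinaryWriter {

    /**
     * rowIndices[c] and values[c] hold the row indices and values of the
     * nonzero entries of column c; the two arrays must have matching shapes.
     */
    static void write(OutputStream out, int numRows, int[][] rowIndices, float[][] values)
            throws IOException {
        int numCols = rowIndices.length;
        int totalNonZero = 0;
        for (int[] column : rowIndices) {
            totalNonZero += column.length;
        }

        // DataOutputStream always writes big-endian, i.e. network byte order.
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(numRows);
        data.writeInt(numCols);
        data.writeInt(totalNonZero);
        for (int c = 0; c < numCols; c++) {
            data.writeInt(rowIndices[c].length); // nonzeros in this column
            for (int i = 0; i < rowIndices[c].length; i++) {
                data.writeInt(rowIndices[c][i]);
                data.writeFloat(values[c][i]);
            }
        }
        data.flush();
    }

    /** Convenience wrapper that serializes into memory. */
    static byte[] toBytes(int numRows, int[][] rowIndices, float[][] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(buffer, numRows, rowIndices, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // 3x2 matrix: column 0 has 1.0 at row 0 and 2.5 at row 2; column 1 has -4.0 at row 1.
        int[][] rows = {{0, 2}, {1}};
        float[][] vals = {{1.0f, 2.5f}, {-4.0f}};
        System.out.println(toBytes(3, rows, vals).length); // 12-byte header + 20 + 12 = 44
    }
}
```

To produce a file, pass a `FileOutputStream` (ideally wrapped in a `BufferedOutputStream`) to `write`, then point svdlibc at that file using its sparse binary input option.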

The S-Space Package provided an executable jar around SVDLIBJ, a Java port of SVDLIBC. However, its authors found that it gave different results from SVDLIBC for certain inputs.

David Jurgens
  • Ok. Do you know how I can request the command-line version to only return the first 6 features, rather than calculating the whole matrix? – Hugh Perkins Oct 31 '12 at 04:17