4

I want to benchmark the best 2 or 3 libraries to compute a truncated singular value decomposition (SVD), i.e. an SVD where only the k largest singular values are kept. Moreover, I have those constraints :

  • It has to be a java library
  • My matrices are sparse (around 1% non zero values)
  • My matrices are quite big (typically 10k x 5k)
  • My matrices can also be larger than high (5k x 10k)

I've encountered quite a large range of libraries, but for instance, with Colt, I don't even know if the SVD algorithm takes into account the fact that my matrix is sparse. Also, I did not find a single library that can directly compute the truncated solution (which is supposed to be much faster). Actually, I'm mostly interested in the approximate matrix obtained from the truncated SVD.

Thanks by advance for your help,

Romain Laroche

Maveric78f
  • 55
  • 6
  • colt is definitely too slow, on my settings. I'm going to try jama, but from what I've read so far, it should not be better. – Maveric78f Nov 15 '13 at 15:44
  • Colt is too slow but even more importantly, it works only for rectangular matrices that are higher than wide. – Maveric78f Dec 06 '13 at 08:16
  • I'm trying [EJML](https://code.google.com/p/efficient-java-matrix-library/), after following the recommendations of the [benchmark](https://code.google.com/p/java-matrix-benchmark/) of java libraries of matrices. It works much better than Colt as long as java memory does not heap space. – Maveric78f Dec 06 '13 at 08:21
  • Same issue @Maveric. I would like to run SVD in sparse matrix but no luck. I have found that apache.common.math works but it returns NaN value for all matrix. – Gaurav Koradiya May 11 '20 at 07:02

2 Answers2

5

I had the exact same problem, and my solution is to:

  1. run the SVD with Apache Commons Math on your matrix
  2. truncate the diagonal matrix to only keep the top-k singular values
  3. truncate the other two matrices by only taking the top-k columns for the first one and top-k rows for last one
  4. multiply the three matrices

What you obtain is the truncated SVD of your original matrix.

Below is the full solution, tested with matrices having a few thousands rows/columns.

public static double[][] getTruncatedSVD(double[][] matrix, final int k) {
    SingularValueDecomposition svd = new SingularValueDecomposition(new Array2DRowRealMatrix(matrix));

    double[][] truncatedU = new double[svd.getU().getRowDimension()][k];
    svd.getU().copySubMatrix(0, truncatedU.length - 1, 0, k - 1, truncatedU);

    double[][] truncatedS = new double[k][k];
    svd.getS().copySubMatrix(0, k - 1, 0, k - 1, truncatedS);

    double[][] truncatedVT = new double[k][svd.getVT().getColumnDimension()];
    svd.getVT().copySubMatrix(0, k - 1, 0, truncatedVT[0].length - 1, truncatedVT);

    RealMatrix approximatedSvdMatrix = (new Array2DRowRealMatrix(truncatedU)).multiply(new Array2DRowRealMatrix(truncatedS)).multiply(new Array2DRowRealMatrix(truncatedVT));

    return approximatedSvdMatrix.getData();
}
Alphaaa
  • 4,206
  • 8
  • 34
  • 43
0

I've used the http://math.nist.gov/javanumerics/jama/ library which is pretty good.

cyber-monk
  • 5,470
  • 6
  • 33
  • 42