
I've been wondering about this question for quite a while but cannot find a reference: How does Matlab transpose a sparse matrix so fast, given that it is stored in CSC (compressed sparse column) format?

Also, its documentation confirms that sparse matrix transposition is efficient:

To do this (accessing row by row), you can transpose the matrix, perform operations on the columns, and then retranspose the result … The time required to transpose the matrix is negligible.

Follow-up (modified as suggested by @Mikhail):

I agree with @Roger and @Mikhail that setting a flag is enough for many operations, such as BLAS or sparse BLAS operations, as far as their interfaces are concerned. But it appears to me that Matlab does an "actual" transposition. For example, I have a sparse matrix X of size m*n = 7984*12411, and I want to scale each column and each row:
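
For reference, the snippets below assume a setup along these lines (the density is only a placeholder; the timings reported below were measured on my actual X, not on a random matrix):

m = 7984; n = 12411;
X = sprand(m, n, 0.005);  % placeholder density; any sparse m-by-n matrix will do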

% scaling each column
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = bsxfun(@times, A, rand(1,n));
    t = t + toc(t0);
end

t = 0.023636 seconds

% scaling each row
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = bsxfun(@times, A, rand(m,1));
    t = t + toc(t0);
end

t = 138.3586 seconds

% scaling each row by transposing X and transforming back
t = 0;
for i = 1 : 1000
    A = X; t0 = tic;
    A = A'; A = bsxfun(@times, A, rand(1,m)); A = A';
    t = t + toc(t0);
end

t = 19.5433 seconds

This result means that accessing the matrix column by column is faster than accessing it row by row, which makes sense because sparse matrices are stored column by column. So the only explanation for the speed of column scaling on X' seems to be that X is actually transposed into X', rather than a flag being set.

Also, if every sparse matrix is stored in CSC format, simply setting a flag cannot produce X' in CSC format.
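
To make this concrete, here is a small illustration (find returns the stored triplets in column-major order, i.e. in the order they sit in the CSC arrays):

Xs = sparse([1 0 2; 0 3 0]);   % small stand-in matrix
[i1, j1, v1] = find(Xs)        % v1 = [1 3 2]: entries grouped by column of Xs
[i2, j2, v2] = find(Xs')       % v2 = [1 2 3]: grouped by column of Xs', i.e. by row of Xs
% The stored order changes, so producing X' in CSC requires reordering the data, not just flipping a flag.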

Any comments? Thanks in advance.

Da Kuang
  • It probably just sets a flag that controls its array access behaviour - swapping row/column indices on access and leaving the data alone is very fast. – Roger Rowland May 23 '13 at 19:07
  • @RogerRowland Please see the follow-up I added above. Thanks. – Da Kuang May 23 '13 at 20:23
  • I'd suggest to do a number of trials. 20 milliseconds is not a reliable time measurement. – Mikhail May 23 '13 at 21:42
  • @Mikhail I did 1000 iterations now and modified the question text above – Da Kuang May 23 '13 at 22:08
  • @DaKuang You have `rand(1,n)` for the first test and `rand(1,m)` for the third. Also, when you do a number of trials, `tic/toc` should be placed outside loops. – Mikhail May 24 '13 at 06:39
  • @Mikhail Sorry, the third code snippet should have had the description "scaling each row...". But please look at my timing code carefully; I don't see a problem there... – Da Kuang May 24 '13 at 17:57
  • @DaKuang Well, the only thing I can suggest now is to post this on Matlab Central. I can only make guesses about this behavior. – Mikhail May 24 '13 at 20:04
  • @Mikhail I agree maybe there's not much an outsider can do. Thanks! – Da Kuang May 24 '13 at 20:24
  • I tried Matlab 2016a and found that the first two examples show no difference now. – mendel Nov 08 '20 at 04:28

3 Answers


After a week's exploration, my guess is that the internal mechanism behind Matlab's sparse transpose is sorting.

Suppose A is a sparse matrix,

[I, J, S] = find(A);        % row indices, column indices, values of the nonzeros
[sorted_I, idx] = sort(I);  % sort by row index of A, i.e. by column index of B
J = J(idx);
S = S(idx);
B = sparse(J, sorted_I, S, size(A, 2), size(A, 1));  % pass the sizes so trailing empty rows/columns are preserved

Then B is the transpose of A.

The above implementation runs at about half the speed of Matlab's built-in transpose on my machine. Considering that Matlab's built-in functions are multi-threaded, my guess might be a reasonable one.
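
For anyone who wants to reproduce the comparison, a rough timing sketch along these lines can be used (the test matrix is just a random placeholder):

A = sprand(1e4, 1e4, 1e-3);   % placeholder test matrix
tic; B1 = A'; t_builtin = toc
tic;
[I, J, S] = find(A);
[sorted_I, idx] = sort(I);
B2 = sparse(J(idx), sorted_I, S(idx), size(A, 2), size(A, 1));
t_sort = toc
isequal(B1, B2)               % should print 1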

Da Kuang
  • Update: I tested the above snippet on some very large sparse matrices (>5 GB in memory), and its timing is about the same as Matlab's native transpose. – Da Kuang Jul 22 '13 at 22:59
  • @DaKuang Very impressive result. Thanks a lot for posting your finding! I wonder if Octave implemented a similar thing or if they chose a suboptimal way? – gaborous Jun 21 '14 at 22:46

I realize I am a bit late to the game, but I thought I could help shed some light on this question. Transposing a sparse matrix is actually a straightforward task that can be accomplished in time proportional to the number of nonzero elements in the input matrix (plus its dimensions). Suppose that A is an m x n matrix stored in CSC format, i.e., A is defined by three arrays:

  1. elemsA, of length nnz(A), that stores the nonzero elements in A
  2. prowA, of length nnz(A), that stores the row indices of nonzero elements in A
  3. pcolA, of length n + 1, such that all nonzero elements in column j of A are indexed by the range [pcolA(j), pcolA(j + 1))
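
For concreteness, here is what those three arrays look like for a tiny matrix (computed with ordinary MATLAB calls for illustration; at the mex level, mxGetPr, mxGetIr, and mxGetJc expose the same data, with row indices stored zero-based):

A = sparse([1 0 2; 0 3 4]);                 % 2-by-3 example with nnz(A) = 4
elemsA = nonzeros(A)'                       % [1 3 2 4]   values, stored column by column
[prowA, ~] = find(A); prowA = prowA'        % [1 2 1 2]   row index of each stored value
pcolA  = [0, cumsum(full(sum(A ~= 0, 1)))]  % [0 1 2 4]   column pointers, n + 1 entries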

If B denotes the transpose of A, then our goal is to define the analogous arrays elemsB, prowB, pcolB. To do so, we use the fact that the rows of A form the columns of B. Let tmp be an array such that tmp(1) = 0 and tmp(i + 1) is the number of elements in row i of A for i = 1,...,m. Then it follows that tmp(i + 1) is the number of elements in column i of B. Hence, the cumulative sum of tmp is the same as pcolB. Now suppose that tmp has been overwritten by its cumulative sum. Then elemsB and prowB can be populated as follows

    for j = 1,...,n
        for k = pcolA(j),...,pcolA(j + 1) - 1
            prowB(tmp(prowA(k))) = j
            elemsB(tmp(prowA(k))) = elemsA(k)
            tmp(prowA(k)) = tmp(prowA(k)) + 1
        end
    end

The array tmp is used to index into prowB and elemsB when adding a new element and is then updated accordingly. Putting this all together, we can write a mex file in C++ that implements the transpose algorithm:

#include "mex.h"
#include <vector>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {     
    // check input output
    if (nrhs != 1)
       mexErrMsgTxt("One input argument required");
    if (nlhs > 1)
       mexErrMsgTxt("Too many output arguments");

    // get input sparse matrix A
    if (mxIsEmpty(prhs[0])) { // is A empty?
        plhs[0] = mxCreateSparse(0, 0, 0, mxREAL);
        return;
    }
    if (!mxIsSparse(prhs[0]) || mxIsComplex(prhs[0])) // is A real and sparse?
       mexErrMsgTxt("Input matrix must be real and sparse");
    double* A = mxGetPr(prhs[0]);           // real vector for A
    mwIndex* prowA = mxGetIr(prhs[0]);      // row indices for elements of A
    mwIndex* pcolindexA = mxGetJc(prhs[0]); // index into the columns
    mwSize M = mxGetM(prhs[0]);             // number of rows in A
    mwSize N = mxGetN(prhs[0]);             // number of columns in A

    // allocate memory for A^T
    plhs[0] = mxCreateSparse(N, M, pcolindexA[N], mxREAL);
    double* outAt = mxGetPr(plhs[0]);
    mwIndex* outprowAt = mxGetIr(plhs[0]);
    mwIndex* outpcolindexAt = mxGetJc(plhs[0]);

    // temp[j + 1] stores the number of nonzero elements in row j of A
    std::vector<mwSize> temp(M + 1, 0); 
    for(mwIndex i = 0; i != N; ++i) {
        for(mwIndex j = pcolindexA[i]; j < pcolindexA[i + 1]; ++j)
            ++temp[prowA[j] + 1];
    }
    outpcolindexAt[0] = 0;
    for(mwIndex i = 1; i <= M; ++i) {
        outpcolindexAt[i] = outpcolindexAt[i - 1] + temp[i];
        temp[i] = outpcolindexAt[i];
    }
    for(mwIndex i = 0; i != N; ++i) {
        for(mwIndex j = pcolindexA[i]; j < pcolindexA[i + 1]; ++j) {
            outprowAt[temp[prowA[j]]] = i;
            outAt[temp[prowA[j]]++] = A[j];
        }
    }
}

Comparing this algorithm with Matlab's implementation of transpose, we observe similar execution times. Note that this algorithm can be modified in a straightforward way to eliminate the temp array.
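
For readers who prefer to prototype before writing a mex file, the same counting idea can be sketched directly in MATLAB (illustrative only; the interpreted loop over the nonzeros makes it far slower than the mex version):

[m, n] = size(A);
[I, J, V] = find(A);                              % triplets, returned in column-major order
counts   = accumarray(I, ones(size(I)), [m, 1]);  % nonzeros per row of A = per column of B
nextFree = cumsum([1; counts(1:end - 1)]);        % next free slot in each column of B
rowB = zeros(nnz(A), 1); colB = rowB; valB = rowB;
for k = 1:numel(I)
    r = I(k);                                     % row in A = column in B
    rowB(nextFree(r)) = J(k);                     % column in A = row in B
    colB(nextFree(r)) = r;
    valB(nextFree(r)) = V(k);
    nextFree(r) = nextFree(r) + 1;
end
B = sparse(rowB, colB, valB, n, m);               % isequal(B, A') should hold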

K. Miller

I agree with what Roger Rowland mentioned in the comments. To support this suggestion, you can check some functions from the BLAS interface, which MATLAB uses for matrix operations. I'm not sure which implementation it uses, but since they use Intel IPP for image handling, I suppose they may well use Intel MKL to make matrix operations fast.

And here is the documentation for the mkl_?cscsv function, which solves a system of linear equations for a sparse matrix in CSC format. Note the transa input flag, which explicitly specifies whether the supplied matrix should be treated as transposed or not.
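
As a rough illustration of what such a flag buys the caller: a kernel can apply the transpose without ever forming it, by reading the stored triplets directly. For example, y = A' * x can be computed like this (conceptual MATLAB sketch, not MKL's API; A is assumed to be a sparse m-by-n matrix and x a dense length-m column vector):

[r, c, v] = find(A);                             % stored triplets of A
y = accumarray(c, v .* x(r), [size(A, 2), 1]);   % y(k) = sum over r of A(r,k)*x(r) = (A'*x)(k)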

Mikhail