I'm writing code to multiply a sparse matrix by a full matrix.
I've created two classes, SparseMatrix and Matrix, which store their data as a vector of shared pointers to vectors. In the SparseMatrix case I save each entry as an object, SparseMatrixItem, with two attributes: position and value. In the Matrix case I simply store the value. Both can be row-based or column-based, depending on the value of a bool attribute.
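For reference, here is a minimal sketch of the declarations involved, reconstructed from the description above (the name of the bool orientation flag is hypothetical; constructors and other boilerplate are omitted):

#include <memory>
#include <vector>
using std::vector;

// One non-zero entry of a sparse row: its (1-based) position and its value.
template <typename scalar>
struct SparseMatrixItem {
    int position;
    scalar value;
};

// Full matrix: a vector of shared pointers to rows (or columns).
template <typename scalar>
class Matrix {
public:
    vector<std::shared_ptr<vector<scalar>>> internalMatrix;
    bool rowBased; // hypothetical name for the orientation flag
    int getNumberOfRows();
    int getNumberOfColumns();
};

// Sparse matrix: each row stores only its non-zero items.
template <typename scalar>
class SparseVectorMatrix {
public:
    vector<std::shared_ptr<vector<SparseMatrixItem<scalar>>>> horizontalVectorMatrix;
    int getNumberOfRows();
    int getNumberOfColumns();
    Matrix<scalar> operator*(Matrix<scalar> &input2);
    scalar fastScalarProduct(vector<SparseMatrixItem<scalar>> &vector1,
                             const vector<scalar> &vector2);
};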
Now I'm trying to write an efficient version of the standard product between the two matrices. For simplicity, in this first implementation I only consider the case where the first matrix is a row-based SparseMatrix and the second is a column-based Matrix. I wrote the code in the SparseMatrix class by overloading operator*.
Here is my implementation:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();
    // Outer loop over the sparse rows, inner loop over the dense columns:
    // each result entry is one sparse-dense scalar product.
    for (int i = 0; i < numberOfRows; i++) {
        vector<SparseMatrixItem<scalar>> &readRow(*horizontalVectorMatrix[i]);
        vector<scalar> &writeRow(*newMatrix.internalMatrix[i]);
        for (int j = 0; j < numberOfColumns; j++) {
            vector<scalar> &readColumn(*input2.internalMatrix[j]);
            writeRow[j] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}
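For context, this is roughly how I build the operands and time the product (a sketch: makeRandomSparse and makeRandomFull are hypothetical helpers standing in for my fill code):

#include <chrono>
#include <iostream>

int main() {
    // Hypothetical helpers that fill a 6040x4000 row-based sparse matrix
    // and a 4000x6040 column-based full matrix with random values.
    SparseVectorMatrix<double> A = makeRandomSparse(6040, 4000);
    Matrix<double> B = makeRandomFull(4000, 6040, /*rowBased=*/false);

    auto start = std::chrono::steady_clock::now();
    Matrix<double> C = A * B;
    auto stop = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double>(stop - start).count() << " s\n";
}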
The strange thing I cannot figure out is that if I swap the order of the two loops, performance gets dramatically better. I tested it with two matrices, 6040x4000 and 4000x6040: the first implementation takes nearly 30 seconds, while the second takes only 12 seconds. Here it is:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();
    // Same computation with the loops swapped: the outer loop now walks the
    // dense columns, so each column of input2 is reused across all sparse rows.
    for (int j = 0; j < numberOfColumns; j++) {
        vector<scalar> &readColumn(*input2.internalMatrix[j]);
        vector<scalar> &writeColumn(*newMatrix.internalMatrix[j]);
        for (int i = 0; i < numberOfRows; i++) {
            vector<SparseMatrixItem<scalar>> &readRow(*horizontalVectorMatrix[i]);
            writeColumn[i] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}
Here is also the code of the fastScalarProduct() function that both versions call:
template <typename scalar>
scalar SparseVectorMatrix<scalar>::fastScalarProduct(
        vector<SparseMatrixItem<scalar>> &vector1,
        const vector<scalar> &vector2) {
    scalar totalSum = 0; // accumulate in the scalar type to avoid truncation
    auto sizeVector1 = vector1.size();
    for (size_t i = 0; i < sizeVector1; i++) {
        // positions are stored 1-based, hence the -1
        int position = vector1[i].position - 1;
        // skip the multiplication when the dense entry is zero
        if (vector2[position]) {
            totalSum += vector1[i].value * vector2[position];
        }
    }
    return totalSum;
}
I tried the same product in MATLAB and it takes only about 1.5 seconds. I think there are cache-memory issues at play, but since I'm new to this kind of problem I cannot figure out the real cause.
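To see how much traversal order alone can matter, independent of my classes, a tiny standalone test like the following sums the same matrix row-by-row and then column-by-column; on typical hardware the second pass is several times slower purely because of cache misses:

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const int n = 6000;
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 1.0));
    double sum = 0;

    // Row-major pass: consecutive accesses touch consecutive memory,
    // so every loaded cache line is fully used.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += m[i][j];
    auto t1 = std::chrono::steady_clock::now();

    // Column-major pass over the same data: each access jumps to a
    // different row's buffer, so most accesses miss the cache.
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += m[i][j];
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "row-major:    " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "column-major: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
    std::cout << sum << "\n"; // keep the loops from being optimized away
}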
I'm also trying to write an efficient full-matrix product, and I'm facing the same problem there.
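For completeness, here is the kind of full-matrix product I'm experimenting with (a sketch over plain row-based vector-of-vectors, not my Matrix class). The i-k-j loop order keeps the innermost loop streaming along contiguous rows of both b and c, which is the same kind of access-pattern change as the loop swap above:

// Naive full product c = a * b, all matrices stored row-based.
#include <vector>

template <typename scalar>
std::vector<std::vector<scalar>> multiplyFull(
        const std::vector<std::vector<scalar>> &a,
        const std::vector<std::vector<scalar>> &b) {
    const size_t rows = a.size(), inner = b.size(), cols = b[0].size();
    std::vector<std::vector<scalar>> c(rows, std::vector<scalar>(cols, 0));
    for (size_t i = 0; i < rows; i++)
        for (size_t p = 0; p < inner; p++) {
            const scalar aip = a[i][p]; // hoisted: constant over the inner loop
            for (size_t j = 0; j < cols; j++) // i-k-j order: streams b[p] and c[i]
                c[i][j] += aip * b[p][j];
        }
    return c;
}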