I'm writing code to multiply a sparse matrix by a full matrix.
I've created two classes, SparseMatrix and Matrix, which store their data as a vector of shared pointers to vectors. In the SparseMatrix case I save each entry as an object, SparseMatrixItem, with two attributes: position and value. In the Matrix case I simply store the value. Both can be row-based or column-based, depending on the value of a bool attribute.
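For reference, here is a minimal sketch of the declarations involved, reconstructed from the description above (the name of the bool orientation flag is hypothetical; constructors and other boilerplate are omitted):

#include <memory>
#include <vector>
using std::vector;

// One non-zero entry of a sparse row: its (1-based) position and its value.
template <typename scalar>
struct SparseMatrixItem {
    int position;
    scalar value;
};

// Full matrix: a vector of shared pointers to rows (or columns).
template <typename scalar>
class Matrix {
public:
    vector<std::shared_ptr<vector<scalar>>> internalMatrix;
    bool rowBased; // hypothetical name for the orientation flag
    int getNumberOfRows();
    int getNumberOfColumns();
};

// Sparse matrix: each row stores only its non-zero items.
template <typename scalar>
class SparseVectorMatrix {
public:
    vector<std::shared_ptr<vector<SparseMatrixItem<scalar>>>> horizontalVectorMatrix;
    int getNumberOfRows();
    int getNumberOfColumns();
    Matrix<scalar> operator*(Matrix<scalar> &input2);
    scalar fastScalarProduct(vector<SparseMatrixItem<scalar>> &vector1,
                             const vector<scalar> &vector2);
};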
Now I'm trying to write an efficient version of the standard product between the two matrices. For simplicity, in this first implementation I only consider the case where the first matrix is a row-based SparseMatrix and the second is a column-based Matrix. I wrote the code in the SparseMatrix class by overloading operator*.
Here is my implementation:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();
    // Outer loop over the sparse rows, inner loop over the dense columns:
    // each result entry is one sparse-dense scalar product.
    for (int i = 0; i < numberOfRows; i++) {
        vector<SparseMatrixItem<scalar>> &readRow(*horizontalVectorMatrix[i]);
        vector<scalar> &writeRow(*newMatrix.internalMatrix[i]);
        for (int j = 0; j < numberOfColumns; j++) {
            vector<scalar> &readColumn(*input2.internalMatrix[j]);
            writeRow[j] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}
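For context, this is roughly how I build the operands and time the product (a sketch: makeRandomSparse and makeRandomFull are hypothetical helpers standing in for my fill code):

#include <chrono>
#include <iostream>

int main() {
    // Hypothetical helpers that fill a 6040x4000 row-based sparse matrix
    // and a 4000x6040 column-based full matrix with random values.
    SparseVectorMatrix<double> A = makeRandomSparse(6040, 4000);
    Matrix<double> B = makeRandomFull(4000, 6040, /*rowBased=*/false);

    auto start = std::chrono::steady_clock::now();
    Matrix<double> C = A * B;
    auto stop = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double>(stop - start).count() << " s\n";
}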
The strange thing I cannot figure out is that if I swap the order of the two loops, performance gets dramatically better. I tested it with two matrices, 6040x4000 and 4000x6040: the first implementation takes nearly 30 seconds, while the second takes only 12 seconds. Here it is:
template <typename scalar>
Matrix<scalar> SparseVectorMatrix<scalar>::operator*(Matrix<scalar> &input2) {
    Matrix<scalar> newMatrix(getNumberOfRows(), input2.getNumberOfColumns(), true);
    int numberOfRows = newMatrix.getNumberOfRows();
    int numberOfColumns = newMatrix.getNumberOfColumns();
    // Same computation with the loops swapped: the outer loop now walks the
    // dense columns, so each column of input2 is reused across all sparse rows.
    for (int j = 0; j < numberOfColumns; j++) {
        vector<scalar> &readColumn(*input2.internalMatrix[j]);
        vector<scalar> &writeColumn(*newMatrix.internalMatrix[j]);
        for (int i = 0; i < numberOfRows; i++) {
            vector<SparseMatrixItem<scalar>> &readRow(*horizontalVectorMatrix[i]);
            writeColumn[i] = fastScalarProduct(readRow, readColumn);
        }
    }
    return newMatrix;
}
Here is also the code of the fastScalarProduct() function that both versions call:
template <typename scalar>
scalar SparseVectorMatrix<scalar>::fastScalarProduct(
        vector<SparseMatrixItem<scalar>> &vector1,
        const vector<scalar> &vector2) {
    scalar totalSum = 0; // accumulate in the scalar type to avoid truncation
    auto sizeVector1 = vector1.size();
    for (size_t i = 0; i < sizeVector1; i++) {
        // positions are stored 1-based, hence the -1
        int position = vector1[i].position - 1;
        // skip the multiplication when the dense entry is zero
        if (vector2[position]) {
            totalSum += vector1[i].value * vector2[position];
        }
    }
    return totalSum;
}
I tried the same product in MATLAB and it takes only about 1.5 seconds. I think there are cache-memory issues at play, but since I'm new to this kind of problem I cannot figure out the real cause.
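To see how much traversal order alone can matter, independent of my classes, a tiny standalone test like the following sums the same matrix row-by-row and then column-by-column; on typical hardware the second pass is several times slower purely because of cache misses:

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    const int n = 6000;
    std::vector<std::vector<double>> m(n, std::vector<double>(n, 1.0));
    double sum = 0;

    // Row-major pass: consecutive accesses touch consecutive memory,
    // so every loaded cache line is fully used.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += m[i][j];
    auto t1 = std::chrono::steady_clock::now();

    // Column-major pass over the same data: each access jumps to a
    // different row's buffer, so most accesses miss the cache.
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += m[i][j];
    auto t2 = std::chrono::steady_clock::now();

    std::cout << "row-major:    " << std::chrono::duration<double>(t1 - t0).count() << " s\n";
    std::cout << "column-major: " << std::chrono::duration<double>(t2 - t1).count() << " s\n";
    std::cout << sum << "\n"; // keep the loops from being optimized away
}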
I'm also trying to write an efficient full-matrix product, and I'm facing the same problem there.
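For completeness, here is the kind of full-matrix product I'm experimenting with (a sketch over plain row-based vector-of-vectors, not my Matrix class). The i-k-j loop order keeps the innermost loop streaming along contiguous rows of both b and c, which is the same kind of access-pattern change as the loop swap above:

// Naive full product c = a * b, all matrices stored row-based.
#include <vector>

template <typename scalar>
std::vector<std::vector<scalar>> multiplyFull(
        const std::vector<std::vector<scalar>> &a,
        const std::vector<std::vector<scalar>> &b) {
    const size_t rows = a.size(), inner = b.size(), cols = b[0].size();
    std::vector<std::vector<scalar>> c(rows, std::vector<scalar>(cols, 0));
    for (size_t i = 0; i < rows; i++)
        for (size_t p = 0; p < inner; p++) {
            const scalar aip = a[i][p]; // hoisted: constant over the inner loop
            for (size_t j = 0; j < cols; j++) // i-k-j order: streams b[p] and c[i]
                c[i][j] += aip * b[p][j];
        }
    return c;
}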