I am trying to improve the performance of a sparse matrix-vector product using OpenMP in C++. The sparse matrix is stored in COO format: a struct of three arrays holding the row index, column index, and value of each nonzero entry, so index k of each array describes the k-th nonzero. I set the number of threads used by the function with
export OMP_NUM_THREADS=n
where n is the number of threads I would like to use. My current code is:
void ompMatvec(const Vector& x, Vector& y) const {
    std::string envName = "OMP_NUM_THREADS";
    std::string thread_count = getenv(envName.c_str());
    int thread_count2 = atoi(thread_count.c_str());
    Vector y2(y.numRows());
    size_type k;
    #pragma omp parallel num_threads(thread_count2)
    #pragma omp for
    for (k = 0; k < arrayData.size(); ++k) {
        y(rowIndices[k]) += arrayData[k] * x(colIndices[k]);
    }
}
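A note on the thread-count lookup at the top of ompMatvec: std::getenv takes a const char* and returns a null pointer when the variable is unset, and constructing a std::string from a null pointer is undefined behavior. The OpenMP runtime also reads OMP_NUM_THREADS on its own, so a parallel region without a num_threads clause already honors it. If the value is still wanted in code, a defensive lookup might look like the sketch below. The names parseThreadCount and threadCountFromEnv are hypothetical helpers, not part of OpenMP.

```cpp
#include <cstdlib>

// Hypothetical helper: parse a thread count, falling back when the text is
// missing or does not start with a positive number.
int parseThreadCount(const char* raw, int fallback) {
    if (raw == nullptr) {
        return fallback;
    }
    const int n = std::atoi(raw);  // yields 0 when no number can be parsed
    return n > 0 ? n : fallback;
}

// Hypothetical helper: read OMP_NUM_THREADS from the environment.
int threadCountFromEnv(int fallback) {
    return parseThreadCount(std::getenv("OMP_NUM_THREADS"), fallback);
}
```

With this, num_threads(threadCountFromEnv(1)) would fall back to a single thread when the variable is unset, instead of crashing.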
When I measure the performance, the speedup is small. I am comparing the parallelized function above with this serial version:
void matvec(const Vector& x, Vector& y) const {
    for (size_type k = 0; k < arrayData.size(); ++k) {
        y(rowIndices[k]) += arrayData[k] * x(colIndices[k]);
    }
}
I would like to mention that I have created a Vector class with the private member function .numRows(), which returns the length of the vector. I am also running the code on 4 cores. Is there an implementation change that could increase performance using the OpenMP API, or is the speedup limited by the number of cores my program is running on?
Any and all recommendations are greatly appreciated. Thank you!
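For reference, the update y(rowIndices[k]) += ... in the parallel loop is a read-modify-write on a shared entry, and two nonzeros in the same row can be handled by different threads at the same time. A minimal sketch of one way to make that update safe is an omp atomic on the accumulation. It assumes std::vector in place of my Vector class, and cooMatvecAtomic is a hypothetical free-function name. Atomic updates can be slow when many threads hit the same rows, so this is meant as a correctness baseline rather than a guaranteed speedup.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical free-function version of the COO kernel: y += A * x.
void cooMatvecAtomic(const std::vector<std::size_t>& rowIndices,
                     const std::vector<std::size_t>& colIndices,
                     const std::vector<double>& arrayData,
                     const std::vector<double>& x,
                     std::vector<double>& y) {
    // Signed loop counter: accepted by every OpenMP version.
    const long long nnz = static_cast<long long>(arrayData.size());
    double* yData = y.data();

    #pragma omp parallel for
    for (long long k = 0; k < nnz; ++k) {
        const double contribution = arrayData[k] * x[colIndices[k]];
        // Make the read-modify-write on the shared entry indivisible.
        #pragma omp atomic
        yData[rowIndices[k]] += contribution;
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the function runs as the plain serial loop.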
Update: An attempt to avoid the race condition above:
void ompMatvec(const Vector& x, Vector& y) const {
    std::string envName = "OMP_NUM_THREADS";
    std::string thread_count = getenv(envName.c_str());
    int thread_count2 = atoi(thread_count.c_str());
    size_type k;
    #pragma omp parallel num_threads(thread_count2) \
        default(none) private(k)
    Vector y2(y.numRows());
    #pragma omp for
    for (k = 0; k < arrayData.size(); ++k) {
        y2(rowIndices[k]) += arrayData[k] * x(colIndices[k]);
    }
    #pragma omp critical
    for(k = 0; k < y.numRows(); ++k){
        y(k) += y2(k);
    }
}
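In the attempt above, the parallel pragma is not followed by a braced block, so it does not enclose the declaration of y2, the loop, and the merge as one region, and default(none) would also require x and y to be listed as shared. A minimal sketch of the intended per-thread accumulation, under assumptions, could look like this: std::vector stands in for my Vector class, CooMatrix is a hypothetical stand-in for my matrix class, and the local buffer is assumed to start at zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the matrix class in the question (COO storage).
struct CooMatrix {
    std::vector<std::size_t> rowIndices;
    std::vector<std::size_t> colIndices;
    std::vector<double> arrayData;

    // y += A * x, accumulating into a per-thread buffer first.
    void ompMatvec(const std::vector<double>& x, std::vector<double>& y) const {
        const long long nnz = static_cast<long long>(arrayData.size());

        #pragma omp parallel
        {
            // Declared inside the region, so every thread gets its own copy.
            std::vector<double> yLocal(y.size(), 0.0);

            // nowait: a thread may start merging as soon as its chunk is done.
            #pragma omp for nowait
            for (long long k = 0; k < nnz; ++k) {
                yLocal[rowIndices[k]] += arrayData[k] * x[colIndices[k]];
            }

            // One thread at a time adds its partial sums into the shared y.
            #pragma omp critical
            for (std::size_t i = 0; i < y.size(); ++i) {
                y[i] += yLocal[i];
            }
        }
    }
};
```

The merge costs one pass over y per thread, so this layout is more likely to pay off when the number of nonzeros is much larger than the number of rows.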