I'm writing a program that should run both in serial and parallel versions. Once I get it to actually do what it is supposed to do I started trying to parallelize it with OpenMP (compulsory).
The thing is I can't find documentation or references on when to use what #pragma. So I am trying my best at guessing and testing. But testing is not going fine with nested loops.
How would you parallelize a series of nested loops like these:
for(int i = 0; i < 3; ++i){
for(int j = 0; j < HEIGHT; ++j){
for(int k = 0; k < WIDTH; ++k){
switch(i){
case 0:
matrix[j][k].a = matrix[j][k] * someValue1;
break;
case 1:
matrix[j][k].b = matrix[j][k] * someValue2;
break;
case 2:
matrix[j][k].c = matrix[j][k] * someValue3;
break;
}
}
}
}
- HEIGHT and WIDTH are usually the same size in the tests I have to run. Some test examples are 32x32 and 4096x4096.
- matrix is an array of custom structs with attributes a, b and c
- someValue is a double
I know that OpenMP is not always good for nested loops but any help is welcome.
[UPDATE]:
So far I've tried unrolling the loops. It boosts performance but am I adding unnecesary overhead here? Am I reusing threads? I tried getting the id of the threads used in each for but didn't get it right.
#pragma omp parallel
{
#pragma omp for collapse(2)
for (int j = 0; j < HEIGHT; ++j) {
for (int k = 0; k < WIDTH; ++k) {
//my previous code here
}
}
#pragma omp for collapse(2)
for (int j = 0; j < HEIGHT; ++j) {
for (int k = 0; k < WIDTH; ++k) {
//my previous code here
}
}
#pragma omp for collapse(2)
for (int j = 0; j < HEIGHT; ++j) {
for (int k = 0; k < WIDTH; ++k) {
//my previous code here
}
}
}
[UPDATE 2]
Apart from unrolling the loop I have tried parallelizing the outer loop (worst performance boost than unrolling) and collapsing the two inner loops (more or less same performance boost as unrolling). This are the times I am getting.
- Serial: ~130 milliseconds
- Loop unrolling: ~49 ms
- Collapsing two innermost loops: ~55 ms
- Parallel outermost loop: ~83 ms
What do you think is the safest option? I mean, which should be generally the best for most systems, not only my computer?