
I'm trying to write a program that multiplies two arrays in parallel, with each thread multiplying one row by one column. The problem is that if I put the `omp for` on the outer loop, each thread executes the entire inner loop instead of just a single task. If I put the `omp for` on the inner loop, the outer loop runs once in every thread, because it is inside the `omp parallel` region. I want each thread to run only the task, and I do not want the outer loop to run multiple times.

for (int line = 0; line < n; ++line) {

    for (int column = 0; column < n; ++column) {

        // only this call needs to run in its own thread
        multiply_line_per_column(line, column);

    }

}
Isdeniel
    Why do you think having each thread perform a vector-vector scalar multiplication is better than having threads perform vector-matrix multiplication? Or are you asking how to approach such cases in general? – Hristo Iliev May 11 '20 at 09:09

1 Answer


One option is to use the `collapse` clause: https://stackoverflow.com/a/13357158/2485717
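
As a rough illustration of the `collapse` approach, here is a minimal sketch. The signature of `multiply_line_per_column` is an assumption (the question does not show it), as are the row-major `double` arrays and the wrapper name `multiply_collapsed`. With `collapse(2)`, the two loops are merged into a single iteration space of n * n (line, column) pairs that OpenMP distributes among the threads, so the outer loop is not repeated per thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the question's multiply_line_per_column:
   computes one element of c = a * b for n x n row-major matrices. */
static void multiply_line_per_column(int n, const double *a, const double *b,
                                     double *c, int line, int column)
{
    double sum = 0.0;
    for (int k = 0; k < n; ++k)
        sum += a[(size_t)line * n + k] * b[(size_t)k * n + column];
    c[(size_t)line * n + column] = sum;
}

/* Hypothetical wrapper: every (line, column) pair is one unit of work. */
void multiply_collapsed(int n, const double *a, const double *b, double *c)
{
    assert(n >= 0);
    /* collapse(2) requires the two loops to be perfectly nested. */
    #pragma omp parallel for collapse(2)
    for (int line = 0; line < n; ++line) {
        for (int column = 0; column < n; ++column) {
            multiply_line_per_column(n, a, b, c, line, column);
        }
    }
}
```

Compile with OpenMP enabled (for example `-fopenmp` with GCC or Clang). Without it the pragma is ignored and the loops simply run serially, producing the same result.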

You may also rewrite your loops as a single, non-nested loop:

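// Each flattened index i maps to exactly one (line, column) pair.
#pragma omp parallel for
for (int i = 0; i < n * n; ++i) {
    int line = i / n;    // row index, matching the original outer loop
    int column = i % n;  // column index, matching the original inner loop
    multiply_line_per_column(line, column);
}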

As pointed out by @Hristo Iliev in the comments, the integer division and modulo operations used to recover the loop indices add considerable cost.

The drawback is more obvious when n is not a power of 2.

simonmysun
    `collapse(2)` is exactly the same as rewriting the code as you've shown, just more readable. The drawback here is the use of integer division and modulo operators to reconstruct the original loop variables. Those have no vectorised implementations on x86. Use with care. – Hristo Iliev May 12 '20 at 10:17