parallel programming in OpenMP

Question

I have the following piece of code.

for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

where offset is an array of size n containing values in the range [0, m) and cnt is an array of size m initialized to 0. I use OpenMP to parallelize it as follows.

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

According to the discussion in this post, if offset[i1] == offset[i2] for i1 != i2, the above piece of code may result in incorrect cnt. What can I do to avoid this?

dreamcrash · Accepted Answer · 2021-04-22T10:48:45.710

This code:

#pragma omp parallel for shared(cnt, offset) private(i)
for (i = 0; i < n; ++i) {
  ++cnt[offset[i]];
}

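contains a race condition on the updates of the array cnt: when two threads process indices i1 != i2 with offset[i1] == offset[i2], their read-modify-write increments of the same element can interleave and one update can be lost. To solve it, you need to guarantee mutual exclusion of those updates. That can be achieved with (for instance) #pragma omp atomic update, but as already pointed out in the comments: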

However, this resolves just correctness and may be terribly inefficient due to heavy cache contention and synchronization needs (including false sharing). The only solution then is to have each thread its private copy of cnt and reduce these copies at the end.
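For reference, the atomic variant mentioned above could look like the following minimal sketch. The function name count_atomic and its signature are made up for illustration; only the loop body and the atomic directive come from the discussion.

```c
/* Hypothetical helper (name and signature are illustrative):
 * counts how often each value in offset[0..n) occurs.
 * Assumes every offset[i] is a valid index into cnt and that
 * cnt has been zero-initialized by the caller. */
void count_atomic(const int *offset, int n, int *cnt)
{
    #pragma omp parallel for shared(cnt, offset)
    for (int i = 0; i < n; ++i) {
        /* The increment of a shared element is performed atomically,
         * so concurrent updates of the same cnt entry are not lost. */
        #pragma omp atomic update
        ++cnt[offset[i]];
    }
}
```

This keeps a single shared cnt, so it needs no extra memory, but every increment pays the synchronization cost described in the comment above.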

The alternative solution is to have a private array per thread and, at the end of the parallel region, manually reduce all those arrays into one. An example of such an approach can be found here.
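A minimal sketch of that manual reduction, under some assumptions: the function name count_private, the local buffer name, and the explicit m parameter are illustrative and not taken from the linked example, m is assumed to be greater than 0, and an allocation failure simply terminates the program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (name and signature are illustrative).
 * Assumes m > 0, every offset[i] lies in [0, m), and cnt[0..m)
 * has been zero-initialized by the caller. */
void count_private(const int *offset, int n, int *cnt, int m)
{
    #pragma omp parallel shared(cnt, offset)
    {
        /* Each thread allocates its own zero-initialized histogram. */
        int *local = calloc((size_t)m, sizeof *local);
        if (local == NULL) {
            /* Sketch only: no graceful recovery on allocation failure. */
            fprintf(stderr, "count_private: out of memory\n");
            exit(EXIT_FAILURE);
        }

        /* Iterations are split among threads; each thread touches only
         * its private array, so no synchronization is needed here. */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i) {
            ++local[offset[i]];
        }

        /* Manual reduction: one thread at a time merges its counts
         * into the shared array. */
        #pragma omp critical
        {
            for (int j = 0; j < m; ++j) {
                cnt[j] += local[j];
            }
        }

        free(local);
    }
}
```

The merge costs m additions per thread, so this layout pays off mainly when m is small compared to n.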

Fortunately, since OpenMP 4.5 you can reduce arrays in C and C++ directly with the reduction clause, namely:

#pragma omp parallel for reduction(+:cnt)

You can have a look at this example to see how to apply that feature.
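Applied to the loop from the question, the array reduction might look like the sketch below. The function name count_reduction is made up for illustration. It assumes cnt is passed as a pointer, in which case the clause needs an array section (cnt[:m]) to tell OpenMP how many elements to reduce; with a true array of known size, reduction(+:cnt) as written above is enough.

```c
/* Hypothetical helper (name and signature are illustrative).
 * Assumes every offset[i] lies in [0, m) and cnt[0..m) has been
 * zero-initialized by the caller. Requires OpenMP 4.5 or later. */
void count_reduction(const int *offset, int n, int *cnt, int m)
{
    /* Each thread works on a private, zero-initialized copy of
     * cnt[0..m); the copies are summed into cnt when the loop ends. */
    #pragma omp parallel for reduction(+:cnt[:m])
    for (int i = 0; i < n; ++i) {
        ++cnt[offset[i]];
    }
}
```

This is the same idea as the manual per-thread arrays, with the allocation and merge left to the OpenMP runtime. Where those private copies live is implementation-dependent, so a very large m may need extra care.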

It is worth mentioning a caveat about the reduction of arrays versus the atomic approach, as kindly pointed out by @Jérôme Richard:

Note that this is fast only if the array is not huge (the atomic based solution could be faster in this specific case regarding the platform and if the values are not conflicting). So that is m << n.

As always, profiling is key: test your code with each of the aforementioned approaches to find out which one is the most efficient.
