OpenMP Do I have race condition or false-sharing '?

Question

I'm trying to write a code for matrix multiplication. As far as I understand OMP and pararel programming this code may suffer from race condition.

#pragma omp parallel
#pragma omp for
    for (int k = 0; k < size; k++){
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                c[i][j] += a[i][k] * b[k][j];
}}}

Do I get rid of it if I put #pragma omp atomic before writing to c matrix or by adding private(i) to 2nd #pragma? Also is it possible to make this code false-sharing free? If yes, how ?

Atomics would fix your code in terms of correctness (i.e. get rid of the race condition), but it will give you worse performance than without any multi-threading as the accesses will be serialized. As `i` is declared inside the parallel region it is already thread-private. — paleonix, Aug 14 '22 at 16:01

Laci · Accepted Answer · 2022-08-14T16:40:46.017

A race condition occurs when 2 or more threads access the same memory location and at least one of them is writing it. Line c[i][j] +=... can cause data race in your code. The solution is to reorder your nested loops (use the order of i,j,k) and you may introduce a temporary variable to calculate the dot product:

#pragma omp parallel for
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            double tmp=0; // change its type as needed
            for (int k = 0; k < size; k++){
                tmp += a[i][k] * b[k][j];
            }
            c[i][j] = tmp;  //note that += was used in your original code
        }
    }

Note that your code will be faster if you calculate the transpose of matrix b. For more details read this.

UPDATE:

If you need to maintain the order of loops, there are 2 possibilities (but these solutions may be slower than the serial code):

Use atomic operation (i.e #pragma omp atomic). In this case false sharing also can be a problem.
If your stack is large enough to store the matrix for all threads, a better alternative is to use reduction: #pragma omp parallel for reduction(+:c[:size][:size]) (Another alternative is to do the reduction manually. In this case you can allocate the matrices used for reduction on the heap.)

The problem is that I need to use the kij nesting. The task is to somehow get rid of the race without changing the order of the loops. — Siarczansodu99, Aug 14 '22 at 16:05
Seems like a silly assignment. Reordering loops is totally normal in OMP to geth perfromance. Your professor has a really bad sense of what makes a good exercise. Anway. Make for each `k` a complete matrix. Compute those independently and reduce them together. — Victor Eijkhout, Aug 14 '22 at 21:25

OpenMP Do I have race condition or false-sharing '?

1 Answers1