Implementing spinlocks to synchronize OpenMP threads using compiler intrinsics

Question

Please take a look at the following code snippet using OpenMP for parallelization:

char lock = 0;
#pragma omp parallel
{
    while(!__sync_bool_compare_and_swap(&lock, 0, 1));
    printf("Thread: %d is working!\n", omp_get_thread_num());
    sleep(1);
    printf("Thread: %d unlocks lock!\n", omp_get_thread_num());
    lock = 0;
}

Is it possible, that threads simultaneously lock the lock, even though the locking is atomic with __sync_bool_compare_and_swap? For example, not all threads having a consistent view of the memory?

Why would you ever want to do that? OpenMP is exactly made to provide proper abstractions, e.g. `#pragma omp crtical` in your case. From the [description of `__sync_bool_compare_and_swap`](https://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc/Atomic-Builtins.html), I would strongly assume that memory barriers are used to enforce a consistent view of memory. But as noticed by EOF, the `lock = 0;` can be a problem due to reordering. — Zulan, Jun 23 '16 at 14:54
While I completely agree with you, I am currently working with code I have not written and in this code, synchronization is reinforced the way I presented it in the code. And Intel Inspector detects race conditions, which should not be possible if the synchronization is working as I expect it to do. — Dominique M., Jun 23 '16 at 15:03
Is it conceivable that the Intel Inspector produces false positives? — Zulan, Jun 23 '16 at 16:04

score 3 · Accepted Answer · edited May 23 '17 at 12:08

If you insist on not doing this the OpenMP way:

You definitely need a compiler barrier to prevent lock=0 from being reordered at compile time.

If you're only targeting x86, just a compiler barrier is fine; Otherwise use a C11 atomic_thread_fence(memory_order_release) before lock=0, which is just a compiler barrier on x86 but will emit the necessary instructions on weakly ordered architectures.

Or make lock an atomic type, and use a C11 stdatomic release-store to set it to 0.

Spinning on a lock cmpxchg to take the lock is pretty inefficient. You should spin on a load until the lock is available, then try to take it. e.g. you should write something that will compile to code like this minimal but real asm spinlock implementation.

e.g. use a C11 atomic load with memory_order_acquire. Then use a normal xchg, not a cmpxchg, and check if you got the lock or if another thread took it before you did.

Implementing spinlocks to synchronize OpenMP threads using compiler intrinsics

1 Answers1