I tried to implement a simple barrier in my code that looks like this:
void waitOnBarrier(int* barrier, int numberOfThreads) {
atomicIncrement(barrier); // atomic increment implemented in assembly
while(*barrier < numberOfThreads);
}
And then there is a barrier usage in the code:
int g_barrier = 0; // a global variable
waitOnBarrier(&g_barrier, someKnownNumberOfThreads);
So far so good, but where should I reset my g_barrier variable back to zero? If I write something like
g_barrier = 0;
right after the waitOnBarrier call, I will have a problem if one of the threads will be released faster than others from the barrier and nullify the g_barrier while all other threads are still performing the loop instructions, so eventually they will get stuck on the barrier forever.
Explanation: waitOnBarrier will compile into something like this (pseudocode):
1: mov rax, numberOfThreads
2: mov rbx, [barrier]
3: cmp rax, rbx
4: jmp(if smaller) to 2
So if we have 2 threads syncing on the barrier, and thread_1 being slow somewhere at instruction 3 or 4, while a faster thread_2 reaches the barrier, passes it and continues to the g_barrier nullification flow. Which means that after thread_1 will reach instruction 2 it will see a zero value at [barrier] and will stuck on the barrier forever!
The question is, how should I nullify the g_barrier, what place for it in the code is "far enough" that I can be sure that by that time all the threads left the barrier? Or is there more correct way to implement a barrier?