4

I tried to implement a simple barrier in my code that looks like this:

    void waitOnBarrier(volatile int* barrier, int numberOfThreads) {
        atomicIncrement(barrier);           // atomic increment implemented in assembly
        while (*barrier < numberOfThreads); // volatile: re-read the counter on every iteration
    }

And then there is a barrier usage in the code:

    int g_barrier = 0; // a global variable

    waitOnBarrier(&g_barrier, someKnownNumberOfThreads);

So far so good, but where should I reset my g_barrier variable back to zero? If I write something like

    g_barrier = 0;

right after the waitOnBarrier call, I will have a problem: if one of the threads is released from the barrier faster than the others and zeroes g_barrier while the remaining threads are still executing the loop instructions, those threads will get stuck on the barrier forever.

Explanation: waitOnBarrier will compile into something like this (pseudocode):

    1: mov rax, numberOfThreads
    2: mov rbx, [barrier]
    3: cmp rbx, rax
    4: jl 2           ; loop back while [barrier] < numberOfThreads

So if we have two threads syncing on the barrier, and thread_1 is delayed somewhere around instruction 3 or 4 while a faster thread_2 reaches the barrier, passes it, and goes on to zero g_barrier, then when thread_1 next executes instruction 2 it will read a zero value from [barrier] and will be stuck on the barrier forever!

The question is: where should I reset g_barrier? What place in the code is "far enough" that I can be sure all the threads have left the barrier by then? Or is there a more correct way to implement a barrier?
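For context, one textbook approach I've come across is the "sense-reversing" barrier, which sidesteps the reset race entirely: instead of zeroing the counter, each round flips a boolean "sense", so a thread still spinning on the previous round can never observe a premature reset. A minimal sketch (illustrative names, not my actual code):

```cpp
#include <atomic>

class sense_barrier {
    std::atomic<int> count;
    std::atomic<bool> sense{false};
    const int threads;
public:
    explicit sense_barrier(int n) : count(n), threads(n) {}
    void wait() {
        // The sense this round will flip to.
        bool my_sense = !sense.load(std::memory_order_relaxed);
        if (count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Last arriver: re-arm the counter, then release the round.
            count.store(threads, std::memory_order_relaxed);
            sense.store(my_sense, std::memory_order_release);
        } else {
            // Spin until the last arriver flips the sense.
            while (sense.load(std::memory_order_acquire) != my_sense) {}
        }
    }
};
```

Because the counter is re-armed before the sense flips, and the next round cannot complete until every thread of this round has left the spin loop, no thread ever spins against a stale counter.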

user1483597
  • Is there a reason why you don't want to use the thread synchronization library supplied with your operating system? – ArjunShankar Oct 07 '14 at 08:40
  • Yes, my code runs in specific environment and not conventional OS. – user1483597 Oct 07 '14 at 08:49
  • And is there a particular reason to be implementing this lock-free? – ArjunShankar Oct 07 '14 at 08:58
  • Since C11, C has atomic types as part of the language and compilers start implementing it. If your compiler doesn't, I think it would still be worth to use the concepts that this standard brings. In particular you should use something similar to `atomic_flag` with test-and-set operations to implement busy wait. Don't reinvent the wheel and make your code future proof. – Jens Gustedt Oct 07 '14 at 09:04

4 Answers

3

Barriers are actually quite difficult to implement. The main reason is that new waiters can begin arriving before all the old waiters have had a chance to wake and leave, which precludes any simple count-based implementation. My preferred solution is to have the barrier object itself point to a "current barrier instance" that lives on the stack of the first thread to arrive at the barrier; that thread will also be the last to leave (since it cannot leave while other threads still reference its stack). A very nice sample implementation in terms of pthread primitives (which could be adapted to C11 locking primitives or whatever you have to work with) is included in Michael Burr's answer to my past question on the topic:

https://stackoverflow.com/a/5902671/379897

Yes it looks like a lot of work, but writing a barrier implementation that actually satisfies the contract of a barrier is non-trivial.
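A compressed sketch of the idea in C++11 (my paraphrase, not Burr's code; for simplicity the per-round instance is heap-allocated rather than placed on the first arriver's stack, and all names are illustrative):

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Each round of the barrier gets its own "instance", created by the first
// thread to arrive. Late arrivals for the *next* round get a fresh instance,
// so they can never interfere with threads still leaving the old one.
class instance_barrier {
    struct instance {
        std::size_t remaining; // threads still expected this round
        std::size_t leaving;   // threads that still reference this instance
        std::condition_variable cv;
    };
    const std::size_t count;
    std::mutex mutex;
    instance* current = nullptr;

public:
    explicit instance_barrier(std::size_t n) : count(n) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex);
        if (!current)                       // first arriver creates the round's instance
            current = new instance{count, count};
        instance* inst = current;
        if (--inst->remaining == 0) {
            current = nullptr;              // subsequent arrivals start a new round
            inst->cv.notify_all();
        } else {
            inst->cv.wait(lock, [inst]{ return inst->remaining == 0; });
        }
        if (--inst->leaving == 0)           // last thread out reclaims the instance
            delete inst;
    }
};
```

The `leaving` count is what lets the last thread out safely reclaim the instance, mirroring the "first arriver is the last to leave" property of the stack-based original.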

R.. GitHub STOP HELPING ICE
1

I came across this question when trying to do something similar, so I thought I'd share my solution in case someone else finds it useful. It's implemented in pure C++11 (sadly not C11, since the multithreading parts of that standard are not yet supported by gcc and msvc).

Basically, you maintain two counters, whose usage is alternated. Below is the implementation and a usage example:

    #include <cstdio>
    #include <thread>
    #include <mutex>
    #include <condition_variable>

    // A barrier class; The barrier is initialized with the number of expected threads to synchronize
    class barrier_t{
        const size_t count;
        size_t counter[2], * currCounter;
        std::mutex mutex;
        std::condition_variable cv;

    public:
        barrier_t(size_t count) : count(count), currCounter(&counter[0]) {
            counter[0] = count;
            counter[1] = 0;
        }
        void wait(){
            std::unique_lock<std::mutex> lock(mutex);
            if (!--*currCounter){
                currCounter += currCounter == counter ? 1 : -1;
                *currCounter = count;
                cv.notify_all();
            }
            else {
                size_t * currCounter_local = currCounter;
                cv.wait(lock, [currCounter_local]{return *currCounter_local == 0; });
            }
        }
    };

    void testBarrier(size_t iters, size_t threadIdx, barrier_t *B){
        for(size_t i = 0; i < iters; i++){
            printf("Hello from thread %zu for the %zuth time!\n", threadIdx, i);
            B->wait();
        }
    }

    int main(void){
        const size_t threadCnt = 4, iters = 8;
        barrier_t B(threadCnt);
        std::thread t[threadCnt];   
        for(size_t i = 0; i < threadCnt; i++) t[i] = std::thread(testBarrier, iters, i, &B);
        for(size_t i = 0; i < threadCnt; i++) t[i].join();
        return 0;
    }
André Harder
0

Try to implement the Barrier solution that is being explained in this book:

The Little Book of Semaphores
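For reference, the book's reusable "two-turnstile" barrier can be sketched in C++11 with a minimal counting semaphore (this is my paraphrase of the book's pseudocode, not code from the book; all names are illustrative):

```cpp
#include <condition_variable>
#include <mutex>

// Minimal counting semaphore built from a mutex and condition variable.
class semaphore {
    std::mutex m;
    std::condition_variable cv;
    long permits = 0;
public:
    void signal(long k = 1) {
        std::lock_guard<std::mutex> lock(m);
        permits += k;
        cv.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this]{ return permits > 0; });
        --permits;
    }
};

// The book's reusable barrier: two "turnstiles" ensure that no thread can
// race ahead into the next round while others are still leaving this one.
class reusable_barrier {
    const long n;
    long count = 0;
    std::mutex m;
    semaphore turnstile1, turnstile2;
public:
    explicit reusable_barrier(long n) : n(n) {}
    void wait() {
        {   // phase 1: the last arriver opens the first turnstile
            std::lock_guard<std::mutex> lock(m);
            if (++count == n) turnstile1.signal(n);
        }
        turnstile1.wait();
        {   // phase 2: the last leaver opens the second turnstile
            std::lock_guard<std::mutex> lock(m);
            if (--count == 0) turnstile2.signal(n);
        }
        turnstile2.wait();
    }
};
```

The second turnstile is what makes the barrier reusable: a fast thread cannot re-enter phase 1 of the next round until every thread has passed phase 2 of the current one.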

Tal.Bary
0

Do not reset your barrier variable back to zero.

When any of the threads is about to exit, atomically decrement the barrier variable by one.

Your barrier looks like its purpose is to keep the number of running worker threads from falling below numberOfThreads.

Raunak Mukhia
  • The question is trying to implement a barrier where every thread waits until *all* threads have reached the barrier, then they all proceed. Your atomic decrement idea doesn't help for this kind of barrier, because threads can still miss waking up. – Peter Cordes Jan 24 '19 at 19:41