Without using dedicated standard-library tools, and if running the same code in multiple threads is allowed, you can branch the flow on a shared variable:
// in both threads, executed repeatedly in a polling loop
std::unique_lock<std::mutex> lck(mtx);
if(var && myId == "A")
{
    // stuff that A does
    var = false;
}
else if(!var && myId == "B")
{
    // stuff that B does
    var = true;
}
but this would be slow, because each thread has to keep taking the lock and re-checking a condition that mostly does not match its id, and every extra case to check makes it slower still.
C++ has something to help with this:
std::condition_variable
With a condition variable, each thread can sleep on its own condition and be woken only when that condition may have become true:
std::condition_variable cv;
...
std::unique_lock lk(mtx);
cv.wait(lk, []{return your_logic();});
Since a waiting thread sleeps, it does not waste CPU cycles like the first example; there are fewer unnecessary wake-ups, and memory bandwidth is not wasted either.
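For example, a minimal runnable flip-flop sketch (the turnA flag, the writer helper and the iteration count are my own illustrative choices):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool turnA = true; // whose turn it is

void writer(char label, bool myTurn)
{
    for(int i=0;i<10;i++)
    {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&]{ return turnA == myTurn; }); // sleep until it is our turn
        std::cout << label;
        turnA = !myTurn; // hand the turn to the other thread
        lk.unlock();
        cv.notify_one(); // wake the other thread
    }
}

int main()
{
    std::thread a(writer, 'A', true);
    std::thread b(writer, 'B', false);
    a.join();
    b.join();
    std::cout << std::endl; // prints ABABABABABABABABABAB
}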
A more implicit way of combining the outputs of two threads is to use two thread-safe queues: one from A to B, and one from B to the output:
// assuming the implementation blocks .front() until an element arrives
ThreadSafeQueue q1;
ThreadSafeQueue q2;

// in thread A
for(int i=0;i<10;i++)
    q1.push("A");

// in thread B
for(int i=0;i<10;i++)
{
    q2.push(q1.front()+"B");
    q1.pop();
}

// in main thread
auto result = q2.front(); // "AB"
q2.pop();
With this pattern, thread-B does one unit of work per result of thread-A. But this alone does not synchronize the threads: thread-A could fill the queue with 10 "A" values before thread-B processes the 5th "AB", and before the main thread gets the 3rd "AB".
To enforce flip-flop-like work in time, you can limit the size of the queues to 1 or 2. Then the first queue blocks thread-A until thread-B consumes an element, and the second queue blocks thread-B until the main thread consumes one.
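ThreadSafeQueue above is hypothetical; a minimal sketch of a bounded blocking version (the capacity parameter, the names and the copy-returning front() are my assumptions; the snippet above would then use ThreadSafeQueue<std::string>) could look like:

#include <condition_variable>
#include <mutex>
#include <queue>

template<typename T>
class ThreadSafeQueue
{
    std::queue<T> q;
    std::mutex m;
    std::condition_variable notFull, notEmpty;
    size_t capacity;
public:
    explicit ThreadSafeQueue(size_t cap = 1) : capacity(cap) {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lk(m);
        notFull.wait(lk, [&]{ return q.size() < capacity; }); // block while full
        q.push(std::move(value));
        notEmpty.notify_one();
    }

    T front() // blocks until an element is available, returns a copy
    {
        std::unique_lock<std::mutex> lk(m);
        notEmpty.wait(lk, [&]{ return !q.empty(); });
        return q.front();
    }

    void pop()
    {
        std::unique_lock<std::mutex> lk(m);
        notEmpty.wait(lk, [&]{ return !q.empty(); });
        q.pop();
        notFull.notify_one(); // a blocked producer may proceed now
    }
};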
Yet another way of synchronizing multiple threads doing different tasks is a cyclic barrier:
// C++20, <barrier>
std::barrier sync_point(2 /* number of participating threads */, func_on_completion); // the completion function is optional

// in thread A
..stuff..flip..
sync_point.arrive_and_wait();
..more stuff that needs the updated data..

// in thread B
..stuff..flop..
sync_point.arrive_and_wait();
..more stuff that needs the updated data..
The barrier makes sure both threads wait for each other before continuing. In a loop, they process one step at a time (one step here meaning both A and B have produced once) and wait for each other before the next iteration. So it produces something like ABBAABABBABAAB, with neither thread ever getting more than one step ahead of the other. If A is always required before B, you need another barrier to enforce the order:
// in both thread A and thread B
if(thread is A)
    output "A"
sync_point.arrive_and_wait();
if(thread is B)
    output "B"
sync_point.arrive_and_wait();
this prints ABABABAB...
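A complete C++20 sketch of this ordered version (the worker lambda and the loop count of 10 are my own illustrative choices):

#include <barrier>
#include <iostream>
#include <thread>

int main()
{
    std::barrier sync_point(2); // two participating threads

    auto worker = [&](bool isA)
    {
        for(int i=0;i<10;i++)
        {
            if(isA)
                std::cout << "A";
            sync_point.arrive_and_wait(); // B waits until A has printed
            if(!isA)
                std::cout << "B";
            sync_point.arrive_and_wait(); // A waits until B has printed
        }
    };

    std::thread a(worker, true);
    std::thread b(worker, false);
    a.join();
    b.join();
    std::cout << std::endl; // prints ABABABABAB...
}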
If you are using OpenMP, it has a barrier, too:
#pragma omp parallel
{
    ...work...
    #pragma omp barrier
    ...more work...
}
If you don't want the second part to run at the same time as the first part of the next iteration, you need two barriers:
for(...)
{
    #pragma omp parallel
    {
        ...work...
        #pragma omp barrier
        ...more work...
        #pragma omp barrier
    }
}
If the order of the two threads' work within each iteration still matters, this requires a dedicated segment for each thread:
for(...)
{
    #pragma omp parallel
    {
        if(thread is A?)
            do this
        #pragma omp barrier
        if(thread is B?)
            do that
        #pragma omp barrier
    }
}
This writes ABABAB every time, although with decreased efficiency, because the start/stop overhead of an OpenMP parallel region is high and measurable inside a loop. It is better to put the loop inside each thread instead:
#include <iostream>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(2)
    {
        // each thread runs this whole loop itself; it is not work-shared
        for(int i=0;i<10;i++)
        {
            int id = omp_get_thread_num();
            if(id == 0)
                std::cout << "A" << std::endl;
            #pragma omp barrier
            if(id == 1)
                std::cout << "B" << std::endl;
            #pragma omp barrier
        }
    }
}
This outputs ABABABABAB... and has no OpenMP region start/stop overhead (though the barrier overhead remains).
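With GCC or Clang, OpenMP code like this is compiled with the -fopenmp flag (e.g. g++ -fopenmp flipflop.cpp); MSVC uses /openmp.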