Without using dedicated standard-library tools, and if running the same code in multiple threads is allowed, you can branch the flow on a shared variable:
// in both threads, executed repeatedly in a polling loop
std::unique_lock<std::mutex> lck(mtx);
if(var && myId == "A")
{
    // stuff that A does
    var = false;
}
else if(!var && myId == "B")
{
    // stuff that B does
    var = true;
}
but this would be slow, because each thread has to keep taking the lock and re-checking a condition that mostly does not match its id, and every extra case to check makes it slower still.
C++ has something to help with this:
std::condition_variable
With a condition variable, each thread can sleep on its own condition and be woken only when that condition may have become true:
std::condition_variable cv;
...
std::unique_lock lk(mtx);
cv.wait(lk, []{return your_logic();});
Since a waiting thread sleeps, it does not waste CPU cycles like the first example; there are fewer unnecessary wake-ups, and memory bandwidth is not wasted either.
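For example, a minimal runnable flip-flop sketch (the turnA flag, the writer helper and the iteration count are my own illustrative choices):

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
bool turnA = true; // whose turn it is

void writer(char label, bool myTurn)
{
    for(int i=0;i<10;i++)
    {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&]{ return turnA == myTurn; }); // sleep until it is our turn
        std::cout << label;
        turnA = !myTurn; // hand the turn to the other thread
        lk.unlock();
        cv.notify_one(); // wake the other thread
    }
}

int main()
{
    std::thread a(writer, 'A', true);
    std::thread b(writer, 'B', false);
    a.join();
    b.join();
    std::cout << std::endl; // prints ABABABABABABABABABAB
}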
A more implicit way of combining the outputs of two threads is to use two thread-safe queues: one from A to B, and one from B to the output:
// assuming the implementation blocks .front() until an element arrives
ThreadSafeQueue q1;
ThreadSafeQueue q2;

// in thread A
for(int i=0;i<10;i++)
    q1.push("A");

// in thread B
for(int i=0;i<10;i++)
{
    q2.push(q1.front()+"B");
    q1.pop();
}

// in main thread
auto result = q2.front(); // "AB"
q2.pop();
With this pattern, thread-B does one unit of work per result of thread-A. But this alone does not synchronize the threads: thread-A could fill the queue with 10 "A" values before thread-B processes the 5th "AB", and before the main thread gets the 3rd "AB".
To enforce flip-flop-like work in time, you can limit the size of the queues to 1 or 2. Then the first queue blocks thread-A until thread-B consumes an element, and the second queue blocks thread-B until the main thread consumes one.
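ThreadSafeQueue above is hypothetical; a minimal sketch of a bounded blocking version (the capacity parameter, the names and the copy-returning front() are my assumptions; the snippet above would then use ThreadSafeQueue<std::string>) could look like:

#include <condition_variable>
#include <mutex>
#include <queue>

template<typename T>
class ThreadSafeQueue
{
    std::queue<T> q;
    std::mutex m;
    std::condition_variable notFull, notEmpty;
    size_t capacity;
public:
    explicit ThreadSafeQueue(size_t cap = 1) : capacity(cap) {}

    void push(T value)
    {
        std::unique_lock<std::mutex> lk(m);
        notFull.wait(lk, [&]{ return q.size() < capacity; }); // block while full
        q.push(std::move(value));
        notEmpty.notify_one();
    }

    T front() // blocks until an element is available, returns a copy
    {
        std::unique_lock<std::mutex> lk(m);
        notEmpty.wait(lk, [&]{ return !q.empty(); });
        return q.front();
    }

    void pop()
    {
        std::unique_lock<std::mutex> lk(m);
        notEmpty.wait(lk, [&]{ return !q.empty(); });
        q.pop();
        notFull.notify_one(); // a blocked producer may proceed now
    }
};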
Yet another way of synchronizing multiple threads doing different tasks is a cyclic barrier:
// C++20, <barrier>
std::barrier sync_point(2 /* number of participating threads */, func_on_completion); // the completion function is optional

// in thread A
..stuff..flip..
sync_point.arrive_and_wait();
..more stuff that needs the updated data..

// in thread B
..stuff..flop..
sync_point.arrive_and_wait();
..more stuff that needs the updated data..
The barrier makes sure both threads wait for each other before continuing. In a loop, they process one step at a time (one step here meaning both A and B have produced once) and wait for each other before the next iteration. So it produces something like ABBAABABBABAAB, with neither thread ever getting more than one step ahead of the other. If A is always required before B, you need another barrier to enforce the order:
// in both thread A and thread B
if(thread is A)
    output "A"
sync_point.arrive_and_wait();
if(thread is B)
    output "B"
sync_point.arrive_and_wait();
this prints ABABABAB...
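A complete C++20 sketch of this ordered version (the worker lambda and the loop count of 10 are my own illustrative choices):

#include <barrier>
#include <iostream>
#include <thread>

int main()
{
    std::barrier sync_point(2); // two participating threads

    auto worker = [&](bool isA)
    {
        for(int i=0;i<10;i++)
        {
            if(isA)
                std::cout << "A";
            sync_point.arrive_and_wait(); // B waits until A has printed
            if(!isA)
                std::cout << "B";
            sync_point.arrive_and_wait(); // A waits until B has printed
        }
    };

    std::thread a(worker, true);
    std::thread b(worker, false);
    a.join();
    b.join();
    std::cout << std::endl; // prints ABABABABAB...
}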
If you are using OpenMP, it has a barrier, too:
#pragma omp parallel
{
    ...work...
    #pragma omp barrier
    ...more work...
}
If you don't want the second part to run at the same time as the first part of the next iteration, you need two barriers:
for(...)
{
    #pragma omp parallel
    {
        ...work...
        #pragma omp barrier
        ...more work...
        #pragma omp barrier
    }
}
If the order of the two threads' work within each iteration still matters, this requires a dedicated segment for each thread:
for(...)
{
    #pragma omp parallel
    {
        if(thread is A?)
            do this
        #pragma omp barrier
        if(thread is B?)
            do that
        #pragma omp barrier
    }
}
This writes ABABAB every time, although with decreased efficiency, because the start/stop overhead of an OpenMP parallel region is high and measurable inside a loop. It is better to put the loop inside each thread instead:
#include <iostream>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(2)
    {
        // each thread runs this whole loop itself; it is not work-shared
        for(int i=0;i<10;i++)
        {
            int id = omp_get_thread_num();
            if(id == 0)
                std::cout << "A" << std::endl;
            #pragma omp barrier
            if(id == 1)
                std::cout << "B" << std::endl;
            #pragma omp barrier
        }
    }
}
This outputs ABABABABAB... and has no OpenMP region start/stop overhead (though the barrier overhead remains).
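With GCC or Clang, OpenMP code like this is compiled with the -fopenmp flag (e.g. g++ -fopenmp flipflop.cpp); MSVC uses /openmp.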