0

What I'm Trying To Do

Hi, I have two types of threads the main one and the workers where the workers are equal to the number of cores on the CPU, what I'm trying to do is when the main thread needs to call an update I set a boolean called Updating to true and call condition_variable(cv).notify_all then each thread will do its work and when done it will increment by one an atomic_int called CoresCompleted followed by a cv.notify_all so that the main thread can check if all the work is done then it will wait for the variable Updating to be false so it is sure that all other threads finished and it doesn't update again, once everything is done the main thread sets updating to false and notifies all.

CODE

Main

void UpdateManager::Update() {

    //Prepare Update
    CoresCompleted = 0;
    Updating = true;

    //Notify Update Started
    cv.notify_all();

    //Wait for Update to end
    auto Pre = high_resolution_clock::now();
    cv.wait(lk, [] { return (int)UpdateManager::CoresCompleted >= (int)UpdateManager::ProcessorCount; });
    auto Now = high_resolution_clock::now();
    auto UpdateTime = duration_cast<nanoseconds>(Now - Pre);
    
    //End Update and nofity threads
    Updating = false;
    cv.notify_all();
}

Workers

void CoreGroup::Work() {

    Working = true;
    unique_lock<mutex> lk(UpdateManager::m);

    while (Working) {
        
        //Wait For Update To Start
        UpdateManager::cv.wait(lk, []{ return UpdateManager::Updating; });

        if (!Working)
            return;

        //Do Work
        size_t Size = Groups.size();

        auto Pre = high_resolution_clock::now();

        for (size_t Index = 0; Index < Size; Index++)
            Groups[Index]->Update();

        auto Now = high_resolution_clock::now();
        UpdateTime = duration_cast<nanoseconds>(Now - Pre);
        
        //Increment CoresCompleted And Notify All
        UpdateManager::CoresCompleted++;
        UpdateManager::cv.notify_all();

        //Wait For Update To End
        UpdateManager::cv.wait(lk, []{ return !UpdateManager::Updating; });
    }
}

Problem

Once the workers reach the last wait where they wait for Updating to be false they get stuck and never leave, for some reason the last notify_all in the main thread is not reaching the workers, I tried searching and looked for many examples but I can't figure out why it isn't triggering, maybe I miss understood how the cv and lock works, any ideas why this is happening and how to fix?

Pedro S
  • 3
  • 2
  • You need to lock `UpdateManager::m` to synchronize notifying the event, even if `Update` happens to be atomic. Otherwise the condition can miss the notify event. – François Andrieux Apr 05 '22 at 16:41
  • @FrançoisAndrieux could you please explain what exactly do you mean by locking m? I tried locking before setting the Update but it just freezes the main thread what exactly should I be doing? when searching for examples online I never saw anyone locking m – Pedro S Apr 05 '22 at 17:07
  • 1
    See https://en.cppreference.com/w/cpp/thread/condition_variable : *"Even if the shared variable is atomic, it must be modified under the mutex in order to correctly publish the modification to the waiting thread."* `UpdateManager::m` has to be locked whenever you change `Update`. You can unlock it between changes to allow threads to operate concurrently, but it has to be relocked before making further changes to it. – François Andrieux Apr 05 '22 at 17:10

1 Answers1

0

Here is how your code works: some waiting in Update is over when notified:

cv.wait(lk, [] { return (int)UpdateManager::CoresCompleted >= (int)UpdateManager::ProcessorCount; });

It goes out of waiting and requires the lock on the mutex. Proceed to do its stuff then reaches the end and notifies the other thread that they can continue working with this line:

cv.notify_all();

But it lies, they can't continue working because you hold the lock. Release it and they will proceed working:

void UpdateManager::Update() {
    <...>
    //End Update and nofity threads
    Updating = false;
    lk.unlock();
    cv.notify_all();
}

That probably isn't the only issue in this code but I assume that you lock the mutex before entering the Update method or have some guarantee that it runs before the other one (Work).

ixSci
  • 13,100
  • 5
  • 45
  • 79
  • Thanks for the answer, so I wasn't locking before but now I am, I added an lk.lock() at the start and a unlock where you told, so now it's working but sometimes it freezes with the CoresCompleted being one number bellow the ProcessorCount (15>=16) I'm not sure why this is happening, before using the notify method it never happened so what this means is that one of the threads is not getting the notify, any idea why this could be? – Pedro S Apr 05 '22 at 18:05
  • @PedroS you need to carefully check what you are locking and when. Debug your program to find the problem. What you posted is just a part of it with pretty messy code which frankly I have no desire to untangle, especially not knowing what exactly you changed. Just make sure you lock around shared variables and don't hold lock longer than neccessary. – ixSci Apr 05 '22 at 18:47