
I'm trying to write a program that uses C++11 thread functionality to spawn multiple threads. The main thread must wait for each spawned thread to finish, and all spawned threads must run in parallel. I've come up with the following approach:

#include <iostream>
#include <stdio.h>
#include <thread>
#include <condition_variable>
#include <mutex>

using namespace std;

class Producer
{
public:
    Producer(int a_id):
        m_id(a_id),
        m_running(false),
        m_ready(false),
        m_terminate(false)
    {
        m_id = a_id;
        m_thread = thread(&Producer::run, this);
        while (!m_ready) {}
    }

    ~Producer() {
        terminate();
        m_thread.join();
    }

    void wait() {
        unique_lock<mutex> lock(m_waitForRunFinishMutex);
        m_cond.wait(lock);
        // avoid spurious wake up
        if (m_running) {
            wait();
        }
        lock.unlock();
        cout << "wait exit " << m_id << endl;
    }

    void start() {
        m_running = true;
        m_cond.notify_all();
    }

    void terminate() {
        start();
        m_terminate = true;
    }

    void run() {
        m_ready = true;
        do {
            unique_lock<mutex> lock(m_mutex);
            while (!m_running) {
                m_cond.wait(lock);
            }

            if (!m_terminate) {
                cout << "running thread: " << m_id << endl;
            }

            m_running = false;
            m_cond.notify_all();
        } while (!m_terminate);
    }

private:
    int m_id;
    bool m_running;
    bool m_ready;
    bool m_terminate;
    thread m_thread;
    mutex m_mutex;
    mutex m_waitForRunFinishMutex;
    condition_variable m_cond;
};

The program runs fine when testing with just one thread, i.e. the following program:

int main()
{
    Producer producer1(1);    
    producer1.start();
    producer1.wait();

    return 0;
}

Results in the following output:

running thread: 1
wait exit 1

However, if I test the program with 2 threads, e.g.:

int main()
{
    Producer producer1(1);
    Producer producer2(2);

    producer1.start();
    producer2.start();

    producer1.wait();
    producer2.wait();

    return 0;
}

I get the following output:

running thread: 2
running thread: 1
wait exit 1

It seems producer2 never gets notified (in producer2.wait()), and therefore the program never finishes. Hopefully somebody can point out what I'm missing here.

Thanks everybody for the help in addressing the problem. The root cause turned out to be the one described in point (3) of the accepted answer. I've solved it by correcting the wait function as follows:

void wait() {
    unique_lock<mutex> lock(m_waitForRunFinishMutex);
    while (m_running) {
        m_cond.wait(lock);
    }
    lock.unlock();
}
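
As Revolver_Ocelot points out in the comments below, the predicate overload of condition_variable::wait wraps the same check-and-wait loop. A minimal sketch of an equivalent wait(), assuming the same members as above:

void wait() {
    unique_lock<mutex> lock(m_waitForRunFinishMutex);
    // Blocks until the predicate returns true and re-checks it after every
    // wake-up, so spurious wake-ups are handled the same way as the loop above.
    m_cond.wait(lock, [this] { return !m_running; });
    // No explicit unlock needed: the unique_lock destructor releases the mutex.
}
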
Gio
  • Shouldn't `wait()` produce a deadlock? It is called recursively, and `unique_lock lock(m_waitForRunFinishMutex)` is executed while `m_waitForRunFinishMutex` is still locked. – Detonar Jan 02 '18 at 10:40
  • A recursive call returns from a method, which means m_waitForRunFinishMutex is unlocked (because it goes out of scope) upon the recursive call. – Gio Jan 02 '18 at 10:55
  • @Gio you're wrong about the first part; recursive functions get added to the stack with no unwinding. – UKMonkey Jan 02 '18 at 10:56
  • Recursive calls really cause the variable to leave scope? That's weird behavior. But thanks. – Detonar Jan 02 '18 at 10:58
  • Yes, the variable leaves scope, i.e. the function returns. See here: https://stackoverflow.com/questions/5852237/question-on-c-recursion-and-local-variables; this means m_waitForRunFinishMutex is unlocked upon the recursive call. – Gio Jan 02 '18 at 11:09
  • Why are you reinventing the wheel instead of using the standard way of checking a predicate? `m_cond.wait(lock, [&](){ return not m_running;});` – Revolver_Ocelot Jan 02 '18 at 11:19
  • Could you please explain further? I don't know what you are referring to with the "standard way of checking a predicate". – Gio Jan 02 '18 at 11:20
  • Edited previous comment. Accidentally pressed enter before adding code. – Revolver_Ocelot Jan 02 '18 at 11:21
  • This doesn't seem to be the standard way on http://www.cplusplus.com/reference/condition_variable/condition_variable/?kw=condition_variable, and I also don't see how this helps answer my question. – Gio Jan 02 '18 at 11:26
  • http://en.cppreference.com/w/cpp/thread/condition_variable/wait http://www.cplusplus.com/reference/condition_variable/condition_variable/wait/ Second overload in both pages. – Revolver_Ocelot Jan 02 '18 at 12:14
  • Note that in the new code you don't need to call `lock.unlock()`; the destructor will unlock it. The destructor will run when `lock` goes out of scope, **and** when the code inside the lock throws an exception. – Pete Becker Jan 02 '18 at 14:43

1 Answer


Here's a quick collection of issues from a glance.

  1. wait() is recursive without unlocking its unique lock (as per the comment from Detonar)

  2. `while (!m_ready) {}` is not protected by any memory barrier, so the compiler is free to optimize the check away (try compiling with some optimization and see what happens!); see the sketch after this list.

  3. If the worker thread completes before wait() is called, no check is performed before waiting on the condition variable. Since the worker thread has already finished, the notification has already been sent and the waiter will never be woken. You must check, while holding the mutex, whether there is still anything to wait for before waiting on the condition variable.
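
For point 2, a minimal, self-contained sketch of one way to give the ready flag the visibility it needs, assuming std::atomic is an acceptable change (the class and names below are illustrative, not the asker's code):

#include <atomic>
#include <iostream>
#include <thread>

// Illustration of point 2: the ready flag is an atomic, so the spin loop in
// the constructor is not a data race and the compiler must re-load the flag
// on every iteration instead of optimizing the check away.
class SpinUntilReady {
public:
    SpinUntilReady() {
        m_thread = std::thread(&SpinUntilReady::run, this);
        while (!m_ready.load()) {}   // atomic load: re-read on every iteration
    }

    ~SpinUntilReady() { m_thread.join(); }

private:
    void run() {
        m_ready.store(true);         // atomic store: visible to the spinning thread
        std::cout << "worker started" << std::endl;
    }

    std::atomic<bool> m_ready{false};
    std::thread m_thread;
};

int main() {
    SpinUntilReady s;   // the constructor returns only after the worker has started
    return 0;
}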

UKMonkey
  • What is wrong with point 3? Condition variables must be waited on in a loop in order to avoid spurious wake-ups, right? e.g. https://stackoverflow.com/questions/15072479/stdcondition-variable-spurious-blocking – Gio Jan 02 '18 at 10:50
  • If you're getting spurious wake-ups, then your logic is wrong, by definition of `spurious`. If, however, you're getting wake-ups because, say, you're waking after every single task completes rather than just on thread completion, then your comment should reflect that waking up on every task was your intention. – UKMonkey Jan 02 '18 at 10:55
  • I'm not experiencing any spurious wake-ups; however, an OS may wake up threads, hence it is good to protect against spurious wake-ups, see https://en.wikipedia.org/wiki/Spurious_wakeup – Gio Jan 02 '18 at 10:57
  • Regarding point (2), I do not see what is wrong here. The only use of the variable m_ready is to ensure that, by the time start() is called, the thread actually exists; C++ does not provide any method for that, hence it is good to check for it. m_ready is not modified in different threads, hence I don't see any use in protecting it. – Gio Jan 02 '18 at 11:13
  • Regarding (1), I also do not see what is wrong here: m_waitForRunFinishMutex is unlocked upon the recursive call. Also, I've just added a trace to check if this might be the problem; this is not the case, there is no recursive call of `wait` occurring. – Gio Jan 02 '18 at 11:14
  • "m_waitForRunFinishMutex is unlocked upon recursive call" no, it's not. – UKMonkey Jan 02 '18 at 11:27
  • OK, you're correct on that. I've just checked the standard, which says "an unique_lock object does not manage the lifetime of the mutex object", and I've added lock.unlock() to the end of the function (I've added that code now in the question). The problem stays exactly the same of course; as already mentioned, I'm not experiencing any spurious wake-ups. Hence, unfortunately (although I really appreciate the help), I have to draw the conclusion that none of your points address the actual problem I'm experiencing. – Gio Jan 02 '18 at 11:32
  • Yes, I guess my comment and your edit just crossed each other. Point 3 could indeed be addressing the root of the problem; I'm now testing / looking into it. – Gio Jan 02 '18 at 11:34
  • @UKMonkey it's Unix/Linux. Those OSes are faulty. Apparently, 'spurious wakeups' are a thing... and they are accepted. For some reason 'sometimes dispatching a thread that does not need the CPU onto a core' is more OK than 'occasionally opens the wrong file' or 'sometimes loads the wrong page'. – Martin James Jan 02 '18 at 11:36
  • Eureka!! Addressing point 3 solved the problem; I will add the solution at the bottom of my question. – Gio Jan 02 '18 at 11:39
  • @MartinJames yeah, after reading up I accepted it was a (stupid) thing, but it seems like the standard was written with the implementation in mind, rather than what it should actually promise. – UKMonkey Jan 02 '18 at 11:53