6

What I want to is invoking a method foo() with a timeout (say 1 minute). If its execution costs less than 1 minute, return the result. Otherwise an exception will be thrown. Here is the code:

//PRINT "START" IN THE LOG
auto m = std::make_shared<std::mutex>();
auto cv = std::make_shared<std::condition_variable>();
auto ready = std::make_shared<bool>(false);
auto response = std::make_shared<TResponse>();
auto exception = std::make_shared<FooException>();
exception->Code = ErrorCode::None;

std::thread([=]
{
    std::unique_lock<std::mutex> lk(*m);
    cv->wait(lk, [=]{ return *ready; });

    try
    {
        //PRINT "PROCESS" IN THE LOG
        auto r = foo();
        *response = std::move(r);
    }
    catch(const FooException& e)
    {
        *exception = std::move(e);
    }

    lk.unlock();
    cv->notify_one();
}).detach();

std::unique_lock<std::mutex> lk(*m);
*ready = true;
cv->notify_one();
auto status = cv->wait_for(lk, std::chrono::seconds(60));
if (status == std::cv_status::timeout)
{
    //PRINT "TIMEOUT" IN THE LOG
    //throw timeout exception
}
else
{
    //PRINT "FINISH" IN THE LOG
    if (exception->Code == ErrorCode::None)
    {
        return *response;
    }
    else
    {
        throw *exception;
    }
}

You can see I add logs START/PROCESS/FINISH/TIMEOUT in the code, every time this method is executed, I can see START/PROCESS/FINISH or START/PROCESS/TIMEOUT pattern in the logs. However, sometimes the logs are START/PROCESS, without any FINISH/TIMEOUT. I think cv->wait_for should block the current thread for 60 seconds at most, then it exists with either TIMEOUT or FINISH.

The foo() method contains disk IO operations to network drives that sometimes hangs for more than 1 hour(the reason is not related to this question, and it can't be resolved now), I tried to replace foo with a thread sleep, everything is working as expected. What's wrong with this code and how can I improve this?

StoryTeller - Unslander Monica
  • 165,132
  • 21
  • 377
  • 458
Cheng Chen
  • 42,509
  • 16
  • 113
  • 174

4 Answers4

4

Because you have no predicate in the cv->wait_for call, the thread might be unblocked spuriously. However, it is strange that no FINISH/TIMEOUT is printed. So we might need more information here: What does happen with the program? Does it hang, does it throw, does it just exit, does it print in the line after cv->wait_for?

You could try using std::async and see if the same behavior appears (furthermore, it would greatly simplify your code):

std::future<int> res = std::async(foo);

std::future_status stat = res.wait_for(std::chrono::seconds(60));

if (stat != std::future_status::ready) {
  std::cout << "Timed out..." << "\n";
} else {
  try {
    int result = res.get();
    std::cout << "Result = " << result << std::endl;
  } catch (const FooException& e) {
    std::cerr << e.what() << '\n';
  }
}

EDIT As pointed out in the comments by CuriouslyRecurringThoughts the future of std::async blocks in the destructor. If that is not an option, the following code uses a std::promise and a detached thread instead:

std::promise<int> prom;
std::future<int> res = prom.get_future();

std::thread([p = std::move(prom)]() mutable {
  try {
    p.set_value(foo());
  } catch (const std::exception& e) {
    p.set_exception(std::current_exception());
  }
}).detach();

Waiting for the std::future is done as shown before.

Mike van Dyke
  • 2,724
  • 3
  • 16
  • 31
  • Not an expert in future, so I might be wrong, and if that is the case I apologize, but, shouldn't you use std::launch::deferred ? Because if you use async without the policy, std::launch::async can be chosen, if it is chosen, the destructor of the future will block until completion. See https://stackoverflow.com/questions/23455104/why-is-the-destructor-of-a-future-returned-from-stdasync-blocking – CuriouslyRecurringThoughts Jul 09 '19 at 09:24
  • @CuriouslyRecurringThoughts You're totally right, I forgot about that. However, if `std::launch::deferred` is passed, the future could not time out due to lazy evaluation. Edited my answer. – Mike van Dyke Jul 09 '19 at 10:56
  • In fact I tried promise/future before, they had the same issue as my current solution. The problem is the method `foo` sometimes hangs forever (say you access a network drive and the server has a bug, keeps your disk waiting forever). If `foo` hangs, the thread can't exit "on time", I suppose it should exit when timeout exceeds (and the detached thread will hang there...) – Cheng Chen Jul 10 '19 at 02:49
  • I still don't understand. When you detach the created thread, it means that the main thread does not have to snychronize with it, not even when main exits. So the question that I still have: did you find out what happens with the main thread (it should continue normally after the timeout)? Can you debug it or print something immediately after the timeout? – Mike van Dyke Jul 10 '19 at 07:51
  • @ChengChen have you tried `std::async` for testing purposes? – Mike van Dyke Jul 10 '19 at 09:26
4

It seems that despite the timed wait your main thread deadlocks because even when cv->wait_for returns with timeout it still tries to lk.lock() on the mutex which is currently locked by the second thread.

As mentioned on cppreference about wait_for:

When unblocked, regardless of the reason, lock is reacquired and wait_for() exits.

I'm not sure why the promise/future solution didn't work for you since you didn't post that example here, but I've tried a simple version of it which seems to work even when the second thread "hangs":

using namespace std::chrono_literals;

std::cout << "START" << std::endl;
std::promise<void> p;
auto f = p.get_future();
std::thread t([p = std::move(p)]() mutable {
    std::cout << "PROCESS" << std::endl;
    std::this_thread::sleep_for(5min);
    p.set_value();
});

auto status = f.wait_for(5s);
std::cout << (status == std::future_status::ready ? "FINISH" : "TIMEOUT") << std::endl;
t.join();

The output is as expected:

START
PROCESS
TIMEOUT
r3mus n0x
  • 5,954
  • 1
  • 13
  • 34
  • You're right about the deadlock. Another possible solution would be to release the lock immediately after the `cv->wait`, i.e. before the call to `foo`, in the created thread (if it is not needed to protect data in `foo`, which I guess it doesn't). Do you want to add that to your answer? – Mike van Dyke Jul 11 '19 at 12:41
  • @MikevanDyke, this is exactly the way I checked this but I didn't consider that as a solution because it kind of defeats the purpose of this mutex. – r3mus n0x Jul 11 '19 at 12:44
  • Hmm, which purpose do you mean? To me it seems that the only purpose of the mutex is to correctly use the condition variable and not to synchronize any data, so no need to lock it longer than necessary? – Mike van Dyke Jul 11 '19 at 12:49
  • @MikevanDyke, if the mutex is used to protect the condition variable then how come we are calling `notify_one` **after** releasing it? To me this is more of a design pattern in which we use the mutex to protect the data which we are waiting for using condition variable. Using mutex solely for the purpose of providing it to the condition variable always seemed somewhat redundant and weird to me. – r3mus n0x Jul 11 '19 at 13:03
  • I totally agree with your last sentence, that's what I suspected his intentions to be. Therefore, it would be possible to release the lock immediately after the `cv->wait` and use `notify_one` as a signal as soon as the call to `foo` has finished. But I guess we will have to wait for the OP to tell us what the mutex is for. – Mike van Dyke Jul 11 '19 at 13:31
1

We can create a separate thread to run the call itself, and wait on a condition variable back in your main thread which will be signaled by the thread doing the call to foo once it returns.

The trick is to wait on the condition variable with your 60s timeout, so that if the call takes longer than the timeout you will still wake up, know about it, and be able to throw the exception - all in the main thread.

Please find below a code example:

#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>
#include <condition_variable>

using namespace std::chrono_literals;

int foo()
{

    //std::this_thread::sleep_for(10s); //Will Return  Success
    std::this_thread::sleep_for(70s); //Will Return  Timeout
    return 1;
}

int foo_wrapper()
{
    std::mutex m;
    std::condition_variable cv;
    int retValue;

    std::thread t([&cv, &retValue]() 
    {
        retValue = foo();
        cv.notify_one();
    });

    t.detach();

    {
        std::unique_lock<std::mutex> lock(m);
        if(cv.wait_for(lock, 60s) == std::cv_status::timeout) 
            throw std::runtime_error("Timeout");
    }

    return retValue;    
}

int main()
{
    bool timedout = false;
    try {
        foo_wrapper();
    }
    catch(std::runtime_error& e) {
        std::cout << e.what() << std::endl;
        timedout = true;
    }

    if(!timedout)
        std::cout << "Success" << std::endl;
    else
        std::cout << "Failure" << std::endl;

    return 0;
}

If we use std::this_thread::sleep_for(10s); inside foo will return SUCCESS And, if we use std::this_thread::sleep_for(70s); inside foo will return TIMEOUT

I hope it helps!

Abhishek Sinha
  • 480
  • 4
  • 9
1

As Mike van Dyke says, and the documentation makes quite clear, you need a predicate to use a condition variable correctly, to deal with spurious wakeups:

When the condition variable is notified, a timeout expires, or a spurious wakeup occurs, the thread is awakened, and the mutex is atomically reacquired. The thread should then check the condition and resume waiting if the wake up was spurious.

Any use of a condvar for waiting without a loop and predicate is wrong. It should always have either an explicit while(!predicate) loop or look something like:

std::unique_lock<std::mutex> lk(*m);
auto status = cv->wait_for(lk, std::chrono::seconds(60), predicate);
if (status == std::cv_status::timeout)
{ /*...*/ } else { /*...*/ }

which means you need some predicate to check: setting *ready = false before notifying the condvar in your thread (and using !*ready as your predicate) would be fine.

As for why you didn't see the expected result - I have no idea, because I can't see your real logging code or what happens outside the code snippet you provided. Waking from wait_for without either having timed out or received a valid response or exception is the most likely, but you'll either have to debug your code or provide a complete example to help with that.

Useless
  • 64,155
  • 6
  • 88
  • 132