Waiting on threads to finish while quitting the process

Question

I could not find a direct, satisfactory answer to what seems like quite a simple question:

Given multiple running threads, is there a generic/correct way to wait for them to finish while exiting the process? Or: "is doing a timed wait OK in this case?"

Yes, we attempt to signal threads to finish but it is observed that during process exit some of them tend to stall. We recently had a discussion and it was decided to get rid of the "arbitrary wait":

m_thread.quit();          // the way we had threads finished
m_thread.wait(kWaitMs);   // with some significant expiration (~1000ms)

m_thread.quit();          // the way we have threads finished now
m_thread.wait();          // wait forever until finished

I understand that the kWaitMs constant should be chosen roughly in proportion to one uninterrupted "job cycle" of the thread. Say, if the thread processes a chunk of data in 10 ms, then we should probably wait about 100 ms for it to respond to the quit signal, and if it still has not quit we simply stop waiting. We stop waiting in that case because we are quitting the program and no longer care. But some engineers don't accept this "paradigm" and want an unconditional wait. Mind that in our case a process left stuck in memory on the client machine will certainly cause problems on the next program start, not to mention that the log will not be properly finished so that it can be processed as an error.

Can the question about the proper thread finishing on process quit be answered?

Is there some assistance from Qt/APIs to resolve the thread hang-up better, so we can log the reason for it?

P.S. Mind that I am well aware of why it is wrong to terminate a thread forcefully and of how that can be done. This question, I guess, is not about synchronization but about the limited determinism of threads that run tons of our own, framework, and OS code. After all, the OS is not real-time: Windows / macOS / Linux etc.

P.P.S. All the threads in question have an event loop, so they should respond to QThread::quit().
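
On the logging part of the question: one way to keep the wait unbounded and still leave a trace of a hang is to wait in slices and log every slice that elapses. QThread::wait() accepts a timeout in milliseconds and returns false when it expires, so the same loop could be written around m_thread.wait(sliceMs). The sketch below uses standard C++ instead of Qt so that it builds on its own. The names waitWithReports, slice, and report are hypothetical, and it assumes the future comes from a std::promise that the worker fulfils as its last step (not from a deferred std::async).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>

// Hypothetical helper: blocks until `done` is ready, calling `report` with the
// total time waited so far each time another `slice` elapses without completion.
// It never gives up, so the wait is still unbounded; it only leaves a trace.
// Returns the number of reports made (0 = finished within the first slice).
// Assumption: `done` is not a deferred future, otherwise this loop never ends.
inline int waitWithReports(const std::future<void>& done,
                           std::chrono::milliseconds slice,
                           const std::function<void(std::chrono::milliseconds)>& report)
{
    assert(slice.count() > 0);
    int reports = 0;
    while (done.wait_for(slice) != std::future_status::ready) {
        ++reports;
        report(slice * reports); // e.g. write "thread X still running after N ms" to the log
    }
    return reports;
}
```

A caller would typically pass a slice a few times longer than one "job cycle" and have report write the thread's name and last known activity to the log. The thread still has to be joined afterwards, since the promise is fulfilled slightly before the thread function returns.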

So, are you asking how to make sure a thread responds correctly to a "quit" event? Or, are you asking how to determine why a thread did not quit correctly? I'm not sure I understand what you mean by "limited determinism". Do you mean, "Making sure it can't run for a really long time"? — Kyle A, Mar 11 '16 at 23:34
The major question: is doing timed wait Ok in this case? And all the related discussion including diagnostics of why it does not quit. — Alexander V, Mar 12 '16 at 00:16

Jeremy Friesner · Answer 1 · 2016-03-12T01:57:59.340


Yes, we attempt to signal threads to finish but it is observed that during process exit some of them tend to stall.

That is your real problem. You need to figure out why some of your threads are stalling, and fix them so that they do not stall and always quit reliably when they are supposed to. (The exact amount of time they take to quit isn't that important, as long as they do quit in a reasonable amount of time, i.e. before the user gets tired of waiting and force-quits the whole application)

If you don't/can't do that, then there is no way to shut down your app reliably, because you can't safely free up any resources that a thread might still be accessing. It is necessary to 100% guarantee that a thread has exited before the main thread calls the destructors of any objects that the thread uses (e.g. the QThread object associated with the thread)

So to sum up: don't bother playing games with wait-timeouts or forcibly-terminating threads; all that will get you is an application that sometimes crashes on shutdown. Use an indefinite-wait, and make sure your threads always (always!) quit after the main thread has asked them to, as that is the only way you'll achieve a reliable shutdown sequence.
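
As a rough illustration of that shutdown sequence, here is a minimal sketch in standard C++ rather than Qt, so that it compiles without the framework. Worker, its stop flag, and the 1 ms sleep standing in for a chunk of work are hypothetical. With QThread the corresponding calls would be quit() followed by wait() with no timeout.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical worker: does short units of work and checks a stop flag between them.
class Worker {
public:
    Worker() : m_thread([this] { run(); }) {}

    // Joining here guarantees the thread has exited before any member is destroyed.
    ~Worker() { shutdown(); }

    // Ask the thread to stop, then wait indefinitely until it has actually exited.
    void shutdown() {
        assert(std::this_thread::get_id() != m_thread.get_id()); // a thread must not join itself
        m_stop.store(true);
        if (m_thread.joinable())
            m_thread.join();
    }

    int chunksProcessed() const { return m_chunks.load(); }

private:
    void run() {
        while (!m_stop.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1)); // stands in for one chunk
            m_chunks.fetch_add(1);
        }
    }

    std::atomic<bool> m_stop{false};
    std::atomic<int> m_chunks{0};
    std::thread m_thread; // declared last so the flags exist before the thread starts
};
```

The points that matter are that each unit of work is short, that the flag is checked between units, and that the owner destroys nothing the thread touches until join() has returned.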

edited Mar 12 '16 at 01:57

answered Mar 12 '16 at 01:48

Jeremy Friesner


Btw when you come across a thread that has stalled during the shutdown sequence, it would be a good idea to use either a debugger or something like the "Sample Process" feature of Apple's Activity Monitor to find out where the current execution point of the thread is (i.e. the stack trace of the location where it is stuck). Once you know where the thread is stuck, you'll be well on your way to figuring out why it is stuck, and figuring out how to unstick it. – Jeremy Friesner Mar 12 '16 at 01:52
I can't disagree with you about finding the cause of such trouble, but when it happens "on site" it is a bit too late. Maybe it makes sense to have debug code not watch the "expiration" period while still having an infinite wait for release code. – Alexander V Mar 12 '16 at 04:56
Agreed, "on site" is too late -- which means the problem needs to be reproduced "off site" and fixed before then, if possible. – Jeremy Friesner Mar 12 '16 at 05:11
@JeremyFriesner Sometimes it is not possible to make sure a thread does not stall. For example, you can call an OS API that starts an I/O to a hardware device which fails, and your thread never returns from that API. What to do with stalled thread in such a situation? – Igor Levicki Mar 21 '22 at 10:28
@IgorLevicki I think the only 100% reliable way to avoid the possibility of a call blocking forever is to avoid blocking I/O calls entirely. That’s doable with networking; I’m not sure if it’s possible for file system I/O or local device I/O. Another possibility might be to set a timeout, although that isn’t always supported and might not work in the face of a hardware failure: – Jeremy Friesner Mar 21 '22 at 13:59
@IgorLevicki I thought of one other "big hammer" solution, if you don't mind sacrificing some efficiency: move the blocking I/O calls into a child process, and have the parent process communicate with the child process via non-blocking APIs. Then if the child process gets stuck, the parent process has the option of unilaterally killing the child process if it wants to. – Jeremy Friesner Mar 21 '22 at 14:15
@JeremyFriesner One example of calls that can get stuck forever is DirectShow calls when working with a capture device. I had situations where the device disappears from the Device Manager, yet there are no device notifications or System Event Log entries indicating that the device has failed, not to mention that DirectShow (or Media Foundation) does not have a concept of returning errors from hardware to the caller and doesn't always support blocking calls with timeouts. – Igor Levicki Mar 21 '22 at 14:56
@JeremyFriesner The idea of a child process is an interesting one though, what non-blocking APIs would you suggest on Windows platform for IPC? – Igor Levicki Mar 21 '22 at 14:57

@IgorLevicki I haven't tried it myself, but I understand that named pipes can be used in non-blocking mode under Windows. – Jeremy Friesner Mar 21 '22 at 15:35
@IgorLevicki good old TCP (or UDP) over the loopback device would also work – Jeremy Friesner Mar 21 '22 at 18:38
@JeremyFriesner By non-blocking in this context you mean alertable, right? Or fully asynchronous I/O? – Igor Levicki Mar 24 '22 at 12:12
@IgorLevicki I just meant that calls to `write()/read()/send()/recv()` (or whatever their equivalents are in the particular API being considered) can be made to always return immediately, so that there is no chance of the thread becoming "stuck" inside a blocking I/O call for indefinite periods of time and therefore impossible to control or shut down. – Jeremy Friesner Mar 24 '22 at 16:32
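
A minimal sketch of what "always return immediately" can look like on POSIX systems, assuming a pipe or socket descriptor. setNonBlocking and tryRead are hypothetical helper names, and Windows named pipes would need their own API calls instead:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Puts a descriptor into non-blocking mode; returns false on failure.
inline bool setNonBlocking(int fd)
{
    const int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Attempts one read() that cannot block on a non-blocking descriptor.
// Returns the byte count, 0 on end-of-file, or -1 on failure; `wouldBlock`
// is set to true only when the failure just means "no data available yet".
inline ssize_t tryRead(int fd, char* buf, size_t size, bool& wouldBlock)
{
    const ssize_t n = read(fd, buf, size);
    wouldBlock = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```

Because tryRead returns straight away when nothing is pending, the calling thread can go back to its event loop and notice a quit request instead of sitting inside the I/O call.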
