Implementing a Multithreaded Fork

Question

I am trying to checkpoint a multithreaded application. For single-threaded applications, forking a process as a checkpoint is an efficient technique. However, there is no such thing as a multithreaded fork. Any idea how to implement your own multithreaded fork? Any reference to such work would be greatly appreciated.
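
For context, the single-threaded technique mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `fork_checkpoint_demo`, the `state` variable, and the pipe used to order the two processes are hypothetical and not from the question. The forked child keeps a copy-on-write snapshot of the parent's memory, so it still sees the old value after the parent moves on.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if the forked child kept the pre-fork value
 * of `state` even after the parent changed its own copy. */
int fork_checkpoint_demo(void)
{
    int state = 1;                 /* application state at checkpoint time */
    int gate[2];                   /* pipe used only to order the two processes */
    if (pipe(gate) != 0)
        return -1;

    pid_t pid = fork();            /* the child is a copy-on-write snapshot */
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char c;
        close(gate[1]);
        /* Block until the parent has modified its own copy. */
        if (read(gate[0], &c, 1) != 1)
            _exit(2);
        _exit(state == 1 ? 0 : 1); /* the snapshot must still hold the old value */
    }

    close(gate[0]);
    state = 2;                     /* the parent moves on past the checkpoint */
    if (write(gate[1], "x", 1) != 1)
        return -1;
    close(gate[1]);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(state == 2);            /* the parent's copy is unaffected by the child */
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

This only works as a checkpoint because the process has a single thread: `fork` duplicates just the calling thread, which is exactly the limitation the question is about.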

You again. Go accept some answers please, as we discussed in some other question. — Vinicius Kamakura, Jul 04 '11 at 22:23

R.. GitHub STOP HELPING ICE · Accepted Answer · 2011-07-04T22:00:46.647

4

There is no portable way to implement a variant of `fork` that preserves all threads using the interfaces provided by POSIX. On some systems, such as Linux, you could implement a highly non-portable, highly fragile version of this in one of the following ways:

- Use `ptrace` to trace all threads (to stop them), then make new kernel threads in the child process to duplicate each thread in the parent, assigning them the original stack addresses, instruction pointers, register values, etc. You'd also need to patch up the thread descriptors so they know their new kernelspace thread ids, and you'd need to avoid race conditions here if a thread was in the middle of querying its thread id.
- Use `vfork` followed by `SIGSTOP` to halt the parent process and give yourself a chance to recreate its thread state without things changing under you. This seems possible, but it's sufficiently difficult that I'd get a headache trying to go into detail, I think...
- (newly added) Catch each thread in a signal handler before forking, and save the `ucontext_t` argument passed to the signal handler. Then fork and make new kernel threads (using `clone`), have them signal themselves, and overwrite the `ucontext_t` that the signal handler receives, so that the handler returns into the context of the original thread you're trying to duplicate. Of course this would all require very clever synchronization...

Alternatively, you could look for a kernel-based "process hibernation" approach to checkpointing that would not be so hackish...

edited Jul 04 '11 at 22:00

answered Jul 04 '11 at 19:02


Nope. `pthread_atfork` is for a different purpose, and it's actually largely useless due to an error in the reasoning of the people who designed it. (The idea is for the prefork functions to acquire all global locks and the parent/child postfork functions to release them all, but the child attempting to release any locks will give an error or invoke undefined behavior because the new thread in the child is not the owner of any of the locks.) – R.. GitHub STOP HELPING ICE Jul 04 '11 at 19:44
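
As a rough illustration of the handler pattern described in the comment above, here is a minimal sketch. The names `global_lock`, `on_prepare`, `on_parent`, `on_child`, and `atfork_demo` are hypothetical. The child handler deliberately does not touch the lock, since unlocking it there is the step the comment calls out as problematic.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static int prepare_ran, parent_ran, child_ran;

/* Runs in the parent just before fork: take the global lock. */
static void on_prepare(void)
{
    pthread_mutex_lock(&global_lock);
    prepare_ran = 1;
}

/* Runs in the parent just after fork: release the lock again. */
static void on_parent(void)
{
    parent_ran = 1;
    pthread_mutex_unlock(&global_lock);
}

/* Runs in the child just after fork. The intended design would unlock here;
 * this sketch only records that the handler ran and leaves the lock alone. */
static void on_child(void)
{
    child_ran = 1;
}

/* Returns 0 if all three handlers ran in the expected process. */
int atfork_demo(void)
{
    if (pthread_atfork(on_prepare, on_parent, on_child) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit((prepare_ran && child_ran) ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(!child_ran);            /* the child handler never runs in the parent */
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (prepare_ran && parent_ran) ? 0 : -1;
}
```

Note that in the child `global_lock` is still held after `fork`, which is why code in the child that later needs the lock has a problem.
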
I think the job of a multithreaded fork will be easier if it is performed only at special points in the code, such as barriers. – MetallicPriest Jul 04 '11 at 21:13
It'd still require hideous nonportable hacks to recreate threads using the same contexts as the old ones... although I just realized a new way you could do it... – R.. GitHub STOP HELPING ICE Jul 04 '11 at 21:56
Can you kindly tell me how to broadcast a signal to all threads, with each thread handling it in a signal handler? I've tried to search for a solution on the net but was unsuccessful. – MetallicPriest Jul 05 '11 at 11:10
Now I have got the solution for broadcasting signals to all threads. – MetallicPriest Jul 05 '11 at 13:51
I'd be interested in hearing your solution. It's actually a difficult problem to know when you've gotten them all (without inspecting `/proc`) and it's impossible if some threads may have blocked the signal you want to use (which could be the case for any thread created by a "black box" library or running library code). The only time I've successfully used "broadcast signals" is as part of the pthreads implementation where I was able to forbid blocking of the special signal used for broadcasts. – R.. GitHub STOP HELPING ICE Jul 06 '11 at 02:50
Well, I'm doing it by using `pthread_kill` with `SIGUSR1` as the argument and defining a signal handler to handle `SIGUSR1`. The main thread calls `pthread_kill` for each thread. In this way, it broadcasts the signal to all of the threads. Yes, that is true: if some signal is blocked, it wouldn't work. But if you keep a log of the locked condition mutexes, for example, then you can unlock those mutexes in the signal handler so that the thread which is blocked can also enter the signal handler. – MetallicPriest Jul 06 '11 at 10:06
OK, when I fork a process with multiple threads, the created threads would still be there in the address space of the forked process, right? Do you know any way of restarting them in the forked process? For now, to keep things simple, assume that the fork was done between two barriers, so there was no pending lock or anything like that. – MetallicPriest Jul 06 '11 at 13:22
From my reading of the standard, there's no guarantee that the threads' stacks will still exist in the child. However they almost surely will continue to exist in real-world implementations. "Restarting" them is difficult however. You've definitely lost the execution context (register values etc.), but you could possibly save this with `setjmp` and use `longjmp` from a new thread to "return into" it. As I said in my answer this is all very fragile, and you should be aware that it may leave the thread state inconsistent, and would certainly leak memory. – R.. GitHub STOP HELPING ICE Jul 07 '11 at 00:12
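
The point about stacks surviving in practice can be checked with a small experiment. This is a sketch under assumptions, with hypothetical names (`stack_worker`, `worker_local`, `stack_survives_fork_demo`); it relies on behavior that the comment above says the standard does not guarantee, and it has only a typical Linux-style `fork` in mind. A worker publishes the address of a local variable on its own stack, the main thread forks, and the child (which contains only a copy of the forking thread) reads that address.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile int *worker_local;  /* address of a variable on the worker's stack */
static sem_t local_published;
static atomic_int stack_worker_quit;

static void *stack_worker(void *arg)
{
    (void)arg;
    volatile int marker = 0x5A;     /* lives on this thread's stack */
    worker_local = &marker;
    sem_post(&local_published);
    while (!atomic_load(&stack_worker_quit))
        sched_yield();
    return NULL;
}

/* Returns 0 if the child could still read the worker's stack variable. */
int stack_survives_fork_demo(void)
{
    pthread_t tid;
    if (sem_init(&local_published, 0, 0) != 0 ||
        pthread_create(&tid, NULL, stack_worker, NULL) != 0)
        return -1;

    int waited = 0;
    while (sem_wait(&local_published) != 0) {
        if (errno != EINTR) {
            waited = -1;
            break;
        }
    }

    int ok = 0;
    if (waited == 0) {
        pid_t pid = fork();         /* only the calling thread exists in the child */
        if (pid == 0)
            _exit(*worker_local == 0x5A ? 0 : 1);

        int status = -1;
        ok = pid > 0 && waitpid(pid, &status, 0) == pid &&
             WIFEXITED(status) && WEXITSTATUS(status) == 0;
    }

    atomic_store(&stack_worker_quit, 1);
    int joined = pthread_join(tid, NULL);
    assert(joined == 0);
    return ok ? 0 : -1;
}
```

Even when this returns 0, the child has only the worker's memory, not its execution context, so the worker itself is still gone.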

score 0 · Answer 2 · edited May 23 '17 at 12:27

What do you mean by "multithreaded fork"? A function that makes a copy of a multithreaded process, so that the forked process has just as many threads as the old one? A function that makes a new thread which copies the state of the old one?

The latter isn't possible, since the address space is shared. A copy of the current thread's state would be using the current thread's stack, and the new thread and the old thread would fight over the stack.
