Multitasking using setjmp, longjmp

Question

is there a way to implement multitasking using setjmp and longjmp functions

[Tony Finch's picoro(small co-routines)](http://dotat.at/cgi/git?p=picoro.git;a=blob;f=picoro.c;hb=HEAD). Co-routines are in Knuth's the art of computing and are co-operative multi-tasking. As well, Simon Tatham has a [co-routines web page](http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html) with nice explanations. — artless noise, Feb 24 '14 at 18:10
Also, care should be taken; the `setjmp()` and `longjmp()` are most often/always implemented in assembler and resemble OS context switch code. However, they may not save some state such as *floating point*, *SIMD state*, etc. Whether this is an implementation bug or a standards issue, I don't know. However, this issue will often exist in practice. Knowing what state to save can be a significant boost to context switch speeds. — artless noise, Feb 25 '14 at 18:58
See: [`setjmp()` and fpmode](http://www-personal.umich.edu/~williams/archive/computation/setjmp-fpmode.html) for more on other CPU state. — artless noise, Feb 25 '14 at 22:19

Sean Ogden · Accepted Answer · 2013-03-22T14:39:09.753

You can indeed. There are a couple of ways to accomplish it. The difficult part is initially getting the jmpbufs which point to other stacks. Longjmp is only defined for jmpbuf arguments which were created by setjmp, so there's no way to do this without either using assembly or exploiting undefined behavior. User level threads are inherently not portable, so portability isn't a strong argument for not doing it really.

step 1 You need a place to store the contexts of different threads, so make a queue of jmpbuf stuctures for however many threads you want.

Step 2 You need to malloc a stack for each of these threads.

Step 3 You need to get some jmpbuf contexts which have stack pointers in the memory locations you just allocated. You could inspect the jmpbuf structure on your machine, find out where it stores the stack pointer. Call setjmp and then modify its contents so that the stack pointer is in one of your allocated stacks. Stacks usually grow down, so you probably want your stack pointer somewhere near the highest memory location. If you write a basic C program and use a debugger to disassemble it, and then find instructions it executes when you return from a function, you can find out what the offset ought to be. For example, with system V calling conventions on x86, you'll see that it pops %ebp (the frame pointer) and then calls ret which pops the return address off the stack. So on entry into a function, it pushes the return address and frame pointer. Each push moves the stack pointer down by 4 bytes, so you want the stack pointer to start at the high address of the allocated region, -8 bytes (as if you just called a function to get there). We will fill the 8 bytes next.

The other thing you can do is write some very small (one line) inline assembly to manipulate the stack pointer, and then call setjmp. This is actually more portable, because in many systems the pointers in a jmpbuf are mangled for security, so you can't easily modify them.

I haven't tried it, but you might be able to avoid the asm by just deliberately overflowing the stack by declaring a very large array and thus moving the stack pointer.

Step 4 You need exiting threads to return the system to some safe state. If you don't do this, and one of the threads returns, it will take the address right above your allocated stack as a return address and jump to some garbage location and likely segfault. So first you need a safe place to return to. Get this by calling setjmp in the main thread and storing the jmpbuf in a globally accessible location. Define a function which takes no arguments and just calls longjmp with the saved global jmpbuf. Get the address of that function and copy it to your allocated stacks where you left room for the return address. You can leave the frame pointer empty. Now, when a thread returns, it will go to that function which calls longjmp, and jump right back into the main thread where you called setjmp, every time.

Step 5 Right after the main thread's setjmp, you want to have some code that determines which thread to jump to next, pulling the appropriate jmpbuf off the queue and calling longjmp to go there. When there are no threads left in that queue, the program is done.

Step 6 Write a context switch function which calls setjmp and stores the current state back on the queue, and then longjmp on another jmpbuf from the queue.

Conclusion That's the basics. As long as threads keep calling context switch, the queue keeps getting repopulated, and different threads run. When a thread returns, if there are any left to run, one is chosen by the main thread, and if none are left, the process terminates. With relatively little code you can have a pretty basic cooperative multitasking setup. There are more things you probably want to do, like implement a cleanup function to free the stack of a dead thread, etc. You can also implement preemption using signals, but that is much more difficult because setjmp doesn't save the floating point register state or the flags registers, which are necessary when the program is interrupted asynchronously.

Some specific implementations of setjmp/longjmp may work in such a fashion that one can jinx them to behave as desired, and it's possible that some compilers might even *specify* that their implementations work in a particular fashion that would allow such a thing without having to rely upon undocumented/undefined behavior when targeting such compilers, but I'd suggest instead using a few lines of assembly code to do the stack/register switches. Using setjmp/longjmp is no more portable than assembly code, but might give an illusion of portability. — supercat, Mar 21 '13 at 18:10
That having been said, I think there's a lot to be said for cooperative multi-tasking. Many compilers expressly document what registers (if any) need to be preserved by external assembly-language modules. A pre-emptive multitasker would have to preserve all registers that a compiler might be using, which could be a problem if e.g. a compiler takes advantage of a hardware multiply-accelleration acceleration unit the multitasker doesn't know about, but cooperative multitaskers avoid such problems. That having been said... — supercat, Mar 21 '13 at 18:15
...things like C++ exceptions, depending upon how they're implemented, may or may not play nice with even cooperative multitasking. One would have to research how exceptions are implemented to know what is required of the stacks maintained by the running threads. — supercat, Mar 21 '13 at 18:17
If you use the asm approach for modifying the stack pointer, than this works on any sane setjmp/longjmp implementation. The only implementation defined part is when you modify the pointers in the jmpbuf yourself, which I mentioned above. The question specifically asks for setjmp, longjmp implmentation, so that's what is here. — Sean Ogden, Mar 22 '13 at 14:44
If you use the stack overflow method to adjust the stack pointer, there's no asm, and the approach works for all machines in which the stack grows down and the stack frames begin with RA and FP (need to use sizeof(int*) to get the appropriate size for the offsets). That covers pretty much all x86/AMD64 machines that use Windows, OSX or Linux. — Sean Ogden, Mar 22 '13 at 15:09
Preemptive multitasking is easily doable. I did an implementation which worked on OSX and Linux, with very minimal ifdefs for portability. Signal context stores all register state, even the floating points (because it must). You just need to write your own asm to restore all of the state. It's literally just a bunch of pop instructions, and fxrstor to restore floating point state. There are very few calling conventions in widespread use so achieving something portable enough is not a hard problem. — Sean Ogden, Mar 22 '13 at 15:13
A lot of C++ implementations use one or more static variables to control exception handling; trying to switch between threads that use exceptions will require that the task-switcher know about such variables and swap them. Swapping them isn't hard--making sure there aren't any static variables that one doesn't know about is the hard part. — supercat, Mar 23 '13 at 21:40
FYI, Seastar uses setjmp/longjmp for context switches https://github.com/scylladb/seastar/blob/master/src/core/thread.cc#L137-L152, and setups initial stack with setcontext hack https://github.com/scylladb/seastar/blob/master/src/core/thread.cc#L234-L247 — Ivan Prisyazhnyy, Jun 02 '23 at 12:56

score 8 · Answer 2 · answered Aug 09 '10 at 05:41

It may be bending the rules a little, but GNU pth does this. It's possible, but you probably shouldn't try it yourself except as an academic proof-of-concept exercise, use the pth implementation if you want to do it seriously and in a remotely portable fashion -- you'll understand why when you read the pth thread creation code.

(Essentially it uses a signal handler to trick the OS into creating a fresh stack, then longjmp's out of there and keeps the stack around. It works, evidently, but it's sketchy as hell.)

In production code, if your OS supports makecontext/swapcontext, use those instead. If it supports CreateFiber/SwitchToFiber, use those instead. And be aware of the disappointing truth that one of the most compelling use of coroutines -- that is, inverting control by yielding out of event handlers called by foreign code -- is unsafe because the calling module has to be reentrant, and you generally can't prove that. This is why fibers still aren't supported in .NET...

The Netscape Portable Runtime (NSPR) also appears to define macros for doing this using a simpler but hairier method: they just call setjmp and then change the machine stack pointer and instruction pointer in the buffer. Google "_MD_INIT_CONTEXT" for an entertaining read. — j_l, Aug 09 '10 at 07:10

score 5 · Answer 3 · answered Nov 05 '11 at 17:08

This is a form of what is known as userspace context switching.

It's possible but error-prone, especially if you use the default implementation of setjmp and longjmp. One problem with these functions is that in many operating systems they'll only save a subset of 64-bit registers, rather than the entire context. This is often not enough, e.g. when dealing with system libraries (my experience here is with a custom implementation for amd64/windows, which worked pretty stable all things considered).

That said, if you're not trying to work with complex external codebases or event handlers, and you know what you're doing, and (especially) if you write your own version in assembler that saves more of the current context (if you're using 32-bit windows or linux this might not be necessary, if you use some versions of BSD I imagine it almost definitely is), and you debug it paying careful attention to the disassembly output, then you may be able to achieve what you want.

score 2 · Answer 4 · answered Mar 13 '21 at 23:21

I did something like this for studies. https://github.com/Kraego/STM32L476_MiniOS/blob/main/Usercode/Concurrency/scheduler.c

The context/thread switching is done by setjmp/longjmp. The difficult part was to get the allocated stack correct (see allocateStack()) this depends on your platform.

This is just a demonstration how this could work, I would never use this in production.

stsp · Answer 5 · 2020-11-23T11:07:56.833

As was already mentioned by Sean Ogden, longjmp() is not good for multitasking, as it can only move the stack upward and can't jump between different stacks. No go with that.

As mentioned by user414736, you can use getcontext/makecontext/swapcontext functions, but the problem with those is that they are not fully in user-space. They actually call the sigprocmask() syscall because they switch the signal mask as part of the context switching. This makes swapcontext() much slower than longjmp(), and you likely don't want the slow co-routines.

To my knowledge there is no POSIX-standard solution to this problem, so I compiled my own from different available sources. You can find the context-manipulating functions extracted from libtask here:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/mcontext
The functions are: getmcontext(), setmcontext(), makemcontext() and swapmcontext(). They have the similar semantic to the standard functions with similar names, but they also mimic the setjmp() semantic in that getmcontext() returns 1 (instead of 0) when jumped to by setmcontext().

On top of that you can use a port of libpcl, the coroutine library:
https://github.com/dosemu2/dosemu2/tree/devel/src/base/lib/libpcl
With this, it is possible to implement the fast cooperative user-space threading. It works on linux, on i386 and x86_64 arches.

Multitasking using setjmp, longjmp

5 Answers5

Linked