11

I was investigating setjmp/longjmp and found out that setjmp saves registers such as instruction pointer, stack pointer etc...

However, what I don't get is this: can't the data on the thread's stack be modified between the calls to setjmp and longjmp? In that case, wouldn't longjmp fail to work as expected?

To make it clear: when longjmp restores the stack pointer, suppose the data in the memory the stack pointer now points to is not the same as it was when setjmp was called. Can this happen? And if it does, aren't we in trouble?

Also what is meant by the statement, "The longjmp() routines may not be called after the routine which called the setjmp() routines returns."

MetallicPriest
  • 3
    It's very similar to the statement "A local variable may not be used after the routine that allocated the local variable returns." The stack variables that are in scope when you call `setjmp()` must still be in scope when you call `longjmp()`. – bk1e Nov 01 '11 at 16:27

6 Answers

6

setjmp()/longjmp() are not meant to save the stack, that's what setcontext()/getcontext() are for.

The standard specifies that the values of non-volatile automatic variables that are local to the function calling setjmp() and that are changed between the setjmp() and longjmp() calls are indeterminate after a longjmp(). There are also some restrictions on how you may call setjmp() for this same reason.
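
As a rough illustration of that rule, here is a minimal sketch. The names `count_attempts` and `fail` are made up for this example. The counter is changed after `setjmp()` and read again after the `longjmp()`, so it is declared `volatile`; without the qualifier its value could not be relied on after the jump.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: jumps back to the setjmp() call in count_attempts(). */
static void fail(void) {
    longjmp(env, 1);
}

/* Returns how many attempts were made before giving up. */
int count_attempts(int max_attempts) {
    /* Changed between setjmp() and longjmp() and read afterwards,
       so it is volatile; otherwise its value would not be reliable. */
    volatile int attempts = 0;

    if (setjmp(env) != 0) {
        /* We arrive here through longjmp(). */
        if (attempts >= max_attempts) {
            return attempts;
        }
    }
    attempts++;
    fail();      /* does not return normally */
    return -1;   /* not reached */
}
```

Calling `count_attempts(3)` from `main` should give 3: each pass increments the counter, jumps back, and the loop ends once the limit is reached.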

ninjalj
  • The answer on this question (http://stackoverflow.com/questions/15115480/switching-up-down-the-stack-with-getcontext-setcontext) says `setcontext()/getcontext()` do not save the stack. Which is it, and what's the difference between `setcontext()/getcontext()` and `setjmp()/longjmp()`? – apple16 Dec 01 '13 at 22:37
  • 1
    @apple16: `setjmp()/longjmp()` are intended to unwind the stack. `getcontext()/setcontext()/makecontext()/swapcontext()` are intended to switch between coroutine contexts (each with its own stack), some of them created by `makecontext()`. – ninjalj Dec 02 '13 at 02:53
6

The stack pointer marks the division between the "used" and "unused" portions of the stack. When you call setjmp, all current call frames are on the "used" side, and any calls that take place after setjmp, but before the function which called setjmp returns, have their call frames on the "unused" side of the saved stack pointer. Note that calling longjmp after the function which called setjmp has returned invokes undefined behavior, so that case does not need to be considered.
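
To make the "used"/"unused" picture concrete, here is a small sketch with made-up names (`run`, `descend`). Every `descend()` frame is created after `setjmp()`, so all of them sit on the "unused" side of the saved stack pointer and are simply discarded by the jump, while the frame of `run()` is still live.

```c
#include <setjmp.h>

static jmp_buf env;

/* Recurses 'depth' levels, then jumps straight back to the setjmp() caller. */
static void descend(int depth) {
    char scratch[64];            /* lives beyond the saved stack pointer */
    scratch[0] = (char)depth;
    if (depth == 0) {
        longjmp(env, 42);        /* discards every descend() frame at once */
    }
    descend(depth - 1);
    (void)scratch;
}

/* Returns 42 if control came back through longjmp(), 0 or -1 otherwise. */
int run(int depth) {
    switch (setjmp(env)) {
    case 0:
        descend(depth);          /* does not return normally */
        return 0;
    case 42:
        /* run() never returned in between, so landing here is valid. */
        return 42;
    default:
        return -1;
    }
}
```

Whatever `descend()` wrote into its own frames no longer matters once control is back in `run()`, which is why later reuse of that stack space is harmless.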

Now, it's possible that local variables in some of the existing call frames are modified after setjmp, either by the calling function or through pointers, and this is one reason why it's necessary to use volatile in many cases...

R.. GitHub STOP HELPING ICE
  • 1
    Thanks for the answer, but could you explain what is meant by "calling longjmp after the function which called setjmp has returned"? And where should one use volatile in this case. Which variables I mean. – MetallicPriest Nov 01 '11 at 16:19
  • He means calling it from a function "outer" to the setjmp invocation. – ninjalj Nov 01 '11 at 16:23
  • @MetallicPriest: kind of explained in Wikipedia page on continuations: _A more limited kind is the escape continuation that may be used to escape the current context to a surrounding one. Many languages which do not explicitly support continuations support exception handling, which is equivalent to escape continuations and can be used for the same purposes. C's setjmp/longjmp are also equivalent: they can only be used to unwind the stack. Escape continuations can also be used to implement tail-call optimization_ – ninjalj Nov 01 '11 at 16:26
  • @ninjali, you mean for example this... void somefunction() { setjmp(); } int main() { somefunction(); longjmp(...) } is not correct? – MetallicPriest Nov 01 '11 at 16:27
  • @MetallicPriest: yes, the stack for `somefunction()` has already been destroyed, and there is no turning back. Use `setcontext()` for that. – ninjalj Nov 01 '11 at 16:30
  • 1
    `getcontext`/`setcontext` will not work for that either. You would have to create a completely separate stack with `makecontext`... – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:33
  • @MetallicPriest: Indeed, that is not correct and extremely dangerous. In order for something like that to work, C could not have a linear stack; it would have to have separate allocated blocks on the heap for each call frame, and some sort of garbage collection to determine which ones were still referenced. Since this is not possible in a language with representation of types, basically all programs would just grow O(n) in memory with the number of function calls `n`...which would not be very fun. – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:35
  • @MetallicPriest: The "UB if calling longjmp after the function which called setjmp has returned" requirement is saying that you can only longjmp up the call tree / call stack, to a parent function which called `setjmp`. (Or the same invocation of the current function.) You can't `longjmp` back into a function that's already returned. So it's a lot like `throw`/`catch` in terms of call structure, where `setjmp` sets up a catch point and `longjmp` is like `throw`. Somewhat different semantics for stack unwinding and local variables, but still only unwinding up the call stack. – Peter Cordes May 31 '23 at 21:28
4

The setjmp/longjmp (hereafter slj) feature in C is ugly, and its behavior may vary between implementations. Nonetheless, given the absence of exceptions, slj is sometimes necessary in C (note that C++ provides exceptions which are in almost every way superior to slj, and that slj interacts badly with many C++ features).

In using slj, one should bear in mind the following, assuming routine Parent() calls routine Setter(), which calls setjmp() and then calls Jumper(), which in turn calls longjmp().

  1. Code may legally exit the scope in which a setjmp is performed without a longjmp having been executed; as soon as the scope exits, however, the previously-created jmp_buf must be regarded as invalid. The compiler probably won't do anything to mark it as such, but any attempt to use it may result in unpredictable behavior, likely including a jump to an arbitrary address.
  2. Any local variables in Jumper() will evaporate with the call to longjmp(), rendering their values irrelevant.
  3. Whenever control returns to Parent, via whatever means, Parent's local variables will be as they were when it called Setter, unless such variables had their addresses taken and were changed using such pointers; in any case, setjmp/longjmp will not affect their values in any way. If such variables do not have their addresses taken, it is possible that setjmp() may cache the values of such variables and longjmp() may restore them. In that scenario, however, there would be no way for the variables to change between when they are cached and when they are restored, so the cache/restore would have no visible effect.
  4. The variables in Setter may or may not be cached by setjmp() call. After a longjmp() call, such variables may have the value they had when setjmp() was performed, or the values they had when it called the routine which ultimately called longjmp(), or any combination thereof. In at least some C dialects, such variables may be declared "volatile" to prevent them from being cached.

Although setjmp()/longjmp() can sometimes be useful, they can also be very dangerous. There is in most cases no protection against errant code causing Undefined Behavior, and in many real-world scenarios, improper usage is likely to cause bad things to happen (unlike some kinds of Undefined Behavior, where the actual outcome may often line up with what the programmer intended).

supercat
  • 3
    (4) is wrong. Using `volatile` is required by the standard, and in fact this is the **only** effect the `volatile` keyword is required to have by the standard that's testable by strictly-conforming programs. Modifying local variables and accessing them after the `longjmp`, if `volatile` was not used, results in UB. – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:40
  • I'm also rather confused what you mean by saying "its behavior may vary between implementations". The defined behavior does not vary at all. It's just that there's lots of UB cases to consider, but they're all completely obvious if you understand what you're doing... – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:41
  • @R: I said "at least some dialects". I don't know if such behavior was specified in the original K&R C standard, or the first ANSI standard, or if it was first specified at some later time, but it is certainly true in "at least some" dialects. As for UB, I didn't know whether changing a local variable in a scope that has performed a setjmp invalidates the jmp_buf that was produced, or if it merely causes the values of modified variables to become Undefined (such that reading before writing would be Undefined Behavior), or what. – supercat Nov 01 '11 at 17:03
  • @R: Perhaps I should have been clearer in saying that, in addition to the possibilities listed for possible variable contents after a longjmp() call, the variables might also contain nasal demons. In practice, I would think the fact that they might contain any combination of old and new data would be sufficient to discourage use even if they couldn't contain nasal demons. As for variations between implementations, I was thinking of things like variables in the scope that calls setjmp(). Some implementations let one get away without "volatile" qualifiers, and users of such implementations... – supercat Nov 01 '11 at 17:07
  • @R: ...might get away with omitting them for years, never realizing that they're relying upon Undefined Behavior. – supercat Nov 01 '11 at 17:07
  • While I agree there are lots of advantages to C++ exceptions, for bare metal, non-disk applications, the exception tables are still huge compared to slj overhead (even in 2023). There was talk of compressing the tables in g++, but I am not sure this happened. Just to say, the Q&A still have some value in present day. – artless noise May 31 '23 at 18:17
1

setjmp() and longjmp() are of little use in data-processing code, which is exactly where you would be concerned with automatic variables. Such a variable can live in a stack slot or in a register. If it lives in a stack slot, note that the stack is not restored by popping contexts one by one; instead, the stack pointer is rewound immediately and the callee-saved registers are restored.

The stack space used by the routine that called setjmp() stays reserved, and other routines should not use it. So if the compiler knows that a variable was stored on the stack before setjmp(), it can retrieve it on return. However, setjmp()/longjmp() confound compiler analysis, which is based on 'basic blocks'; these functions defy that categorization.

I would question why anyone would use setjmp()/longjmp() in this context. A good use can be found in libjpeg, and another in Simon Tatham's co-routines. I.e., the routines are rather single-purposed: they either set up a context for a long-running operation that might hit many exceptional conditions, or they are used as a primitive scheduler. Mixing actual data processing with them is a mistake waiting to happen (for all the reasons mentioned elsewhere).
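
The "context for a long-running operation" idea might look roughly like the sketch below. This is not libjpeg's API; `struct job`, `job_fail`, `process_item` and `run_job` are hypothetical names chosen for illustration. The `jmp_buf` lives in a per-operation context, and deep code bails out through it instead of threading error codes back up by hand.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical context for one long-running operation. */
struct job {
    jmp_buf on_error;          /* where to resume when a step fails */
    volatile int error_code;   /* written before the jump, read after it */
};

/* Records the error and unwinds to the setjmp() in run_job(). */
static void job_fail(struct job *j, int code) {
    j->error_code = code;
    longjmp(j->on_error, 1);
}

/* A nested step that can give up without returning an error code. */
static void process_item(struct job *j, int item) {
    if (item < 0) {
        job_fail(j, 7);        /* 7 is an arbitrary example code */
    }
}

/* Returns 0 on success, or the code passed to job_fail(). */
int run_job(const int *items, size_t count) {
    struct job j;
    j.error_code = 0;

    if (setjmp(j.on_error) != 0) {
        return j.error_code;   /* reached only through longjmp() */
    }
    for (size_t i = 0; i < count; i++) {
        process_item(&j, items[i]);
    }
    return 0;
}
```

The `error_code` member is `volatile` because it is changed after `setjmp()` and read after the jump; the loop index is not, since it is never read again once the jump has happened.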

Subclause 7.13.1.1 of the C Standard spells out where you can use setjmp(). It is an abnormal function, so treating it like a normal function call is the main issue. Perhaps the language should have given it a different syntax.
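
As I read that subclause, the permitted contexts boil down to: the whole controlling expression of a selection or iteration statement, a comparison of that kind against an integer constant, a `!` applied to it, or a bare expression statement. The function names below are invented for this sketch; each one uses `setjmp()` in one of those forms.

```c
#include <setjmp.h>

static jmp_buf env;

/* setjmp() as the entire controlling expression of an if statement. */
int whole_condition(void) {
    if (setjmp(env)) {
        return 1;                /* reached after the longjmp() below */
    }
    longjmp(env, 5);
    return -1;                   /* not reached */
}

/* setjmp() compared against an integer constant expression. */
int compared_with_constant(void) {
    if (setjmp(env) == 3) {
        return 3;
    }
    longjmp(env, 3);
    return -1;                   /* not reached */
}

/* setjmp() as the operand of unary '!'. */
int negated(void) {
    if (!setjmp(env)) {
        longjmp(env, 1);
    }
    return 1;
}

/* setjmp() as the entire controlling expression of a switch statement. */
int in_switch(void) {
    switch (setjmp(env)) {
    case 0:  longjmp(env, 2);    /* does not return */
    case 2:  return 2;
    default: return -1;
    }
}

/* By contrast, 'int r = setjmp(env);' is not among the listed contexts. */
```

This is why examples tend to wrap `setjmp()` in an `if` or `switch` rather than storing its result in a variable.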


Also what is meant by the statement, "The longjmp() routines may not be called after the routine which called the setjmp() routines returns."

Here is an example; the invalid call is in function e():

#include <setjmp.h>

jmp_buf buf;  // Another reason to avoid static/globals here.
 
void k(void) {
  /* ... */
  longjmp(buf, 1);
}

void f(void) {
  if (setjmp(buf) == 0) {
    k();
  } else {
    /* longjmp was invoked */
  }
}
 

void e(void) {
  f();
  longjmp(buf, 1);  // ERROR! we will return to f(); popped stack slot.
   // If 'buf' wasn't global, but declared in f(), this would not compile
   // That is a good thing.
}

The code sequence is e() -> f() -> k() -> f() -> e() -> crash. It is somewhat like using a closed file handle. I.e., the jmp_buf still holds valid-looking values, but they point to a stack frame that no longer exists.

artless noise
1

In the example below, helper() alters the value of i, which lives in main, via a pointer between the setjmp and longjmp calls. i is never incremented in the for statement itself. For extra fun see the entry albert.c, http://www.ioccc.org/years-spoiler.html, winner of the 1992 IOCCC. (One of the few times I ever ROTFLed reading a C source ...)

#include <stdio.h>
#include <setjmp.h>

jmp_buf the_state;

void helper(int *p);
int main(void)
{
    int i;

    for (i = 0; i < 10; ) {
        switch (setjmp(the_state)) {
        case 0:  helper(&i); break;
        case 1:  printf("Even=\t"); break;
        case 2:  printf("Odd=\t"); break;
        default: printf("Oops=\t"); break;
        }
        printf("I=%d\n", i);
    }

    return 0;
}

void helper(int *p)
{
    *p += 1;
    longjmp(the_state, 1 + *p % 2);
}
wildplasser
  • 1
    `i` needs to be `volatile` for this program to be valid. – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:37
  • Could be. But there is an un noalias-sed pointer to it floating around. Pity that dmr has died... – wildplasser Nov 01 '11 at 16:42
  • 3
    The compiler could, seeing that `i` is not modified in cases 1 and 2, move the `printf` inside the body in these cases and reuse the register-cached value of `i` from the loop test... It would be nice if the compiler treated `setjmp` specially here, but my understanding of the requirement of using `volatile` is that it's there so compilers don't have to be smart with `setjmp` (and because some of the corner cases are sufficiently difficult to work out that a compiler couldn't do it; they probably require solving the halting problem). – R.. GitHub STOP HELPING ICE Nov 01 '11 at 16:46
  • BTW: the program was not intended to supply an answer, but more to evoke a discussion. In either case, it demonstrates that main()'s auto variable (I avoid the wording "stack frame" to please the language lawyers) *can* be altered in between setjmp and longjmp. The semantics of `volatile` are rather ambiguous IIRC. – wildplasser Nov 01 '11 at 16:48
0

Also what is meant by the statement, "The longjmp() routines may not be called after the routine which called the setjmp() routines returns."

This is saying that you can only longjmp up the call tree / call stack, to a parent function which called setjmp. (Or within the same invocation of the current function, like a glorified goto, so it doesn't strictly have to be a parent function.)
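
The "glorified goto" case can be sketched like this; `sum_to` is a made-up name. Both `setjmp()` and `longjmp()` happen inside one invocation, so the function has certainly not returned when the jump is taken. The locals are `volatile` because they change between the two calls and are read afterwards.

```c
#include <setjmp.h>

/* Sums 1..n, using longjmp() as a backwards jump within one invocation. */
int sum_to(int n) {
    jmp_buf top;
    volatile int i = 0;
    volatile int total = 0;

    (void)setjmp(top);           /* expression statement, cast to void */
    if (i < n) {
        i++;
        total += i;
        longjmp(top, 1);         /* back to just after the setjmp() above */
    }
    return total;
}
```

For example, `sum_to(4)` should return 10. A plain loop would obviously be the better choice here; the point is only that the jump target is still live.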

So it's a lot like try{}catch / throw in terms of call structure, where setjmp sets up a catch point and longjmp is like throw. Somewhat different semantics for stack unwinding and local variables, but still only unwinding up the call stack.

This is why reuse of stack space for something else isn't a problem: the locals (in automatic storage) that were alive when setjmp was called must still be alive, during the same lifetime.

You can't longjmp back into a function that's already returned. Well, you can, but it's undefined behaviour. With try{}catch/throw, by contrast, you literally can't, since catching is based on nesting of scopes and function calls, so you can't do the broken thing of jumping to the catch{} block if you're not inside its try.

Other answers go into more details about volatile and other things; I posted this since others seemed to be lacking a simple and clear statement about only being able to jump to a setjmp call-site in a scope that hadn't reached the end of its lifetime.

Peter Cordes