24

After invoking longjmp(), non-volatile-qualified local objects should not be accessed if their values could have changed since the invocation of setjmp(). Their value in this case is considered indeterminate, and accessing them is undefined behavior.

Now my question is: why does volatile work in this situation? Wouldn't a change to that volatile variable still break the longjmp? For example, how will longjmp work correctly in the example given below? When the code gets back to setjmp after longjmp, wouldn't the value of local_var be 2 instead of 1?

#include <setjmp.h>

jmp_buf buf;

void some_function()
{
  volatile int local_var = 1;

  setjmp( buf );
  local_var = 2;
  longjmp( buf, 1 );
}
MetallicPriest

3 Answers

26

setjmp saves the registers and longjmp restores them. If a variable is kept in a register, any value written to it after the call to setjmp is lost when longjmp restores the saved register contents.

Conversely, if it's declared volatile, then every time it gets written, its value gets stored back to memory, and every time it gets read, its value gets read back from memory. This hurts performance, because the compiler has to perform more memory accesses instead of using a register, but it makes the variable safe to use in the face of longjmping.
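A minimal compilable sketch of this (the function names and the fact that longjmp happens in a helper are illustrative): because local_var is volatile, the store of 2 lives in memory and survives longjmp's register restore, so the read after the jump is well-defined.

```c
#include <setjmp.h>

static jmp_buf buf;

static void do_jump(void)
{
    longjmp(buf, 1);   /* jumps back to the setjmp call below */
}

/* Returns the value of local_var observed after the longjmp. */
static int run_example(void)
{
    /* volatile forces local_var to live in memory rather than a
       register, so the store below survives longjmp's register restore */
    volatile int local_var = 1;

    if (setjmp(buf) == 0) {
        local_var = 2;   /* modified after setjmp */
        do_jump();       /* does not return normally */
    }
    return local_var;    /* well-defined read: yields 2, not 1 */
}
```

Without the volatile qualifier, the same read would be undefined behavior, because local_var was modified between setjmp and longjmp.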

Adam Rosenfield
  • Is there any way I can reduce the performance overhead due to volatile, while still having correct execution? How much would volatile impact optimization? – MetallicPriest Nov 03 '11 at 15:09
  • Well, you could add some extra code to only make it volatile across the call to `setjmp`. Something like: `int x; /* do stuff */ volatile int save_x = x; if(setjmp(buf)) { x = save_x; /* do stuff */ }`. This maximizes performance before and after `setjmp` by using non-volatile variables, but ensures safety by using a volatile variable across the call. – Adam Rosenfield Nov 03 '11 at 15:14
  • Cool man, that is indeed a nice trick! But I can see that the compiler wouldn't try to use registers for arrays, so in the case of arrays volatile would not make such a difference, right? – MetallicPriest Nov 03 '11 at 15:23
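The snapshot trick from the comment above can be fleshed out into a compilable sketch (the function names and the loop are illustrative): the hot computation uses an ordinary variable that the compiler may keep in a register, and only the snapshot taken just before setjmp is volatile.

```c
#include <setjmp.h>

static jmp_buf buf;

static void bail_out(void)
{
    longjmp(buf, 1);
}

static int compute(void)
{
    int x = 0;                 /* ordinary variable: free to live in a register */
    for (int i = 0; i < 100; i++)
        x += i;                /* x == 4950 after the loop */

    volatile int save_x = x;   /* snapshot x just before setjmp */
    if (setjmp(buf) != 0) {
        x = save_x;            /* restore the snapshot after longjmp */
        return x;              /* x is valid again on this path */
    }
    bail_out();                /* triggers the longjmp */
    return -1;                 /* not reached */
}
```

Only save_x pays the memory-access cost of volatile; the loop before setjmp is fully optimizable.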
12

The crux lies in optimization: the optimizer naturally assumes that a call to a function like setjmp() does not change any local variables, and optimizes away read accesses to the variable. Example:

int foo;
foo = 5;
if ( setjmp(buf) != 2 ) {
   if ( foo != 5 ) { optimize_me(); longjmp(buf, 2); }
   foo = 6;
   longjmp( buf, 1 );
   return 1;
}
return 0;

An optimizer can optimize away the optimize_me line because foo is written in line 2, does not need to be re-read in line 4, and can be assumed to still be 5. Additionally, the assignment in line 5 can be removed because foo would never be read again if longjmp were a normal C function. However, setjmp() and longjmp() disturb the control flow in a way the optimizer cannot account for, breaking this scheme. The correct result of this code is termination; with the line optimized away, we get an endless loop.
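For contrast, a compilable variant of the example above with foo declared volatile (the wrapper function and jmp_buf are illustrative) forces the compiler to keep every read and write, so the code terminates as intended:

```c
#include <setjmp.h>

static jmp_buf buf;

static int check(void)
{
    volatile int foo = 5;    /* reads and writes may not be optimized away */
    if (setjmp(buf) != 2) {
        if (foo != 5) {      /* genuinely re-read after each longjmp */
            longjmp(buf, 2); /* second pass: foo is 6, so leave the cycle */
        }
        foo = 6;
        longjmp(buf, 1);
    }
    return foo;              /* reached when setjmp returns 2 */
}
```

Trace: setjmp first returns 0, foo is set to 6, and longjmp(buf, 1) jumps back; on that pass foo != 5 holds, so longjmp(buf, 2) makes setjmp return 2 and the function returns 6. Had foo not been volatile, the optimizer could assume foo == 5 forever and the code would loop endlessly.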

thiton
  • Actually modern C compilers *do* need to know that setjmp is a special case, since there are, in general, optimizations where the change of flow caused by setjmp could badly corrupt things, and these need to be avoided. Back in K&R days, setjmp did not need special handling, and didn't get any, and so the caveat about locals applied. Since that caveat is already there and (should be!) understood - and of course, setjmp use is pretty rare - there is no incentive for modern compilers to go to any extra lengths to fix the 'clobber' issue -- it would still be in the language. – greggo Nov 04 '16 at 15:40
8

The most common reason for problems in the absence of a 'volatile' qualifier is that compilers will often place local variables into registers. These registers will almost certainly be used for other things between the setjmp and longjmp. The most practical way to ensure that the use of these registers for other purposes won't cause the variables to hold the wrong values after the longjmp is to cache the values of those registers in the jmp_buf. This works, but has the side effect that there is no way for the compiler to update the contents of the jmp_buf to reflect changes made to the variables after the registers are cached.

If that were the only problem, the result of accessing local variables not declared volatile would be indeterminate, but not undefined behavior. There's a problem even with memory variables, though, which thiton alludes to: even if a local variable happens to be allocated on the stack, a compiler is free to overwrite that variable with something else any time it determines that its value is no longer needed. For example, a compiler could identify that some variables are never 'live' when the routine calls other routines, place those variables shallowest in its stack frame, and pop them before calling other routines. In such a scenario, even though the variables existed in memory when setjmp() was called, that memory might later be reused for something else, such as holding a return address. As such, after the longjmp() is performed, the memory would be considered uninitialized.

Adding a 'volatile' qualifier to a variable's definition causes storage to be reserved exclusively for the use of that variable, for as long as it is within scope. No matter what happens between the setjmp and longjmp, provided control has not left the scope where the variable was declared, nothing is allowed to use that location for any other purpose.

supercat
  • How much would this impact performance? Can't we do something like tell the compiler to use a register for a variable as it pleases, but flush it to a memory area representing that variable at some point, so that when we call longjmp, the variable's value is there? – MetallicPriest Nov 03 '11 at 15:17
  • @MetallicPriest: Fair question. There are at least two distinctly different levels of volatile semantics one might need, one of which could have a significantly higher performance penalty than the other. My guess would be that compilers would use the definition that allows the fewest optimizations; if efficiency in using those variables matters, I would suggest declaring both volatile and non-volatile variables, and copying the volatile one to the non-volatile one after the setjmp, and only writing to the volatile one in cases where one cares about the longjmp() seeing it. – supercat Nov 03 '11 at 15:40
  • Say we push all the registers onto the stack before calling each function; would the problem still remain? I mean, in that case, even if the local variables are accessed through registers, they would be pushed onto the stack and later popped, wouldn't they? And secondly, even if we don't push them, setjmp would have saved the registers in a buffer anyway, so on returning why wouldn't we have the right value of the variables? I mean, it should be the value from when setjmp was called, shouldn't it? – MetallicPriest Nov 03 '11 at 16:20
  • @MetallicPriest: Normally, longjmp() will be called from within a function that is in turn called by the function which performed the setjmp(). In some cases, however, it could be called by an asynchronous signal handler. In such a scenario, if there isn't an up-to-date copy of the variable kept on the stack, there would likely be no way for longjmp() to determine what the correct value of the variable should be. If a routine that does a setjmp() passes the jmp_buf to some other routine, the compiler would have no way of knowing whether that routine might create a signal handler. – supercat Nov 03 '11 at 16:27
  • The performance impact is likely to be pretty minor compared to the cost of calling setjmp - let alone the cost of calling another function which calls longjmp. – greggo Nov 04 '16 at 15:22
  • It should be noted that modern C compilers need to understand that setjmp is a very special case, and avoid any optimizations that are broken by the change of flow in the function. In K&R days, when call conventions were simpler, and whole-function optimization was not done, setjmp could be just another library function (which happened to be coded in assembler) -- but not any more ('alloca' is another example of a function that compilers need to treat specially). – greggo Nov 04 '16 at 15:28
  • @greggo: IMHO, alloca() would have been much better if it had been paired with an `freea()` macro that needed to be called in LIFO order and needed to be told the size of object to be freed (failing to properly pair alloca//freea calls would be UB). Had it been defined in that way, not only would it have supported more usage patterns, but it would have been supportable on *all* C implementations, rather than only those which use frame pointers. As a slight tweak, there could have been a second variation of freea which a compiler could treat as optional *if*... – supercat Nov 05 '16 at 21:48
    ...it would be more efficient to clean up all alloca() objects created within a function when it returns than to do so individually [implementations that can't automatically free objects when a function returns would simply treat the second form as equivalent to the first, and any implementation could--if nothing else--map all forms to malloc/free, at some loss in efficiency]. – supercat Nov 05 '16 at 21:52
  • @supercat Well, the zero-cost 'free' is supposed to be the big payoff. It occurred to me that one could portably implement alloc2( nbytes, &ctx) where 'ctx' is a `void* volatile ctx=NULL` declared in the func, which is used to determine stack depth (via its address). The memory would be allocated, not on the stack, but in another array which goes up and down with the stack - so when a call to alloc2 detects a drop in stack, things are freed to that point before the allocation is done. You would sometimes miss a case where something should be freed, but only in the short term. Still quirky. – greggo Nov 07 '16 at 15:33
  • @greggo: If automatic cleanup at function exit weren't guaranteed, the cost of `freea(ptr, size)` would--on most implementations that could support the present `alloca` semantics, be a simple addition instruction which in many cases compilers could optimize out. Even on compilers that can't support the present semantics, the common-case cost for `alloca` would be an addition and a comparison, and the common-case cost for `freea` would be one comparison and a subtraction. Still much cheaper than malloc/free. – supercat Nov 07 '16 at 19:25
  • @greggo: Further, malloc/free can fragment memory badly when used with the pattern "allocate space for oversized temporary object; read data while figuring its exact size and layout, allocate space for exact-size object, copy data to it, and release the temporary object". An alloca/freea approach using the heap could retain ownership of space it had used once, so that temporary objects would keep getting the same allocation while the persistent objects got allocated sequentially. – supercat Nov 07 '16 at 19:28
  • imho, this answer explains the doubt the OP asked about very clearly, better than the other answers. – Akash Sep 17 '17 at 16:29