When the C standards were codified, different platforms would do different things when left-shifting negative integers. On some of them, it might trigger implementation-specific traps whose effects could be outside a program's control, up to and including random code execution. Nonetheless, programs written for such platforms might deliberately make use of such behavior (a program could, for example, require the user to configure the system's trap handlers before running it, and then exploit the behavior of the suitably-configured handlers).
The authors of the C standard did not want to say that compilers for machines where left-shifting a negative number would trap must be modified to prevent such trapping, since programs might potentially be relying upon it. But if left-shifting a negative number is allowed to trigger a trap which could cause any arbitrary behavior (including random code execution), then left-shifting a negative number is allowed to do anything whatsoever. Hence Undefined Behavior.
In practice, until about 5 years ago, 99+% of compilers written for a machine that used two's-complement math (meaning 99+% of machines made since 1990) would consistently yield the following behaviors for `x<<y` and `x>>y` (sketched in code after the list below), to the extent that code relying upon such behavior was considered no more non-portable than code which assumed `char` was 8 bits. The C standard didn't mandate such behavior, but any compiler author wanting to be compatible with a wide base of existing code would follow it.
- If `y` is a signed type, `x << y` and `x >> y` are evaluated as though `y` were cast to `unsigned`.
- If `x` is of type `int`, `x<<y` is equivalent to `(int)((unsigned)x << y)`.
- If `x` is of type `int` and positive, `x>>y` is equivalent to `(unsigned)x >> y`. If `x` is of type `int` and negative, `x>>y` is equivalent to `~(~((unsigned)x) >> y)`.
- If `x` is of type `long`, similar rules apply, but with `unsigned long` rather than `unsigned`.
- If `x` is an N-bit type and `y` is greater than N-1, then `x >> y` and `x << y` may arbitrarily yield zero, or may act as though the right-hand operand were `y % N`; they may require extra time proportional to `y` [note that on a 32-bit machine, if `y` is negative, that could potentially be a long time, though I only know of one machine which would in practice run more than 256 extra steps]. Compilers were not necessarily consistent in their choice, but would always return one of the indicated values with no other side effects.
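To make the `int` equivalences above concrete, here is a minimal sketch (my own illustration, not code from any standard) expressing the classic behaviors in terms of shifts on `unsigned`. It assumes a 32-bit two's-complement `int`, a shift count below the type's width, and that converting an out-of-range `unsigned` value back to `int` wraps; that conversion is formally implementation-defined, but wrapping was universal on the machines described here:

```c
#include <assert.h>

/* Hypothetical helpers illustrating the "classic" shift behaviors.
 * Assumes 32-bit two's-complement int, 0 <= y < 32, and wrapping
 * unsigned-to-int conversion. */
int classic_shl(int x, unsigned y)
{
    assert(y < 32);
    return (int)((unsigned)x << y);      /* x<<y via unsigned math */
}

int classic_shr(int x, unsigned y)
{
    assert(y < 32);
    if (x >= 0)
        return (int)((unsigned)x >> y);  /* positive: logical shift */
    return (int)~(~(unsigned)x >> y);    /* negative: shift in ones */
}
```

On such a platform, `classic_shr(-8, 1)` yields `-4`, matching the arithmetic right shift most two's-complement hardware performed.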
Unfortunately, for reasons I can't quite fathom, compiler writers have decided that, rather than allowing programmers to indicate what assumptions compilers should use for dead-code removal, compilers should assume that it is impossible to execute any shift whose behavior isn't mandated by the C standard. Thus, given code like the following:
```c
#include <stdint.h>

uint32_t shiftleft(uint32_t v, uint8_t n)
{
    if (n >= 32)     /* intended to make out-of-range shifts yield 0 */
        v = 0;
    return v << n;   /* still UB when n >= 32, even though v is 0 */
}
```
a compiler may determine that, because the code would engage in Undefined Behavior when `n` is 32 or larger, it may assume the `if` condition will never be true, and may thus omit the test and the assignment. Consequently, unless and until someone comes up with a standard for C which restores the classic behaviors and allows programmers to designate which assumptions merit dead-code removal, such constructs cannot be recommended for any code that might be fed to a hyper-modern compiler.
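If the goal is simply a shift that yields zero for out-of-range counts, one way to write it with fully defined behavior (a sketch of my own, with a hypothetical name) is to guard the shift itself, so that no execution path ever evaluates a shift the standard leaves undefined:

```c
#include <stdint.h>

/* The shift is evaluated only when n < 32, so no execution path
 * performs an undefined shift, and the compiler cannot discard the
 * test as dead code. */
uint32_t shiftleft_checked(uint32_t v, uint8_t n)
{
    return (n < 32) ? (v << n) : 0;
}
```

The cost is an explicit comparison, which compilers may lower to a conditional move rather than a branch.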