28

I know, right shifting a negative signed type depends on the implementation, but what if I perform a left shift? For example:

int i = -1;
i << 1;

Is this well-defined?

I think the standard doesn't say about negative value with signed type

if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

It only clarifies that if the result isn't representable in signed type then the behavior is undefined.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
user103214
  • 3,478
  • 6
  • 26
  • 37
  • 12
    What part isn't clear? *If* E1 has a non-negative value and ... *otherwise* the behavior is undefined. E1 has negative value, therefore the behavior is undefined. It's true that sometimes the standardese could do with some extra brackets to make it entirely clear what an "otherwise" refers to, but here it means "in any situation not already described". It helps when interpreting these things to remember that in *any* situation where the standard doesn't describe the behavior, then behavior is undefined. So in fact that "otherwise" is formally redundant. – Steve Jessop Dec 07 '11 at 13:21
  • @SteveJessop: I think the __otherwise__ means if the value is not representable then the behavior is undefined. I'm not sure if this includes E1 with negative value. – user103214 Dec 07 '11 at 13:25
  • 1
    @Norman: It does, see R.Marthinho's answer, there is a semicolon, a clear seperator between the condition and the final result. – Xeo Dec 07 '11 at 13:26
  • 1
    Which standard version is that? The wording in C++03 standard ISO/IEC 14882:2003 is different. It says that it must be a bit shift and than only says what is the corresponding numeric value for unsigned type, but does not mention unsigned type at all. Which implies that it's implementation-defined, because the bit pattern is. – Jan Hudec Dec 07 '11 at 13:29
  • @Jan: the wording is from C++11, which took it from C99. You're right that C89 and C++03 both define the left shift operator as "a bit pattern left-shifted E2 positions", without giving any further definition of what it means to "left-shift" a signed bit pattern. I think the general interpretation of that old text was that if the sign bit gets involved at all, then that's an overflow (UB), rather than that the result must be what you'd expect based on the implementation-defined integer representation. But evidently the old text was considered inadequate, or it wouldn't have been changed. – Steve Jessop Dec 07 '11 at 13:36
  • @SteveJessop: Thanks; that needed to be specified. It's somewhat strange, that C++11 (well, n3242; I don't have final) says that right shift of negative values is implementation-defined, but right shift of negative values is undefined. – Jan Hudec Dec 07 '11 at 14:09
  • @Jan: I agree, that is strange. I suppose there must be some dusty old hardware somewhere on which the left shift traps due to overflow, whereas the right shift does something reliable. Or at least a rumour reached the committee's ears that there is. "Dusty old hardware" sometimes turns out to mean, "the machine that runs the bank transaction that pays my salary", so I'm not entirely against the C++ standard taking it into account. – Steve Jessop Dec 07 '11 at 14:18
  • @SteveJessop: Hardware that traps on left-shift could have been handled via "Must either yield the bit pattern... or raise a trap whose existence is implementation-defined", if the intention weren't to allow compilers to make assumptions about code behavior without having to worry about whether such assumptions match reality. – supercat Apr 17 '15 at 23:17
  • Note that with C++20 it's no longer undefined behaviour, see https://stackoverflow.com/questions/55468807 – breathfidgeting Apr 14 '23 at 11:05
  • @SteveJessop: Under C89, if neither a signed type nor unsigned type have any padding bits, the behavior of an unsigned type would unambiguously dictate that it means to "left-shift" a bit pattern by N bits. On implementations where some bit patterns would be invalid, a left-shift that yielded such a pattern would invoke UB, but behavior was defined 100% unambiguously on implementations where no such invalid bit patterns could exist. – supercat Aug 18 '23 at 14:59

3 Answers3

30

You're not reading that sentence correctly. The standard defines it if: the left operand has a signed type and a non-negative value and the result is representable (and previously in the same paragraph defines it for unsigned types). In all other cases (notice the use of the semicolon in that sentence), i.e, if any of these conditions isn't verified, the behaviour is undefined.

R. Martinho Fernandes
  • 228,013
  • 71
  • 433
  • 510
25

When the C standards were codified, different platforms would do different things when left-shifting negative integers. On some of them, the behavior might trigger implementation-specific traps whose behavior could be outside a program's control, and which could include random code execution. Nonetheless, it was possible that programs written for such platforms might make use of such behavior (a program could e.g. specify that a user would have to do something to configure a system's trap handlers before running it, but the program could then exploit the behavior of the suitably-configured trap handlers).

The authors of the C standard did not want to say that compilers for machines where left-shifting of negative numbers would trap must be modified to prevent such trapping (since programs might potentially be relying upon it), but if left-shifting a negative number is allowed to trigger a trap which could cause any arbitrary behavior (including random code execution) that means that left-shifting a negative number is allowed to do anything whatsoever. Hence Undefined Behavior.

In practice, until about 5 years ago, 99+% of compilers written for a machine that used two's-complement math (meaning 99+% of machines made since 1990) would consistently yield the following behaviors for x<<y and x>>y, to the extent that code reliance upon such behavior was considered no more non-portable than code which assumed char was 8 bits. The C standard didn't mandate such behavior, but any compiler author wanting to be compatible with a wide base of existing code would follow it.

  • if y is a signed type, x << y and x >> y are evaluated as though y was cast to unsigned.
  • if x is type int, x<<y is equivalent to (int)((unsigned)x << y).
  • if x is type int and positive, x>>y equivalent to (unsigned)x >> y. If x is of type int and negative, x>>y is equivalent to ~(~((unsigned)x) >> y).
  • If x is of type long, similar rules apply, but with unsigned long rather than unsigned.
  • if x is an N-bit type and y is greater than N-1, then x >> y and x << y may arbitrarily yield zero, or may act as though the right-hand operand was y % N; they may require extra time proportional to y [note that on a 32-bit machine, if y is negative, that could potentially be a long time, though I only know of one machine which would in practice run more than 256 extra steps]. Compilers were not necessarily consistent in their choice, but would always return one of the indicated values with no other side-effects.

Unfortunately for reasons I can't quite fathom, compiler writers have decided that rather than allowing programmers to indicate what assumptions compilers should use for dead-code removal, compilers should assume that it is impossible to execute any shift whose behavior isn't mandated by the C standard. Thus, given code like the following:

uint32_t shiftleft(uint32_t v, uint8_t n)
{
  if (n >= 32)
    v=0;
  return v<<n;
}

a compiler may determine that because the code would engage in Undefined Behavior when n is 32 or larger, the compiler may assume that the if will never return true, and may thus omit the code. Consequently, unless or until someone comes up with a standard for C which restores the classic behaviors and allows programmers to designate what assumptions merit dead code removal, such constructs cannot be recommended for any code that might be fed to a hyper-modern compiler.

Surge
  • 3
  • 3
supercat
  • 77,689
  • 9
  • 166
  • 211
  • 2
    Thanks for this wonderful answer--even if it's not the accepted one it's very useful. – John Zwinck Aug 26 '16 at 04:09
  • Another thing to keep in mind is, not all barrel-shifters are the same. I seem to recall ARM behave a little differently than IA32 and other platforms, and its the reason there's not an implicit mod operation using only the lower bits of the shift amount. – jww Jun 01 '17 at 15:21
  • @jww: Platforms which include a "shift-by-N" instruction often mod-reduce the shift value, but in many cases the mod-reduction is by an amount larger than the word size. Further, different chips in a family might behave differently. Almost all processors, however, would use one of the two behaviors listed. – supercat Jun 01 '17 at 17:57
  • This is a useful answer, but I don't believe that the explanation for the final example of `shiftleft` is correct. If `n >= 32` then this code doesn't engage in undefined behavior, as it will simply return. For the if-statement to be optimized out, the undefined behavior must happen before the if-statement. – gengkev May 29 '18 at 17:45
  • @gengkev: As written, the code does not return zero; it sets `n` equal to zero and then shifts the value zero left by n bits. On many platforms, an "if" with no "else" is more efficient than an "if"/"else" construct. Unless hardware would do something unacceptable with large values of `n` (e.g. running a microcode loop for n cycles with interrupts disabled might be bad if `n` were a large 32-bit value) letting the shift execute may be more efficient than branching around it. If a compiler tries to get "clever" however, it may not be possible to make it generate what should be optimal code. – supercat May 29 '18 at 18:44
  • @jww: The ARM platforms do a mod-256 on the shift amount. If the Standard were to specify that implementations may treat `x<=bitsize` as an unspecified choice among `x<<(y-bitsize)`, `x<<(y-1)<<1`, or whatever hardware happens to do, that should make it practical on all platforms and yet still guarantee that constructs like (x<>(32-y))` would work on all common platforms without having to add additional complexity to accommodate the `y==0` case. – supercat Jul 23 '19 at 17:44
3

otherwise, the behavior is undefined.

This includes the

if E1 has a signed type and non-negative value

Xeo
  • 129,499
  • 52
  • 291
  • 397