
According to C++03, 5.8/2, left-shifting is defined as follows:

The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise.

What bothers me here is that unsigned types are explicitly mentioned yet signed types are ignored completely. Compare this to 5.8/3 which defines right-shifting:

The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

In 5.8/3 both signed and unsigned types are explicitly mentioned; signed types holding non-negative values and signed types holding negative values are even treated separately.

AFAIK, when something is not explicitly defined in the C++ Standard, the behavior is undefined. I've also seen this question, but it focuses on the differences between C and C++ and doesn't seem to have an answer everyone would agree upon.
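A minimal example of the kind of expression I'm asking about (the variable names are mine, just for illustration):

    int x = -1;        // E1: signed type, negative value
    int y = x << 3;    // is this defined by C++03, and if so, what is y?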

Is left-shifting a signed integer defined in C++03?

sharptooth
  • I would interpret that as an additional constraint on the numerical value of the result for unsigned numbers, with no similar constraint on the numerical value for signed numbers (because the standard doesn't presume two's-complement). Whereas right-shifting doesn't even fully define the bit pattern of the result for signed values. – John Calsbeek Mar 26 '12 at 07:03
  • It might be worth mentioning that C++11 has some explicit wording: "Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined." – Michael Burr Mar 26 '12 at 07:16
  • @MichaelBurr: I would see that as an answer, or the seed of one at the very least. Many paragraphs were clarified in C++11 to better indicate their intent. – Matthieu M. Mar 26 '12 at 08:39

2 Answers


5.8/2 says that E1 is interpreted as a bit pattern, which is only implementation-dependent if for some reason your implementation doesn't use two's complement, or if your compiler second-guesses you (they don't). C++11 is more explicit, but says the same thing.

Signed integers use what's known as two's complement. Basically, if you left-shift a signed integer by 1: if it's positive and below 2^(bits - 2), it will work as if it were unsigned; if it's positive but at or above that, you will create a strange negative number that bears no relation to the original; if it's negative to begin with, you may get either a negative or a positive number.

For instance, if we have an 8-bit signed integer representing -1:

11111111 // -1

If we left-shift that we end up with

11111110 // -2

However, let's say we have -120

10001000  // -120

We shall end up with

00010000  // 16

Obviously that is not correct!

Continuing, using number 65:

01000001  // 65

Shifted left, this will become:

10000010  // -126

Which equates to -126.

However, the number 16:

00010000 // 16

Shifted left is

00100000 // 32

As you can see, it "sometimes works, sometimes doesn't": shifting left by 1 behaves as a multiplication by 2 as long as the value lies between -2^(bits-2) and 2^(bits-2) - 1, and produces an unrelated result otherwise. That is for a shift by 1; to shift left by 2, take another bit off. Etc.
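To make the 8-bit examples above concrete, here is a small self-contained sketch (mine, not part of the original answer) that reproduces them on a typical two's-complement implementation; note that it deliberately exercises the very operations whose portability is in question:

    #include <cstdint>
    #include <cstdio>

    int main() {
        // The values used in the worked examples: -1, -120, 65, 16.
        const std::int8_t values[] = { -1, -120, 65, 16 };
        for (std::int8_t v : values) {
            // The operand is promoted to int before the shift, so the result is
            // cast back to int8_t to recover the truncated 8-bit pattern.
            // Converting an out-of-range value to int8_t is implementation-defined
            // before C++20, and left-shifting a negative value is the contested
            // operation itself, so this shows typical behavior, not a guarantee.
            std::int8_t shifted = static_cast<std::int8_t>(v << 1);
            std::printf("%4d << 1 -> %4d\n", static_cast<int>(v), static_cast<int>(shifted));
        }
        return 0;
    }

On a common two's-complement machine this prints -2, 16, -126, and 32 for the four inputs, matching the bit patterns shown above.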

std''OrgnlDave
  • I'd like to add an addendum: I say the compiler doesn't second-guess you because C++ is still ostensibly compatible with C, and it has worked exactly this way with two's-complement negatives in C for a long time. – std''OrgnlDave Mar 26 '12 at 15:44
  • ...and as of C++20, signed integers are now **defined** *i.e.* **required** to be represented using two's complement. – underscore_d Jan 01 '19 at 22:52
  • No more negative 0 integers in C++20 then – std''OrgnlDave Jan 03 '19 at 16:00

I would like to add that the rules changed in C++11.

In C++11, signed shift left of a negative number is always undefined behavior, even if the underlying machine defines it for values that are in range. It is not implementation-defined, it is undefined. This means that if you do this, the compiler is free to do anything it wants, including delete a bunch of your code unexpectedly. This is in contrast to signed shift right of negative numbers, which is implementation-defined, meaning that its result depends upon the machine type.
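A minimal sketch of the contrast (my own illustration, not from the answer): the right shift below is merely implementation-defined, while the commented-out left shift is undefined behavior in C++11:

    #include <cstdio>

    int main() {
        int n = -8;
        int r = n >> 1;      // implementation-defined result in C++11
                             // (commonly -4, via an arithmetic shift)
        // int l = n << 1;   // undefined behavior in C++11: left shift of a negative value
        std::printf("%d\n", r);
        return 0;
    }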

Clang's -fsanitize=undefined mode catches attempts to left-shift negative numbers.
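For example, a translation unit like the one below (the file name shift.cpp is just a placeholder) can be compiled with `clang++ -fsanitize=undefined shift.cpp`; when the shift executes, the sanitizer is expected to report a "left shift of negative value" error at run time (the exact wording may vary by Clang version):

    int main() {
        volatile int x = -1;  // volatile only to keep the compiler from folding the shift away
        int y = x << 1;       // undefined behavior in C++11: negative left operand
        return y;
    }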

Myria
  • Basically, if you left-shift a number, some bits get carried outside the representation. If your data type is signed, you're ok as long as all carried bits are zero. If any carried bits are one it's undefined behavior. And for unsigned types, all carried bits are discarded; their value does not matter. – Ben Voigt Apr 26 '15 at 00:06
  • As an addendum to @BenVoigt's comment, his statement is correct even when a 1 bit gets shifted from a normal bit into the sign bit of a signed integer--because the 1 bit isn't carried out of the representation, it's carried into the sign bit. In other words, `INT_MAX << 1` is well-defined behavior rather than undefined behavior (it's `-2` on today's two's-complement machines). It's an odd special case of the Standard (expr.shift.2). `INT_MIN << 1`, however, is undefined behavior, because a 1 bit--the sign bit--is shifted out of a signed integer. – Myria Apr 30 '15 at 20:13
  • One of the major tragedies of C is the failure of standards authors to recognize a normative category of behavior between implementation-defined and undefined, for situations where most implementations could likely offer some useful behavioral guarantees at little or no cost, but some might be unable to do so. For example, if some weird hardware platform jumps to a random address when a bit falls out the end of a left-shift, guaranteeing anything about the behavior of left-shifting a negative integer might require bigger and slower code than would otherwise be necessary (and the Standard... – supercat Jun 23 '15 at 19:27
  • ...could thus allow the compiler to *document* "An attempt to left-shift a negative number may cause arbitrary and unpredictable consequences") but code which doesn't need to run on such a platform might benefit from being able to use left-shift operations to scale negative values without having to cast between signed and unsigned forms. – supercat Jun 23 '15 at 19:29
  • @supercat A secondary major tragedy of C is that a consequence of the standards committees declaring such behavior undefined is compiler authors abusing such situations to make optimizations. Like, the compiler assuming that signed integer overflow never occurs. – Myria Jun 27 '15 at 00:34
  • @Myria: That's what makes the first tragedy tragic. If the only platforms where things behaved oddly were those with "unusual" hardware, programs that didn't need to work on such platforms could simply use the normal behavior. With regard to optimization, what's needed is a way for programmers to indicate what they care about and what they don't. I would posit that 99% of programs' requirements can be summarized in two sentences: (1) When given valid input, produce valid output. (2) When given invalid input, don't launch nuclear missiles. In the absence of any way of coding... – supercat Jun 27 '15 at 16:30
  • ...anything between fully-specified behavior and Undefined Behavior, the only way to meet requirement #2 is to write programs in ways that don't give the optimizer many choices. If code needs to compute `x< – supercat Jun 27 '15 at 16:39
  • With regard to overflow, I don't like having languages require that overflows silently wrap, but I also agree with language designers precise overflow trapping is often prohibitively expensive. I would suggest that the right approach would be to allow the programmer to specify overflow-trapping semantics which are tight enough to meet requirements, but loose enough to grant flexibility to the optimizer. What I'd really like to see would be a "__checked" block which would guarantee that if any arithmetic operators within a `__checked` block have overflows that lose information, ... – supercat Jun 27 '15 at 16:53
  • ...execution must at some arbitrary point transfer to an immediately-following `__overflow_handler` block; the optimizer would be allowed (but not required) to roll back any arbitrary combination of side-effects from within the loop provided that if one side-effect happens-before another, the former can't be rollled back without rolling back the latter. Such semantics would be much looser than anything that can be presently expressed in C without invoking UB, but would be sufficient to allow programmers to meet Requirement #2 much more easily than is presently possible. – supercat Jun 27 '15 at 17:08
  • @supercat I'd love to know the rationale for making this undefined behavior rather than implementation defined. I've never used a processor that didn't have instructions that could left-shift signed numbers with well defined behavior at the bit level. – Mark Ransom May 12 '17 at 15:42
  • @MarkRansom: The difference between Implementation-Defined and Undefined is *not* whether quality implementations should say something about behavior when practical, but whether implementations are *required* to document a consistent behavior. If "int" is 16 bits, then given something like "if (int*int2 < long1)", it might sometimes (e.g. depending upon register availability) be faster to do a 16x16->16 multiply, sign-extend the result, and compare that, or sometimes faster to do a 16x16->32 multiply and compare the 32-bit value directly. Making overflow Implementation-Defined would... – supercat May 12 '17 at 16:17
  • ...have required that a compiler document a consistent behavior – supercat May 12 '17 at 18:27
  • @MarkRansom: I would consider it plausible that some sign-magnitude processors might have two ways of performing a left-shift operation, those ways might perform differently on negative numbers, and one might be more or less efficient than the other based upon various factors. If e.g. the cheapest way to perform a variable-length shift would cause the sign bit to get shifted out, but the cheapest way to do a single-bit shift would multiply a value by itself, making the behavior Implementation-Defined would have required a compiler to either use one or the other consistently, or else... – supercat May 12 '17 at 18:31
  • ...treat something like `x=1; y<<=x;` in the same way as a variable-length shift even if the compiler knew that the shift amount would be 1. In practice, I don't think there have ever been any C99 or C11 implementations that didn't use two's-complement integer representations, so there has never been any good reason for any implementation not to treat a left shift as repeated multiply by 2 in cases where the result would be representable (I don't blame sanitizing builds for trapping such behavior, but those do so only to enforce a rule that never served any purpose). – supercat May 12 '17 at 18:34
  • @supercat right shift already requires two different behaviors depending on whether the content is signed or unsigned, and this is reflected in e.g. the x86 instruction set. Doing the same for left shifts wouldn't be out of line when required. – Mark Ransom May 12 '17 at 18:37
  • @MarkRansom: On a two's-complement system, there is only one kind of left-shift. On a ones'-complement system, there are two (note that sign-extension in ones'-complement extends both left and right, on the principle that 0.11111...111 is equal to 1). Making the behavior Undefined would allow code that doesn't care about which behavior is chosen to be more efficient than code that would always require one or the other. – supercat May 12 '17 at 18:55
  • @MarkRansom: I think the biggest problem with "undefined behavior" is that while many standards for things assign five categories of traits--those that things MUST, MUST NOT, SHOULD, SHOULD NOT, or MAY have, the C Standard makes essentially no effort to distinguish the last three. I suspect that is in some measure political, since statements that implementations SHOULD do something might be seen as implying that those that didn't were inferior. In most cases I've seen where people bristle at modern compilers' treatment of UB, nearly all implementations used to behave a certain way or else... – supercat May 12 '17 at 19:27
  • ...had a good reason for behaving otherwise. A statement that implementations SHOULD behave that way (except when it would be useless or impractical) would justify an expectation that implementations where such behavior was useful and practical would uphold it. – supercat May 12 '17 at 19:30