Do signed integers now behave differently with regard to left shift?

Question

In C++20, signed integers are now defined to use two's complement;
see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html

This is a welcome change; however, one of the bullet points caught my eye:

Change: Left-shift on signed integer types produces the same results as left-shift on the corresponding unsigned integer type.

This seems like a strange change. Won't this shift away the sign bit?

Nicol Bolas · Accepted Answer · 2020-03-15T15:41:54.067

The C++17 wording for signed left shifts (E1 << E2) was:

Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

Note that it speaks of being representable in "the corresponding unsigned type". So if you have a 32-bit signed integer whose value is 0x7FFFFFFF and you left-shift it by 1, the result (0xFFFFFFFE) is representable in a 32-bit unsigned integer. But then this unsigned value gets converted to the result type, and converting an unsigned integer whose value is too big for the corresponding signed type is implementation-defined.

Overall, in C++17, left-shifting into the sign bit could happen through implementation-defined behavior, and even then only if you don't shift beyond the unsigned result type's size. Going past that is explicitly UB.
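To make the C++17 rules above concrete, here is a minimal sketch that classifies a signed left shift `e1 << e2` according to the quoted C++17 wording, plus the general [expr.shift] rule that a negative shift count, or one at least as large as the operand's bit length, is UB. It assumes a 32-bit `int`; the names `Cpp17Shift` and `classify_cpp17_shl` are hypothetical, made up for illustration.

```cpp
#include <climits>
#include <limits>

// Assumption: a 32-bit int, so the exact value e1 * 2^e2 always fits in 64 bits below.
constexpr int kIntBits = static_cast<int>(sizeof(int) * CHAR_BIT);
static_assert(kIntBits == 32, "this sketch assumes a 32-bit int");

// Hypothetical classification of what C++17 says about `e1 << e2` for int operands.
enum class Cpp17Shift { WellDefined, ImplementationDefined, Undefined };

Cpp17Shift classify_cpp17_shl(int e1, int e2) {
    // Negative count, or count >= bit length of the promoted left operand: UB.
    if (e2 < 0 || e2 >= kIntBits) return Cpp17Shift::Undefined;
    // A negative left operand: UB.
    if (e1 < 0) return Cpp17Shift::Undefined;
    // Exact value of e1 * 2^e2; at most (2^31 - 1) * 2^31, which fits in 64 bits.
    const unsigned long long exact = static_cast<unsigned long long>(e1) << e2;
    // Not representable in the corresponding unsigned type: UB.
    if (exact > std::numeric_limits<unsigned>::max()) return Cpp17Shift::Undefined;
    // Representable as unsigned but not as int: the unsigned -> int
    // conversion of the result is implementation-defined.
    if (exact > static_cast<unsigned long long>(std::numeric_limits<int>::max()))
        return Cpp17Shift::ImplementationDefined;
    return Cpp17Shift::WellDefined;
}
```

For example, `classify_cpp17_shl(0x7FFFFFFF, 1)` yields `ImplementationDefined` (the 0xFFFFFFFE case described above), while `classify_cpp17_shl(0x7FFFFFFF, 2)` yields `Undefined` because the exact value no longer fits in 32 unsigned bits.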

The C++20 wording, for both signed and unsigned integers, is:

The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo 2^N, where N is the width of the type of the result.

Congruence modulo 2^N basically means keeping the low N bits and cutting off every bit above them. The "width" of an integer is explicitly defined as:

The range of representable values for a signed integer type is −2^(N−1) to 2^(N−1)−1 (inclusive), where N is called the width of the type.

This means that for a 32-bit signed integer, the width is 31. So the result of a shift is reduced modulo 2^31, which cuts off the sign bit, explicitly preventing shifting into it.

So in C++20, we have a stronger guarantee: implementations can never do a signed left-shift into the sign bit. This differs from C++17 only in that the implementation variance and UB have been explicitly ruled out.

So left shift wasn't defined to shift into the sign bit in C++17, and is defined not to do so in C++20.

That quote probably refers to the fact that left-shifting a negative number is now valid, that the result of a shift is always well-defined no matter how many bits are shifted out of the top, and that the wording for signed and unsigned shifts is now essentially the same.

I don't understand your last sentence "Left shift didn't shift into the sign bit, and it still doesn't.". When you left-shift - say - an `int` that has value `INT_MAX` you shift into the sign bit and the result is `-2`. AFAIUI before C++20 this was undefined behavior and now it isn't. — maxschlepzig, Mar 15 '20 at 09:24
@maxschlepzig: I've edited the answer to better explain things, with more spec quotes. It's the same answer, just with more detail. — Nicol Bolas, Mar 15 '20 at 15:36
The width of (typical) `signed` **is** 32; there’s a -1 in the exponents. So `INT_MAX/2+1<<1 == INT_MIN`, since -2^31 is congruent to 2^31 modulo 2^32, and `INT_MIN<<2 == 0`. The logic is that bit operations apply to integers as bit sequences, without regard to the special significance of sign bits, or alternatively that the operation simply takes place under the natural isomorphism between the signed and unsigned integers. — Davis Herring, Mar 15 '20 at 16:35
Ok, I have to correct my last comment: it was implementation defined. And I agree with @DavisHerring's comment, the width of a 32 bit signed integer is 32, i.e. you have to set `N=32`. See also C++17, Section 6.8.1, Paragraph 1 - which you already quote. See also the next Paragraph: 'An unsigned integer type has the same width N as the corresponding signed integer type.' Thus the rest of your answer doesn't hold. — maxschlepzig, Mar 15 '20 at 16:40
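Following the comments' correction that N = 32 for a 32-bit `int`, the C++20 rule "the unique value congruent to E1×2^E2 modulo 2^N" can be spelled out with plain 64-bit arithmetic. This is a sketch assuming `std::int32_t`; `shl_by_definition` is a hypothetical name. It also shows where an off-by-one can creep in: `std::numeric_limits<std::int32_t>::digits` counts only the value bits (31), whereas the standard's width N includes the sign bit.

```cpp
#include <cstdint>
#include <limits>

// The standard's width N of int32_t includes the sign bit: digits (31) + 1 = 32.
constexpr int kWidth = std::numeric_limits<std::int32_t>::digits + 1;
static_assert(kWidth == 32, "numeric_limits::digits excludes the sign bit");

// Hypothetical helper: the C++20 value of e1 << e2, computed from the
// [expr.shift] definition instead of with the shift operator itself.
// Precondition (a violation is still UB in C++20): 0 <= e2 < kWidth.
std::int32_t shl_by_definition(std::int32_t e1, int e2) {
    const long long modulus = 1LL << kWidth;   // 2^N
    const long long exact = e1 * (1LL << e2);  // E1 * 2^E2, |exact| <= 2^62
    long long r = exact % modulus;             // in (-2^N, 2^N)
    if (r < 0) r += modulus;                   // now in [0, 2^N)
    // Pick the unique congruent value in int32_t's range [-2^(N-1), 2^(N-1)).
    if (r >= modulus / 2) r -= modulus;
    return static_cast<std::int32_t>(r);
}
```

With N = 32 this reproduces the examples from the comments: `INT_MAX/2+1 << 1` gives `INT_MIN`, `INT_MIN << 2` gives 0, and `INT_MAX << 1` gives -2.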

score 2 · Answer 2 · answered Mar 16 '20 at 20:22

Yes, the left shifting signed integer behavior changed with C++20.

With C++17, left-shifting a positive signed integer into the sign bit invokes implementation defined behavior.¹ Example:

```cpp
#include <climits>
int i = INT_MAX;
int j = i << 1;    // implementation-defined behavior with std < C++20
```

C++20 changed this to defined behavior because it mandates two's complement representation for signed integers.²,³
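One way to see why mandating two's complement makes this well-defined is the P0907 bullet quoted in the question: a signed left shift gives the same result as shifting the corresponding unsigned value. A minimal sketch of that equivalence, with the hypothetical helper name `shl_via_unsigned`; the final unsigned-to-`int` conversion is defined (modulo 2^N) in C++20 and was implementation-defined before, although GCC, for example, already documented it as reduction modulo 2^N.

```cpp
// Hypothetical helper: shift the bit pattern as unsigned, then convert back.
// Precondition: 0 <= n < bit width of unsigned (shift-count limits still apply).
int shl_via_unsigned(int x, int n) {
    // Unsigned arithmetic is always modulo 2^N, so this shift never overflows.
    const unsigned bits = static_cast<unsigned>(x) << n;
    // C++20: defined as the unique int congruent to bits modulo 2^N.
    // Before C++20: implementation-defined whenever bits > INT_MAX.
    return static_cast<int>(bits);
}
```

With a 32-bit `int`, `shl_via_unsigned(INT_MAX, 1)` yields -2, which is the value C++20 assigns to `j` in the example above.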

With C++17, left-shifting a negative signed integer invokes undefined behavior.¹ Example:

```cpp
int i = -1;
int j = i << 1;    // undefined behavior with std < C++20
```

In C++20, this changed as well: this operation now also has defined behavior (the result is -2).³
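Because these shifts are no longer UB in C++20, they can even appear in constant expressions, where the compiler has to reject UB. A small sketch assuming a 32-bit `int`; the function name is hypothetical, and the checks are only compiled when `__cplusplus` reports C++20 (MSVC reports an older value unless `/Zc:__cplusplus` is set, so there the checks are simply skipped).

```cpp
#include <climits>

// Hypothetical helper: evaluates the examples from this answer at compile time.
constexpr bool cpp20_shift_examples_hold() {
#if __cplusplus >= 202002L
    // Well-defined in C++20, so allowed in a constant expression.
    return (INT_MAX << 1) == -2   // positive value shifted into the sign bit
        && (-1 << 1) == -2        // negative left operand
        && (INT_MIN << 1) == 0;   // sign bit shifted out entirely
#else
    // Before C++20 these are implementation-defined or undefined: nothing to check.
    return true;
#endif
}

static_assert(cpp20_shift_examples_hold(), "C++20 signed left-shift semantics");
```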

This seems like a strange change. Won't this shift away the sign bit?

Yes, a signed left shift shifts away the sign bit. Example:

```cpp
#include <climits>
int i = 1 << (sizeof(int)*CHAR_BIT-1);  // C++20: defined behavior, sets the most significant bit
int j = i << 1;                         // C++20: defined behavior, result is 0
```
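To see the sign bit being shifted out, one can print the bit patterns. A sketch assuming a 32-bit `int`; `bit_pattern` is a hypothetical helper that reads the bits through a conversion to `std::uint32_t`, which is reduction modulo 2^32 in every C++ standard version.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Hypothetical helper: the 32-bit two's complement pattern of x, most significant bit first.
std::string bit_pattern(std::int32_t x) {
    // int32_t -> uint32_t is defined as reduction modulo 2^32.
    return std::bitset<32>(static_cast<std::uint32_t>(x)).to_string();
}

// Usage with C++20 semantics:
//   int i = 1 << 31;  // bit_pattern(i): "1" followed by 31 zeros (only the sign bit set)
//   int j = i << 1;   // bit_pattern(j): 32 zeros (the sign bit was shifted out)
```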

The main reason for specifying something as undefined or implementation defined behavior is to allow for efficient implementations on different hardware.

Nowadays, since virtually all CPUs implement two's complement, it's natural that the C++ standard mandates it. And once you mandate two's complement, it's only consistent to make the above operations defined behavior, because this is also how left shift behaves in all two's complement instruction set architectures (ISAs).

IOW, leaving them implementation-defined or undefined wouldn't buy you anything.

Or, if you liked the previous undefined behavior, why would you care if it gets changed to defined behavior? You can still avoid these operations as before; you wouldn't have to change your code.

¹ The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

(C++17 final working draft, Section 8.8 Shift operators [expr.shift], Paragraph 2, page 132)

² [..] For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation. (Footnote 41: This is also known as two’s complement representation.) [..]

(C++20 latest working draft, Section 6.8.1 Fundamental types [basic.fundamental], Paragraph 3, page 66)

³ The value of E1 << E2 is the unique value congruent to E1 × 2^E2 modulo 2^N, where N is the width of the type of the result. [Note: E1 is left-shifted E2 bit positions; vacated bits are zero-filled. — end note]

(C++20 latest working draft, Section 7.6.7 Shift operators [expr.shift], Paragraph 2, page 129)
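The [basic.fundamental] rule quoted in footnote ² says that a signed value and the unsigned value congruent to it modulo 2^N have identical bits. A sketch that checks this for `std::int32_t` by comparing object representations via `std::memcpy`; `same_bits_as_unsigned` is a hypothetical name. (`std::int32_t`, where provided, is required to use two's complement even before C++20, so the check also holds on older compilers.)

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical check: does x have the same bits as the uint32_t value
// congruent to x modulo 2^32?
bool same_bits_as_unsigned(std::int32_t x) {
    // Conversion to an unsigned type is defined as reduction modulo 2^32.
    const std::uint32_t congruent = static_cast<std::uint32_t>(x);
    // Copy x's raw bytes into an unsigned object to inspect its representation.
    std::uint32_t raw;
    std::memcpy(&raw, &x, sizeof raw);
    return raw == congruent;
}
```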

See also [my answer to a related question](https://stackoverflow.com/a/60692294/427158). — maxschlepzig, Mar 16 '20 at 20:23
