Is Shifting more than 32 bits of a uint64_t integer on an x86 machine Undefined Behavior?

Question

I learned the hard way that left-shifting a long long or a uint64_t by more than 32 bits on an x86 machine produced 0. I vaguely remember reading somewhere that on a 32-bit machine the shift operators only work on the first 32 bits, but I cannot recollect the source. Is shifting a uint64_t integer by more than 32 bits on an x86 machine undefined behavior?

Do you remember by how many bits you exactly tried to shift? — RedX, May 08 '12 at 13:02
@StephenCanon - m/c is short for 'machine', at least it is among most Indian programmers. Edited the question to fix this. — ArjunShankar, May 08 '12 at 13:19
It is surprising to find that m/c is not known as an abbreviation for machine (and I'm not from India). — Jonathan Leffler, May 08 '12 at 13:25
@pmg: both C++11 and C11 (and C99) have `uint64_t`; the behaviour of shift is the same in both languages. The dual tag can stand this time, though there are many questions where the dual tag is not appropriate. — Jonathan Leffler, May 08 '12 at 13:27
I have never heard of "m/c" before. In my branch, MC would most likely stand for microcontroller, or less likely a Motorola/Freescale integrated circuit. Imagine how much easier it would be to work as programmer if nobody used weird acronyms! — Lundin, May 08 '12 at 13:27
You probably used something like `uint64_t x = 1 << 33` and now blaming the compiler (which probably would have warned you, if you ever enabled warnings) — Gunther Piez, May 08 '12 at 13:32
You would get more clarity on what actually happened here if you posted the code that misbehaved for you. — Steve Townsend, May 08 '12 at 13:51
@drhirsch: I think your explanation makes the most sense. It should be an answer... — R.. GitHub STOP HELPING ICE, May 08 '12 at 13:51
@JonathanLeffler: it's similarly shocking to me to discover that anyone uses "mc" to mean anything other than *machine code*. Imagine my confusion on reading this question. — Stephen Canon, May 08 '12 at 14:03

score 26 · Accepted Answer · edited Mar 08 '19 at 22:38

The standard says (6.5.7 in n1570):

3 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

4 The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

Shifting a uint64_t a distance of less than 64 bits is completely defined by the standard.

Since long long must be at least 64 bits, shifting long long values less than 64 bits is defined by the standard for nonnegative values, if the result doesn't overflow.
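As a rough sketch of that rule for signed values, the check below reports whether `v << n` would be defined for a `long long`. The helper name `shl_ll_is_defined` is hypothetical (it comes from neither the standard nor this answer), and the width computation assumes `long long` has no padding bits.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when (v << n) is defined for long long
 * under the rules of 6.5.7 quoted above. */
bool shl_ll_is_defined(long long v, unsigned n)
{
    /* Assumption: long long has no padding bits, so this is its width. */
    const unsigned width = (unsigned)(sizeof(long long) * CHAR_BIT);

    if (n >= width)
        return false;               /* count too large: undefined (para. 3) */
    if (v < 0)
        return false;               /* negative left operand: undefined (para. 4) */
    return v <= (LLONG_MAX >> n);   /* v * 2^n must be representable */
}
```

With a 64-bit `long long`, `shl_ll_is_defined(1, 62)` is true, while `shl_ll_is_defined(1, 63)` is false because the result would not fit in the signed type.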

Note, however, that if you write a literal that fits into 32 bits, e.g. uint64_t s = 1 << 32 as surmised by @drhirsch, you don't actually shift a 64-bit value but a 32-bit one. That is undefined behaviour.

The most common results are a shift by shift_distance % 32 or 0, depending on what the hardware does (and assuming the compiler's compile-time evaluation emulates the hardware semantics, instead of nasal demons.)

Use 1ULL << 63 to make the shift operand unsigned long long before the shift.
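A minimal sketch of the same idea, assuming a hypothetical helper named `bit64`: the left operand is given a 64-bit unsigned type before the shift, so the shift is carried out at 64-bit width for any count from 0 to 63.

```c
#include <stdint.h>

/* Hypothetical helper: returns a 64-bit value with only bit n set.
 * Precondition: n < 64; a larger count would still be undefined. */
uint64_t bit64(unsigned n)
{
    /* UINT64_C(1) is an unsigned constant at least 64 bits wide,
     * so the shift is not performed in (32-bit) int. */
    return UINT64_C(1) << n;
}
```

`bit64(32)` yields 0x100000000, whereas the expression `1 << 32` shifts a 32-bit `int` on x86 and is undefined.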

+1. This is how it should be. A compliant compiler has to follow the C standard. — ArjunShankar, May 08 '12 at 13:17
@drhirsch pointed out what the likely problem is: something like `uint64_t x = 1 << 33` — bames53, May 08 '12 at 16:05

Jonathan Leffler · Answer 2 · 2012-05-08T13:23:27.333

The C standard requires the shift to work correctly. A particular buggy compiler might have the defect you describe, but that is buggy behaviour.

This is a test program:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint64_t x = 1;
    for (int i = 0; i < 64; i++)
        printf("%2d: 0x%.16" PRIX64 "\n", i, (x << i));
    return 0;
}

This is the output on an i686 machine running RHEL 5 with GCC 4.1.2, and also on an x86/64 machine (also running RHEL 5 and GCC 4.1.2), and on an x86/64 Mac (running Mac OS X 10.7.3 with GCC 4.7.0). Since that's the expected result, I conclude that there is no necessary problem on the 32-bit machine, and that GCC at least has not exhibited any such bug since GCC 4.1.2 (and probably never has exhibited such a bug).

 0: 0x0000000000000001
 1: 0x0000000000000002
 2: 0x0000000000000004
 3: 0x0000000000000008
 4: 0x0000000000000010
 5: 0x0000000000000020
 6: 0x0000000000000040
 7: 0x0000000000000080
 8: 0x0000000000000100
 9: 0x0000000000000200
10: 0x0000000000000400
11: 0x0000000000000800
12: 0x0000000000001000
13: 0x0000000000002000
14: 0x0000000000004000
15: 0x0000000000008000
16: 0x0000000000010000
17: 0x0000000000020000
18: 0x0000000000040000
19: 0x0000000000080000
20: 0x0000000000100000
21: 0x0000000000200000
22: 0x0000000000400000
23: 0x0000000000800000
24: 0x0000000001000000
25: 0x0000000002000000
26: 0x0000000004000000
27: 0x0000000008000000
28: 0x0000000010000000
29: 0x0000000020000000
30: 0x0000000040000000
31: 0x0000000080000000
32: 0x0000000100000000
33: 0x0000000200000000
34: 0x0000000400000000
35: 0x0000000800000000
36: 0x0000001000000000
37: 0x0000002000000000
38: 0x0000004000000000
39: 0x0000008000000000
40: 0x0000010000000000
41: 0x0000020000000000
42: 0x0000040000000000
43: 0x0000080000000000
44: 0x0000100000000000
45: 0x0000200000000000
46: 0x0000400000000000
47: 0x0000800000000000
48: 0x0001000000000000
49: 0x0002000000000000
50: 0x0004000000000000
51: 0x0008000000000000
52: 0x0010000000000000
53: 0x0020000000000000
54: 0x0040000000000000
55: 0x0080000000000000
56: 0x0100000000000000
57: 0x0200000000000000
58: 0x0400000000000000
59: 0x0800000000000000
60: 0x1000000000000000
61: 0x2000000000000000
62: 0x4000000000000000
63: 0x8000000000000000

Good illustration of what is going on. I *personally* don't like the `PRIX64` macro. I prefer `printf("%2d: %#.016lx\n", i, (x << i));` — dturvene, Mar 16 '21 at 21:12

score 4 · Answer 3 · edited May 23 '17 at 11:53

Daniel Fischer's answer answers the question about the C language specification. As for what actually happens on an x86 machine when you issue a shift by a variable amount, refer to the Intel Software Developer Manual Volume 2B, p. 4-506:

The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used).

So if you attempt to shift by an amount larger than 31 or 63 (for 32- and 64-bit values respectively), the hardware will only use the bottom 5 or 6 bits of the shift amount. So this code:

uint32_t RightShift(uint32_t value, uint32_t count)
{
    return value >> count;
}

Will result in RightShift(2, 33) == 1 on x86 and x86-64. It's still undefined behavior according to the C standard, but on x86, if the compiler compiles it down to a shr instruction (the logical right shift used for unsigned operands), the result is well defined on that architecture. But you should still avoid writing this sort of code that depends on architecture-specific quirks.
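If a shift count can reach or exceed the width, one way to keep the C code itself defined is to handle the count explicitly. The two helpers below are hypothetical names used for illustration only: the first returns 0 for oversized counts, the second spells out in portable C the count masking that x86 applies in hardware.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift that is defined for any count.
 * Counts of 32 or more shift every bit out, so the result is 0. */
uint32_t shr32_checked(uint32_t value, uint32_t count)
{
    return count < 32 ? value >> count : 0;
}

/* Hypothetical helper: same result as the x86 hardware behavior quoted
 * above, but expressed so that the C shift count is always 0..31. */
uint32_t shr32_masked(uint32_t value, uint32_t count)
{
    return value >> (count & 31);
}
```

Here `shr32_checked(2, 33)` is 0 and `shr32_masked(2, 33)` is 1 on every conforming implementation, so the choice between the two semantics is made in the source rather than left to the target.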

answered May 08 '12 at 17:12

Adam Rosenfield

But since it's Undefined Behavior, the compiler might have decided that the actual shift could NOT happen, and so, potentially no assembler instruction would even be emitted. So it doesn't really make sense to look further at what the assembler would do. – hmijail Mar 22 '16 at 20:44
A compiler will use `shr` for unsigned right shifts. `sar` (*arithmetic* right shift) would duplicate the MSB, violating the C semantics for cases that aren't UB. – Peter Cordes Mar 08 '19 at 22:40
Compilers know that shifts mask the count to `&31` or `&63`, so will actually optimize `value >> (count&31)` to a single `shr` or `shrx` instruction, because it implements the `&31` as well as the shift. – Peter Cordes Mar 08 '19 at 22:41

Pascal Cuoq · Answer 4 · 2012-05-08T16:54:50.760

Shifting by an amount from 0 up to one less than the width of the type does not cause undefined behavior, but left-shifting a negative number does. Would you be doing that?

On the other hand, right-shifting a negative number is implementation-defined, and most compilers, when right-shifting signed types, propagate the sign bit.
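When sign propagation is actually required, it can be written without relying on the implementation-defined case. The sketch below uses a hypothetical helper named `asr32` and assumes a two's complement representation (which the exact-width type `int32_t` has by definition): a negative value is complemented to a nonnegative one, shifted, and complemented back.

```c
#include <stdint.h>

/* Hypothetical helper: arithmetic (sign-propagating) right shift that
 * never right-shifts a negative value.
 * Precondition: n < 32. */
int32_t asr32(int32_t v, unsigned n)
{
    if (v >= 0)
        return v >> n;      /* nonnegative operand: fully defined */

    /* For negative v, ~v equals -v - 1, which is nonnegative, so the
     * inner shift is defined; the outer ~ turns the vacated zero bits
     * back into copies of the sign bit. */
    return ~(~v >> n);
}
```

For example, `asr32(-8, 1)` is -4 and `asr32(-1, 5)` is -1, matching what a sign-propagating shift would produce.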

answered May 08 '12 at 13:14

Most compilers do logical (insert 0) right shifts on `unsigned` and arithmetical (insert sign bit) right shifts on `signed` variables. At least any compiler I have ever used. – Gunther Piez May 08 '12 at 13:35
Left shifting a negative number isn't undefined behavior; it's implementation defined. In practice, if the processor has an instruction which will sign extend when shifting left, I would expect the compiler to use it; the "implementation-defined" is to support processors which don't have such an instruction. – James Kanze May 08 '12 at 14:28
@JamesKanze C99 6.5.7:4 “otherwise, the behavior is undefined”. If you are looking for a static analyzer that will (optionally) warn you if you left-shift a negative number, see the link in my bio. – Pascal Cuoq May 08 '12 at 16:33
@drhirsch I have clarified that I meant “propagate the sign bit” for signed types only. – Pascal Cuoq May 08 '12 at 16:42
@JamesKanze The `-val-left-shift-negative-alarms` option of the static analyzer I was referring to turns out to be on by default. Still, we needed to make this warning optional because many programmers think left shift of negative numbers is implementation-defined, and as long as compilers agree with them, such warnings are only noise to them. – Pascal Cuoq May 08 '12 at 16:49
@PascalCuoq §5.8/3 of the C++ standard says "[...]If E1 has a signed type and a negative value, the resulting value is implementation-defined." Another difference between C and C++? – James Kanze May 08 '12 at 17:23
@JamesKanze 5.8/3 in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1905.pdf is about `>>`. 5.8/2 is about `<<`. Regardless, you are partly right, the old C++ standard does not clearly say that `(-1)<<1` is undefined. The new C++11 standard explicitly says that it is undefined: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf – Pascal Cuoq May 08 '12 at 18:04
@PascalCuoq That's curious about `(-1)<<1`, since it was always well defined in the past. Well, sort of. It meant shifting the bit pattern. But of course, the bit pattern for `-1` could vary. (Of course, I'd never use signed if I were doing bit manipulations.) – James Kanze May 08 '12 at 18:23
same difference here as between static_cast and reinterpret_cast, if the underlying system use a separate sign bit for negative numbers instead of 2 complement's encoding, left shifting can give surprising results. You can indeed argue that what is left undefined is the underlying bit pattern of numbers. – kriss Dec 18 '12 at 15:45

score 1 · Answer 5 · answered May 08 '12 at 13:20

No, it is OK.

ISO 9899:2011 6.5.7 Bitwise shift operators

If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

That isn't the case here, so it is all fine and well-defined.
