C++ Bitshift in one line influenced by processor bit width (Bug or Feature?)

Question

I encountered a strange problem, but to make it clear see the code first:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint8_t a = 0b1000'0000; // -> one leftmost bit
    uint8_t b = 0b1000'0000;
    
    a = (a << 1) >> 1; // -> Both shifts in one line
    
    b = b << 1; // -> Shifts separated into two individual lines
    b = b >> 1;
    
    printf("%i != %i", a, b);

    return 0;
}

(using C++17 on an x86 machine)

If you compile and run the code, b is 0 while a is 128. In general, these expressions should not depend on the processor's architecture or its bit width; I would expect both to be 0 after the operation.

The bitshift right operator is defined to fill up the left bits with zero, as the example with b proves.

If I look at the assembly code, I can see that for b, the value is loaded from RAM into a register, shifted left, written back into RAM, read again from RAM into a register, and then shifted right. On every write back into RAM, the value is truncated to an 8-bit integer, which removes the leftmost 1 because it is out of range for an 8-bit integer.

For a on the other hand, the value is loaded in a register (based on the x86 architecture, 32-bit wide), shifted left, then shifted right again, shifting the 1 just back where it was, caused by the 32-bit register.

My question now: is this one-line optimization for a correct behavior that should be taken into account while writing code, or is it a compiler bug to be reported to the compiler developers?

Turn on/up your compiler warnings: http://coliru.stacked-crooked.com/a/667e614c4a65f8fe – NathanOliver, Nov 04 '22 at 12:11
Most operators operate on `int` arguments or larger. If you do multiple operations together they are done using `int` and the upper bits can be preserved. If you assign that `int` value to a `uint8_t` between two operations they are lost. – Gerhardh, Nov 04 '22 at 12:14
Arithmetic type promotion. The `uint8_t` is promoted to `int`, shifted left and then right, and the original MSB is not lost. If you store the intermediate value in `uint8_t` the MSB is lost. The reason for type promotion is so that an intermediate value *isn't* lost. – Weather Vane, Nov 04 '22 at 12:14
@NathanOliver has the correct answer. Too many people ignore warnings. – Donnie, Nov 04 '22 at 12:17
*"is it a compiler bug"*. This is always the **least** likely reason. – Weather Vane, Nov 04 '22 at 12:22
*...is this one-line optimization for a correct behaviour...* **No**, it's integer promotion, which is intrinsic to the C++ language (and the C language), and if you have warnings enabled your compiler can warn you about the pitfall. *...and should be taken in account while writing code...* **Yes**. *... it a compiler bug...* **No**. – Eljay, Nov 04 '22 at 12:24
If you'd compiled with optimization enabled, you'd see it passing constant values to printf, after having done constant-propagation through your C shifts. That would show compiling each statement to a separate block of asm isn't the *cause* of truncation, it's just the `-O0` way to implement the C semantics. [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394). It's not like `gcc -ffloat-store` where x87 store/reload really does affect values. – Peter Cordes, Nov 04 '22 at 19:46

dbush · Accepted Answer · 2022-11-04T12:36:15.260

What you're seeing is the result of integer promotion. What this means is that (in most cases) anywhere an expression uses a value of a type smaller than int, that value gets promoted to int.

This is detailed in section 7.6p1 of the C++17 standard:

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (7.15) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int

So in this expression:

a = (a << 1) >> 1

The value of a on the right side is promoted from the uint8_t value 0x80 to the int value 0x00000080. Shifting left by one gives you 0x00000100, then shifting right again gives you 0x00000080. That value is then truncated to the size of a uint8_t to give you 0x80 when it is assigned back to a.

In this case:

b = b << 1;

The same thing happens to start: 0x80 is promoted to 0x00000080 and the shift gives you 0x00000100. Then this value is truncated to 0x00 before being assigned to b. The following b = b >> 1 therefore shifts 0x00, which stays 0.

So this is not a bug, but expected behavior.
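The promotion can also be checked at compile time. The sketch below is not part of the original answer: the helper names `shift_in_one_expression` and `shift_in_two_statements` are made up for illustration, and it assumes `int` is wider than 8 bits, as on mainstream targets.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// The left operand of << is promoted, so the shift result is an int,
// not a uint8_t.
static_assert(
    std::is_same<decltype(std::declval<std::uint8_t>() << 1), int>::value,
    "uint8_t << int yields int after integral promotion");

// Hypothetical helper: both shifts happen on the promoted int value.
std::uint8_t shift_in_one_expression(std::uint8_t v) {
    // For 0x80: int 0x00000080 -> 0x00000100 -> 0x00000080 -> uint8_t 0x80
    return static_cast<std::uint8_t>((v << 1) >> 1);
}

// Hypothetical helper: the intermediate result is stored in a uint8_t,
// which drops everything above bit 7 before the second shift.
std::uint8_t shift_in_two_statements(std::uint8_t v) {
    v = static_cast<std::uint8_t>(v << 1);  // for 0x80: 0x100 truncated to 0x00
    v = static_cast<std::uint8_t>(v >> 1);  // 0x00 >> 1 is still 0x00
    return v;
}
```

Called with 0x80, the first helper returns 0x80 and the second returns 0x00, which matches a and b in the question.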

The details of the integer promotions: https://en.cppreference.com/w/c/language/conversion – Eljay, Nov 04 '22 at 12:21
Thank you, I should stick to my habit of imagining it like function calls. If I write `(a << 1)` it is like calling a function like `int shift(a, 1);`. With this, I know that I have to explicitly cast the return value, as `(a << 1)` "returns" int. Yes, it's pseudocode, sorry. – timoxd7, Nov 04 '22 at 13:57

score 2 · Answer 2 · answered Nov 04 '22 at 12:21

You have a misconception regarding how these operators work. Almost every operator in C++ (and C) comes with implicit type promotion rules for its operand(s). In the case of the shift operators, both operands are promoted according to the integer promotions, and the result has the type of the promoted left operand.

Therefore (a << 1) >> 1; is actually 100% equivalent to ((int)a << 1) >> 1 and the result is of type int, which is signed and therefore deeply problematic, for the following reasons:

Left shifting a negative value invokes undefined behavior
Left shifting data into the sign bit of a signed variable (outside the specified value range) invokes undefined behavior
Right shifting a negative value invokes implementation-defined behavior: either arithmetic or logical shift can be used.

Good practice is therefore to always cast the left operand of a shift operator to a larger, unsigned integer type. On 32 and 64 bit systems that means casting to uint32_t before shifting.
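As a rough illustration of that advice, casting to `uint32_t` first keeps the whole shift in unsigned arithmetic. The function names `to_top_byte` and `clear_top_bit` are invented here and do not come from the answer, and the sketch assumes a 32-bit `int`, as on the common 32- and 64-bit targets mentioned above.

```cpp
#include <cstdint>

// Hypothetical helper: place an 8-bit value in bits 24..31 of a 32-bit word.
// Without the cast, `v << 24` would be computed on a signed int, and values
// of 0x80 or above would be shifted into the sign bit.
std::uint32_t to_top_byte(std::uint8_t v) {
    return static_cast<std::uint32_t>(v) << 24;
}

// Hypothetical helper: drop the top bit of a byte using unsigned arithmetic only.
std::uint8_t clear_top_bit(std::uint8_t v) {
    std::uint32_t wide = static_cast<std::uint32_t>(v) << 1;  // unsigned shift
    wide &= 0xFFu;                                            // keep the low 8 bits explicitly
    return static_cast<std::uint8_t>(wide >> 1);
}
```

The explicit mask in `clear_top_bit` makes the 8-bit truncation visible in the code instead of relying on an intermediate assignment.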

We don't have to cast the right operand though, since it doesn't affect the result of the shift. That's a special little rule for the shift operators specifically - most binary operators use the types of both operands to determine the resulting type.
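The difference between shifts and other binary operators can be observed directly in the result types. This is only a sketch: the alias `U8` and the constant names are made up for illustration, and it assumes `long long` is wider than `int`.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

using U8 = std::uint8_t;  // hypothetical alias, for brevity only

// Shift: the result type is the promoted left operand (int),
// even though the right operand is a long long.
constexpr bool shift_uses_left_operand_type =
    std::is_same<decltype(std::declval<U8>() << std::declval<long long>()), int>::value;

// Swapping the operands changes the result type to long long.
constexpr bool swapped_shift_is_long_long =
    std::is_same<decltype(std::declval<long long>() << std::declval<U8>()), long long>::value;

// Addition: the usual arithmetic conversions pick a common type
// from both operands, here long long.
constexpr bool addition_uses_common_type =
    std::is_same<decltype(std::declval<U8>() + std::declval<long long>()), long long>::value;

static_assert(shift_uses_left_operand_type, "shift result follows the left operand");
static_assert(swapped_shift_is_long_long, "shift result follows the left operand");
static_assert(addition_uses_common_type, "addition uses the common type");
```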

(Note: compilers may optimize the code so that it for example only uses 8 bit registers, but the compilers may not optimize out any intended/unintended side effects of implicit type promotion, such as change of size and signedness.)

score 1 · Answer 3 · answered Nov 04 '22 at 12:33

Checking on https://godbolt.org/, you can see that the bit extension happens right at the first operation.

If you write your one-liner as:

a = ((uint8_t)(a << 1)) >> 1

both will be 0, as expected. Otherwise, the result of the shift left inside the first parentheses is no longer a uint8_t.
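A minimal sketch of that one-line fix, wrapped in functions so it can be tried in isolation. The names `shift_with_cast` and `shift_with_mask` are hypothetical, and the masked variant is an alternative spelling that is not taken from the answer.

```cpp
#include <cstdint>

// Hypothetical helper: one-line version with the intermediate result
// truncated back to 8 bits before the right shift.
std::uint8_t shift_with_cast(std::uint8_t a) {
    return static_cast<std::uint8_t>(static_cast<std::uint8_t>(a << 1) >> 1);
}

// Hypothetical helper: the same idea written with a mask instead of a cast.
std::uint8_t shift_with_mask(std::uint8_t a) {
    return static_cast<std::uint8_t>(((a << 1) & 0xFF) >> 1);
}
```

Both helpers return 0 for 0x80, which is the result the question expected for a.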

answered Nov 04 '22 at 12:33

amirhm


Uh, this website is perfect, thank you. I mainly write code for ESP32, so having this show the assembler directly based on the code is pretty helpful :) – timoxd7 Nov 04 '22 at 13:53
Yes, I love it too. By the way, the name of the website comes from its creator, "Jade Kendle Godbolt", another useful piece of information :)) – amirhm Nov 04 '22 at 14:10
@amirhm: The Godbolt compiler explorer was created by Matt Godbolt. His CppCon2017 talk [“What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”](https://youtu.be/bSkpMdDe4g4) shows how to use it. Jade Kendle Godbolt appears to be a social media "influencer", and AFAIK no relation. – Peter Cordes Nov 04 '22 at 20:46
@PeterCordes thanks a lot. Yes, actually I knew his name was Godbolt; I just searched for the name and surely made a mistake. Thanks a lot for the link, I'll make sure to watch that video. Again, thanks a lot – amirhm Nov 04 '22 at 20:57
