
We have code in production that in some situations may left-shift a 32-bit unsigned integer by more than 31 bits. I know this is undefined behavior. Unfortunately we can't fix it right now, but we can work around it, if only we can assume how it works in practice.

On x86/amd64 I know that for shifts the processor uses only the appropriate least-significant bits of the shift-count operand, so that `a << b` is in fact equivalent to `a << (b & 31)`. From the hardware-design standpoint this makes perfect sense.
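
In C terms, the behavior I'm describing corresponds to the following well-defined code. This is only a sketch of the idea (the helper name `shl32` is illustrative, not from our code base); on x86/amd64 GCC typically compiles the mask away, because the hardware masks the count anyway:

    #include <stdint.h>

    /* Well defined in C for any count: the mask keeps the shift amount
       in 0..31, mirroring what the x86 SHL instruction does in hardware. */
    static inline uint32_t shl32(uint32_t a, uint32_t b)
    {
        return a << (b & 31u);
    }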

My question is: how does this work in practice on modern popular platforms such as ARM, MIPS, RISC-V, and so on? I mean those that are actually used in modern PCs and mobile devices, not outdated or esoteric ones.

Can we assume that those behave the same way?

EDIT:

  1. The code I'm talking about currently runs in a blockchain. It's less important how exactly it works; at the very least we want to be sure that it yields identical results on all machines. This is the most important part, because otherwise the difference can be exploited to induce a so-called chain split.

  2. Fixing this is a hassle, because the fix would have to be applied simultaneously to all the running machines; otherwise we are yet again at risk of a chain split. But we will do this at some point, in an organized (controlled) manner.

  3. The variety of compilers is a lesser problem: we only use GCC. I looked at the generated code with my own eyes; there's a `shl` instruction there. Frankly, I don't expect anything different given the context (the shift operand comes from an arbitrary source and can't be predicted at compile time).

  4. Please don't remind me that I "can't assume". I know this. My question is 100% practical. As I said, I know that on x86/amd64 the 32-bit shift instruction takes only the 5 least-significant bits of the shift-count operand.

How does this behave on current modern architectures? We can also restrict the question to little-endian processors.

– valdo (question author; last edited by Peter Cordes)
  • How can you work around it if you can't change the code? And if you can change the code, why can't you just check if the shift amount is too large? – Barmar Jun 02 '21 at 18:54
  • This is not only hardware dependent, but also dependent on the compiler and its settings. You can't, and should not, make assumptions about undefined behavior. – Eugene Sh. Jun 02 '21 at 18:54
  • I don't think you can reason about this by considering **only** the CPU platform. The compiler might apply any number of optimizations that could be affected by this. (just my 2c) – Jeffrey Jun 02 '21 at 18:54
  • It would help if you clarified whether you ship source or binaries, and whether you have just one compiler whose produced assembly you could examine. – Jeffrey Jun 02 '21 at 18:56
  • `we can assume how it works in practice` But you can't assume. `On x86/amd64 I know processor for shifts` But how do you know the compiler generates this instruction? Is your code in assembly? – KamilCuk Jun 02 '21 at 19:02
  • Gcc and clang know about the x86 behavior and correspondingly optimize `(uint32_t)a << ((uint32_t)b & 31)` to generate the same code as `(uint32_t)a << (uint32_t)b`, and similarly for other cases. Based on Compiler Explorer (https://gcc.godbolt.org/z/fsPznz1zK), they are not employing this optimization for ARM and ARM64, so I would assume the hardware behaves differently there. – Petr Skocik Jun 02 '21 at 19:03
  • Instead of asking what this undefined behaviour might do on different CPUs, you should find out what behavior is required for your software to work correctly. – Bodo Jun 02 '21 at 19:20
  • Your edit implies this code is for security-sensitive application. If so, I **urge** you to fix it in a proper way. – Eugene Sh. Jun 02 '21 at 19:24
  • @PSkocik: Clang, and I expect GCC, do not universally generate shift code that is the same as `(uint32_t)a << ((uint32_t)b & 31)`. If the compiler can determine the value of `b` at compile time, it may generate different code. I just tested Clang, and it generated a `shll` where it could not determine `b`, but did nothing where it could determine `b` was out of range. By “nothing,” I mean it did not even move `a` into the destination; it just returned whatever was already in the register used to return a value. – Eric Postpischil Jun 02 '21 at 20:13
  • fwiw, `unsigned s(unsigned a, unsigned b) { return a << (b & 31); }`; gcc amd64 doesn't generate code for the and, but gcc arm does. – mevets Jun 05 '21 at 14:46
  • @EricPostpischil: For many purposes, the most practical way of specifying the behavior of `x<<y` for `y >= bitsize` would be to say that it chooses in Unspecified fashion between `x<<(y-1)<<1` or `x<<(y-bitsize)`. Among other things, it would allow a rotate-right operation to be written via the idiom `(x>>y) | (x << (bitsize - y))`, which compilers could easily recognize (if `y` is zero, the expression could yield either `(x|0)` or `(x|x)`, chosen in Unspecified fashion, but since both would equal `x` such a choice would be harmless). It's far less obvious how programmers should be expected to... – supercat Jun 08 '21 at 15:51
  • ...write such an expression if they must avoid shifting by the bit size. On many platforms it may be expensive to guarantee that x<... – supercat Jun 08 '21 at 15:57
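
To make Eric Postpischil's observation above concrete, here is a minimal sketch (the function names are illustrative; the exact output depends on compiler and flags):

    #include <stdint.h>

    /* Runtime count: compilers typically emit a bare shift instruction,
       so what happens for b >= 32 depends on the target hardware. */
    uint32_t shift_runtime(uint32_t a, uint32_t b)
    {
        return a << b;      /* UB when b >= 32 */
    }

    /* Count visible at compile time: the compiler can see the UB and may
       emit anything at all; as observed above, Clang may simply return
       whatever happened to be in the result register. */
    uint32_t shift_constant(uint32_t a)
    {
        return a << 40;     /* UB, known at compile time */
    }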

3 Answers


It is UB in C; here is an example: https://godbolt.org/z/5h9f7W6rr

Relying on the hardware behavior has no practical use unless you use inline assembly. But if you want to know how particular platforms handle left-shift instructions, you need to consult their assembly language references:

ARM-Thumb:

    If n is 32 or more, then all the bits in the result are cleared to 0.

    If n is 33 or more and the carry flag is updated, it is updated to 0.

x86 - shift is performed modulo 32.
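
Written out as well-defined C, the two families of behavior would look roughly like this (a sketch; the names are illustrative):

    #include <stdint.h>

    /* x86-style semantics: the count is taken modulo 32. */
    static inline uint32_t shl_x86(uint32_t a, uint32_t n)
    {
        return a << (n & 31u);
    }

    /* ARM-style semantics: counts of 32 or more clear the result to 0. */
    static inline uint32_t shl_arm(uint32_t a, uint32_t n)
    {
        return (n >= 32u) ? 0u : a << n;
    }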

– 0___________
  • "x86 - shift is performed modulo 32." --> For fun: 8086 shifted modulo 256 (thats how long the shift took) the math was more like modulo 16. – chux - Reinstate Monica Jun 02 '21 at 19:19
  • Thanks. I guess this fully answers my question. As far as ARM is concerned, it will yield a different result from that of x86. – valdo Jun 02 '21 at 19:20
  • P.S. I've made a change to your example, to better reflect my use-case: https://godbolt.org/z/YdroToq55 . As you can see - now there are fewer differences. – valdo Jun 02 '21 at 19:47
  • For more details on what x86 does, see [Why any modern x86 masks shift count to the 5 low bits in CL](https://stackoverflow.com/a/61779201) - interestingly the count is masked to 5 bits for any operand-size of 32-bit or less, so you *can* shift all the bits out of an 8 or 16-bit operand. – Peter Cordes Sep 23 '22 at 08:51

With code that triggers undefined behavior, the compiler can do just about anything - well, that's why it's called undefined - so asking for a safe definition of undefined code doesn't make any sense. Theoretical evaluations, observing how the compiler translates similar code, or assumptions about what "common practice" might be won't really give you an answer.

Evaluating what a compiler has actually translated your UB code into is probably your only safe bet. If you want to be really sure what happens in the corner cases, have a look at the generated (assembly or machine) code. Modern debuggers give you the tools to catch those corner cases and tell you what actually happens (the generated machine code is, after all, very well defined). This will be much simpler and much safer than speculating about what code the compiler might emit.
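
For example, one way to check what GCC actually emitted (the file name is hypothetical):

    /* shift.c -- compile with:  gcc -O2 -S shift.c
       then inspect shift.s for the shift sequence that was emitted. */
    #include <stdint.h>

    uint32_t f(uint32_t a, uint32_t b)
    {
        return a << b;   /* look for a bare shift vs. an explicit mask */
    }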

– tofro (edited by Eric Postpischil)
  • "only save bet." --> "only safe bet."? – chux - Reinstate Monica Jun 02 '21 at 19:21
  • According to the published Rationale, when the authors of the C Standard used the phrase "Undefined Behavior", they intended it, among other things, to "identify areas of conforming language extension", since compilers were free to specify behaviors in cases where doing so would make sense, and there was no need to have the Standard mandate behavioral guarantees that every compiler would offer anyway absent a compelling reason to do otherwise. – supercat Jun 09 '21 at 02:26

The authors of the Standard sought to classify as Undefined Behavior any action whose behavior might not be easy to describe in a manner consistent with the C abstraction model and with the general principles of operation of the target platform, or whose behavior might observably be affected by optimization. Although nearly all processors that have a shift-by-N instruction will handle `x<<y` for values of `y` that exceed the bit size by evaluating either `x<<(y-1)<<1` or `x<<(y-bitsize)` without side effects, the phrase "without side effects" can sometimes be a little nebulous.

On some processors (such as, IIRC, at least one model of Transputer), straightforwardly processing `x<<y` when `(uint32_t)y` is 0xFFFFFFFF would result in the processor taking billions of cycles to execute a single instruction, without being able to service interrupts. While the time to execute an instruction would not normally be considered a side effect, having interrupts ignored for billions of cycles could undermine system stability.

There was never any reason to invite general-purpose compilers for architectures whose shift instructions always behave in one of the two described fashions to do anything other than process the code in one of those fashions. The Standard, however, makes no distinction between behavioral concessions for implementations that target unusual platforms (or that seek to flag places where code may be incompatible with them) and invitations for general-purpose compilers on commonplace architectures to throw behavioral precedents out the window.

If one is using an implementation that is designed to uphold longstanding behavioral precedents in the absence of a compelling reason to do otherwise, without regard for whether the Standard requires such behavior, evaluation of `x<<y` will behave in one of the two described fashions. If one is using an implementation that interprets the words "Undefined Behavior" as an invitation to behave nonsensically, for no reason beyond the fact that the Standard doesn't forbid such behavior, however, all bets are off.

– supercat
  • In this case, the OP specifically wanted consistent behaviour everywhere, for a bytecode interpreter where the language being interpreted has shift operations with potentially variable shift counts. So the ability to write `x<<(y&31)` or `(y>31) ? 0 : x<<y`... – Peter Cordes Jun 09 '21 at 00:06
  • IMHO, languages should define standard intrinsic functions to deal with such corner cases. If code needs a certain kind of shift, having a compiler process an intrinsic in the best way to yield such a shift will be easier and cheaper than trying to have it recognize all the ways programmers might try to express the concept without an intrinsic. – supercat Jun 09 '21 at 02:24
  • In *this* case, the interpreted language the querent is implementing is for smart contracts, so it's essential that it runs the same everywhere, and it wouldn't want to provide intrinsics. So you have to pick one behaviour that C can do cheaply across ISAs. Masking the count is either free or cheap everywhere. (Although if your use-case was biased towards mostly running on ARM, you might pick saturation (zeroing) instead of wrapping.) – Peter Cordes Jun 09 '21 at 02:52
  • Or maybe you meant languages like *C* should have an intrinsic for count-saturating shift. Yes, absolutely agreed. Rust follows that design principle of providing lots of stuff instead of needing the compiler to recognize idioms (including [`i32.wrapping_shl()`](https://doc.rust-lang.org/std/primitive.i32.html#method.wrapping_shl) but not `saturating_shl` :/ there is `overflowing_shl`, which returns a separate boolean); ISO C unfortunately doesn't. Some dialects have more stuff, e.g. GNU C provides some add-overflow intrinsics, but IDK about shifts. – Peter Cordes Jun 09 '21 at 02:53
  • Oh, I didn't realize AArch64 truncated shift counts like x86, unlike 32-bit ARM! https://godbolt.org/z/W9GTG68bv That makes `x >> (y&63)` the obvious choice if you have to pick one to use everywhere - future servers / desktops / laptops are a lot more likely to be running AArch64 than ARM32. (I had no luck getting gcc or clang to optimize a ternary to just a shift on ARM anyway; agreed this needs an intrinsic if anyone wants to use it.) – Peter Cordes Jun 09 '21 at 03:22
  • @PeterCordes: I think the intention with C was that a programmer who happened to need behavior consistent with a platform's shift behavior could simply use << or >>, and if a programmer needed something else, having the source mask the size or add conditional logic would be no worse than having the compiler do it. It was never intended that programmers who know what a platform's behavior is should have to jump through hoops to get behavior consistent with that of the underlying platform. – supercat Jun 09 '21 at 14:36
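
As an aside on idiom recognition: the rotate case mentioned in the comments on the question does have a UB-free spelling that GCC and Clang are known to recognize; a minimal sketch, assuming a 32-bit `uint32_t`:

    #include <stdint.h>

    /* UB-free rotate-right: masking both counts avoids a shift by 32
       when y is 0, and mainstream compilers recognize this pattern,
       emitting a single rotate instruction where the target has one. */
    static inline uint32_t rotr32(uint32_t x, uint32_t y)
    {
        return (x >> (y & 31u)) | (x << (-y & 31u));
    }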