
I found several questions discussing compiler optimizations for signed vs unsigned integers. The general conclusion is that signed arithmetic allows for more optimizations because signed overflow is undefined behavior. However, I didn't find any discussion of how signedness interacts with operation reordering.

Consider the expression

a - b + c

If all values are unsigned, the compiler can always reorder the operations (adding a and c first may improve performance).

If all values are signed, the compiler must prove that reordering will not cause an overflow to occur. In general, a + c could overflow, so the compiler is constrained in what it can reorder.
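For concreteness, here is a minimal sketch of the two groupings (the function names are purely illustrative):

```
/* as_written groups left to right, as the C grammar requires. */
int as_written(int a, int b, int c)
{
    return a - b + c;        /* evaluated as (a - b) + c */
}

/* reordered is what the compiler might prefer to compute. */
int reordered(int a, int b, int c)
{
    return a + c - b;        /* evaluated as (a + c) - b; for signed int,
                                a + c could overflow even when
                                (a - b) + c does not */
}
```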

Am I correct that the compiler has more freedom to reorder operations for unsigned values?


EDIT

I was going to ask about other operations like * and /, but I guess this question only makes sense for + and -. In general, no reordering is possible for multiplication and division, no matter whether the integers are signed or unsigned.
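As a small illustration with made-up values: integer division truncates, so moving a division relative to a multiplication changes the result regardless of signedness:

```
#include <stdio.h>

int main(void)
{
    int a = 7, b = 2, c = 4;
    printf("%d\n", a / b * c);   /* (7 / 2) * 4 = 3 * 4 = 12 */
    printf("%d\n", a * c / b);   /* (7 * 4) / 2 = 28 / 2 = 14 */
    return 0;
}
```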

martinkunev
  • Even if `a+c` does overflow, it does not mean that the *machine* code of the operation will be different if reordered. It is UB only on the C level. So depending on the architecture it might not even matter – Eugene Sh. Feb 27 '23 at 17:04
  • Any of the signed combinations might overflow, since the signed values can be anywhere in the range. For unsigned, the order of evaluation makes no difference because the processor register itself will keep the value within defined bounds without explicit operations. – Weather Vane Feb 27 '23 at 17:12
  • 1
  • @WeatherVane I originally had a comment like that, but the programmer might know a priori that the ordering from the left-associativity of the operators never overflows due to the nature of the data. – Barmar Feb 27 '23 at 17:23
  • @Barmar perhaps, but will the compiler? Should the compiler *not* change the order, on the grounds that the programmer has arranged the sequence so as not to overflow? – Weather Vane Feb 27 '23 at 17:26
  • 1
  • @WeatherVane That's the point of the question. The compiler is supposed to implement the semantics of the virtual machine, which doesn't overflow in the programmer's order. Does this prevent optimizations that reorder? As Eugene says, it depends on whether it really makes a difference to the CPU architecture. In most CPUs it doesn't (there's no integer overflow exception), so the optimization is allowed. – Barmar Feb 27 '23 at 17:33
  • @EugeneSh. Good point. So it likely only matters for ones' complement and sign-and-magnitude systems. – martinkunev Feb 27 '23 at 18:23
  • “Undefined behavior” in this sense is behavior *of the C abstract machine* that is not defined *by the C language specification*. Such (un)definedness has little to do with the machine code for a particular physical machine, even if that machine code is produced by a C compiler and is somehow analogous. – John Bollinger Feb 27 '23 at 18:32
  • There is a quite common misconception that the transformations done by the compiler must be within the C abstract machine. It is true that the observable behavior of the generated program is dictated by the abstract machine but to achieve this behavior the compiler can use the full power of the target architecture. – Johan Feb 28 '23 at 13:05

1 Answer


If all values are signed, the compiler must prove that reordering will not cause an overflow to occur.

This premise is false. If there is no undefined behavior in the C program, the compiler’s burden is to produce that behavior. The C standard does not constrain how it does it. If it reorders a - b + c to a + c - b but it does so using instructions that behave appropriately, then a + c - b will calculate the same result as a - b + c.

Consider an example using 32-bit int objects: a is 7FFFFFF0₁₆, b is 100₁₆, and c is 20₁₆. Then a - b would be 7FFFFEF0₁₆, and a - b + c would be 7FFFFF10₁₆. If the compiler computes a + c using an add instruction that produces 80000010₁₆, and then computes a + c - b using a subtract instruction that produces 7FFFFF10₁₆, then the result is correct.
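The same arithmetic can be checked with a small sketch that uses uint32_t to model the wrapping add and subtract instructions (the variable names are chosen for illustration; the values are those from the example above):

```
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t a = 0x7FFFFFF0, b = 0x100, c = 0x20;

    uint32_t as_written = a - b + c;  /* 0x7FFFFEF0 + 0x20  = 0x7FFFFF10 */
    uint32_t reordered  = a + c - b;  /* 0x80000010 - 0x100 = 0x7FFFFF10 */

    printf("as written: %08" PRIX32 "  reordered: %08" PRIX32 "\n",
           as_written, reordered);
    return 0;
}
```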

This works because the add and subtract instructions do not “care” about overflow and produce the desired results even if the intermediate values go outside of int bounds. Not only are such instructions common on today’s processors, but the same add and subtract instructions are used for both signed and unsigned arithmetic, because the bit patterns for addition and subtraction are the same for unsigned and two’s complement representations. (Comparison instructions differ, because the bits have different meanings when interpreted as unsigned or two’s complement.)

The fact that the program would have had undefined behavior if the source code had literally been changed to a + c - b is irrelevant. A compiler typically does not operate by notionally rearranging the C source code into other C source code that is then compiled. Compilers typically operate by converting the C semantics to some internal representation and then generating assembly code to implement those semantics.

Eric Postpischil
  • [Compiler optimizations may cause integer overflow. Is that okay?](https://stackoverflow.com/q/74102654) has proof that compilers do this optimization. (At least clang does; GCC actually *does* get hung up on not wanting to introduce signed overflow in the asm for reasons related to it being undefined in the source. Something to do with using the same bool flag to mean multiple things in the GCC internals IIRC.) Basically a duplicate, but coming at the same question from the other direction. – Peter Cordes Feb 28 '23 at 08:45