Why/how does gcc compile the undefined behaviour in this signed-overflow test so it works on x86 but not ARM64?

Question

I was self-studying CSAPP and got a strange result when I ran into a strange issue during the run of a assertion test.

I'm not sure what to start this question with, so let me get the code first (file name visible in comments):

// File: 2.30.c
// Author: iBug

int tadd_ok(int x, int y) {
    if ((x ^ y) >> 31)
        return 1;  // A positive number and a negative integer always add without problem
    if (x < 0)
        return (x + y) < y;
    if (x > 0)
        return (x + y) > y;
    // x == 0
    return 1;
}

// File: 2.30-test.c
// Author: iBug

#include <assert.h>

int tadd_ok(int x, int y);

int main() {
    assert(sizeof(int) == 4);

    assert(tadd_ok(0x7FFFFFFF, 0x80000000) == 1);
    assert(tadd_ok(0x7FFFFFFF, 0x7FFFFFFF) == 0);
    assert(tadd_ok(0x80000000, 0x80000000) == 0);
    return 0;
}

And commands:

gcc -o test -O0 -g3 -Wall -std=c11 2.30.c 2.30-test.c
./test

(Side note: There wasn't any -O option present in the command line, but as it defaults to level 0, explicitly adding -O0 shouldn't change much.)

The above two commands ran very well on my Ubuntu VM (amd64, GCC 7.3.0), but one of the assertions failed on my Android phone (AArch64 or armv8-a, GCC 8.2.0).

2.30-test.c:13: main: assertion "tadd_ok(0x7FFFFFFF, 0x7FFFFFFF) == 0" failed

Note that the first assertion passed, so int is guaranteed to be 4 bytes on the platforms.

So I fired up gdb on my phone trying to get some insights:

(gdb) l 2.30.c:1
1       // File: 2.30.c
2       // Author: iBug
3
4       int tadd_ok(int x, int y) {
5           if ((x ^ y) >> 31)
6               return 1;  // A positive number and a negative integer always add without problem
7           if (x < 0)
8               return (x + y) < y;
9           if (x > 0)
10              return (x + y) > y;
(gdb) b 2.30.c:10
Breakpoint 1 at 0x728: file 2.30.c, line 10.
(gdb) r
Starting program: /data/data/com.termux/files/home/CSAPP-2019/ch2/test
warning: Unable to determine the number of hardware watchpoints available.
warning: Unable to determine the number of hardware breakpoints available.

Breakpoint 1, tadd_ok (x=2147483647, y=2147483647)
    at 2.30.c:10
10              return (x + y) > y;
(gdb) p x
$1 = 2147483647
(gdb) p y
$2 = 2147483647
(gdb) p (x + y) > y
$3 = 0
(gdb) c
Continuing.
2.30-test.c:13: main: assertion "tadd_ok(0x7FFFFFFF, 0x7FFFFFFF) == 0" failed

Program received signal SIGABRT, Aborted.
0x0000007fb7ca5928 in abort ()
   from /system/lib64/libc.so
(gdb) d 1
(gdb) p tadd_ok(0x7FFFFFFF, 0x7FFFFFFF)
$4 = 1
(gdb)

As you see in the GDB output, the result is very inconsistent, as the return statement on 2.30.c:10 was reached, and the return value should have been 0, but the function still returns 1, making the assertion fail.

Kindly provide an idea what I'm getting wrong here.

Please respect what I have presented. Just saying it's UB without relating the platforms, especially GDB output, is not going to be any helpful.

Check the generated assembly. signed-integers overflowing is undefined behavior, the compiler might have optimized the statement `(x + y) > y` to just return `x>0`. — tkausl, Feb 04 '19 at 04:25
@tkausl The `-g3` option should pretty much disable such optimization. — iBug, Feb 04 '19 at 04:31
@iBug: `-g3` has zero effect on code-gen. But you omitted any `-O` options, so the default `-O0` (no optimization, consistent debugging) is in effect. ([Why does clang produce inefficient asm for this simple floating point sum (with -O0)?](//stackoverflow.com/q/53366394)). Your comments on the answers claiming that `-g3` will have some kind of effect are nonsense, you should probably delete them and replace with the point that you used `-O0` (you did, right?) `-g3` just means that debug info will contain as much info as possible; the gcc manual mentions macro definitions as an example. — Peter Cordes, Feb 04 '19 at 05:09
You have GDB running on your target platform, why not show the compiler-generated asm you're stepping through, and register values? Is it the same as what AArch64 gcc8.2 -O0 makes on Godbolt? https://godbolt.org/z/dpvjCH I also included the `-O3` output where AArch64 gcc optimizes the function to `mov w0, 1`. And BTW, I replaced `>>31` with `< 0` just for fun, because it's a signed `int`. An arithmetic right shift was fine, though, producing `0` or `-1`. Even gcc -O0 emits the same asm for `< 0` or `>> 31` in a boolean context, though. — Peter Cordes, Feb 04 '19 at 05:20
@PeterCordes Thanks, indeed I later added `-O0` expecting no difference (I know it's the default). — iBug, Feb 04 '19 at 05:23
Updated my answer: you must have been using gcc7.x or older on x86. gcc8 introduced the optimization that breaks your code. — Peter Cordes, Feb 04 '19 at 06:25
@Peter gcc8 doesn't break this code. It is inherently broken — n. m. could be an AI, Feb 04 '19 at 06:30
@n.m. yes, I was just thinking about rephrasing that in my answer, or at least putting it in quotes. — Peter Cordes, Feb 04 '19 at 06:36
gcc is not obliged to share your ideas about signed overflow. — n. m. could be an AI, Feb 04 '19 at 06:37
Ok, but I thought your problem was understanding what's going on with the asm gcc is creating, not actually using this obsolete UB-dependent code that's no longer usable with all modern compilers. — Peter Cordes, Feb 04 '19 at 06:42
@PeterCordes Nope, I spent some time investigating the assembly output to learn why, but I also employed `-fwrapv` to get rid of the issue. — iBug, Feb 04 '19 at 06:43
@PeterCordes Anyways, sorry for missing GCC version. I thought it didn't matter. — iBug, Feb 04 '19 at 06:45
@iBug: I added an update to my answer that shows how you can cast to `unsigned` to get access to the raw binary addition operation, then cast back to `int` for a compare. This works on 2's complement machines where `int` and `unsigned` are the same size, etc. etc. so the cast is just a reinterpret of the same bits. Re: gcc version: I'm surprised it mattered, too. People have been warning about the challenges in avoiding UB while checking for signed addition overflow in C for many years, so I had assumed more compilers had been optimizing aggressively in practice like for loop bounds. — Peter Cordes, Feb 04 '19 at 07:05
"Why is this code with UB gets compiled this way and not that way" is simply not a legitimate question to ask, unless the compiler in question *defines and documents* the behaviour in question. — n. m. could be an AI, Feb 04 '19 at 07:28
@n.m.: If that statement were true, it would be impossible for compiler developers to develop compilers, as they would be unable to investigate why their development code, which is not yet documented, behaves in a certain way. In fact, a particular compiler is a concrete thing, and we can investigate and ask why it behaves a particular way (the same way scientists investigate the natural world even though it came without any documentation). We cannot rely on that behavior in future versions, but we certainly can ask about it. — Eric Postpischil, Feb 04 '19 at 12:14

Peter Cordes · Accepted Answer · 2019-02-04T22:22:41.950

Signed overflow is Undefined Behaviour in ISO C. You can't reliably cause it and then check if it happened.

In the expression (x + y) > y;, the compiler is allowed to assume that x+y doesn't overflow (because that would be UB). Therefore, it optimizes down to checking x > 0. (Yes, really, gcc does this even at -O0).

This optimization is new in gcc8. It's the same on x86 and AArch64; you must have used different GCC versions on AArch64 and x86. (Even at -O3, gcc7.x and earlier (intentionally?) miss this optimization. clang7.0 doesn't do it either. They actually do a 32-bit add and compare. They also miss optimizing tadd_ok to return 1, or to add and checking the overflow flag (V on ARM, OF on x86). Clang's optimized asm is an interesting mix of >>31, OR and one XOR operation, but -fwrapv actually changes that asm so it's probably not doing a full overflow check.)

You could say that gcc8 "breaks" your code, but really it was already broken as far as being legal / portable ISO C. gcc8 just revealed that fact.

To see it more clearly, lets isolate just that expression into one function. gcc -O0 compiles each statement separately anyway, so the information that this only runs when x<0 doesn't affect the -O0 code-gen for this statement in your tadd_ok function.

// compiles to add and checking the carry flag, or equivalent
int unsigned_overflow_test(unsigned x, unsigned y) {
    return (x+y) >= y;    // unsigned overflow is well-defined as wrapping.
}

// doesn't work because of UB.
int signed_overflow_expression(int x, int y) {
    return (x+y) > y;
}

On the Godbolt compiler explorer with AArch64 GCC8.2 -O0 -fverbose-asm:

signed_overflow_expression:
    sub     sp, sp, #16       //,,      // make a stack fram
    str     w0, [sp, 12]      // x, x   // spill the args
    str     w1, [sp, 8]       // y, y
   // end of prologue

   // instructions that implement return (x+y) > y; as return  x > 0
    ldr     w0, [sp, 12]      // tmp94, x
    cmp     w0, 0     // tmp94,
    cset    w0, gt  // tmp95,                  // w0 = (x>0) ? 1 : 0
    and     w0, w0, 255       // _1, tmp93     // redundant

  // epilogue
    add     sp, sp, 16        //,,
    ret

GCC `-ftree-dump-original` or `-optimized` will even turn its GIMPLE back into C-like code with this optimization done (from the Godbolt link):

;; Function signed_overflow_expression (null)
;; enabled by -tree-original

{
  return x > 0;
}

Unfortunately, even with -Wall -Wextra -Wpedantic, there's no warning about a the comparison. It's not trivially true; it still depends on x.

The optimized asm is unsurprisingly cmp w0, 0 / cset w0, gt / ret. The AND with 0xff is redundant. cset is an alias of csinc, using the zero-register as both sources. So it will produce 0 / 1. With other registers, the general case of csinc is a conditional select and increment of any 2 registers.

Anyway, cset is AArch64's equivalent of x86 setcc, for turning a flag condition into a bool in a register.

If you want your code to work as written, you'd need to compile with -fwrapv to make it well-defined behaviour in the variant of C that -fwrapv makes GCC implement. The default is -fstrict-overflow, like the ISO C standard.

If you want to check for signed overflow in modern C, you need to write checks that detect overflow without actually causing it. This is harder, annoying, and a point of contention between compiler writers and (some) developers. They argue that the language rules around undefined behaviour weren't meant to be used as an excuse to "gratuitously break" code when compiling for target machines where it would make sense in asm. But modern compilers mostly only implement ISO C (with some extensions and extra defined behaviour), even when compiling for target architectures like x86 and ARM where signed integers have no padding (and thus wrap just fine), and don't trap on overflow.

So you could say "shots fired" in that war, with the change in gcc8.x to actually "breaking" unsafe code like this. :P

See Detecting signed overflow in C/C++ and How to check for signed integer overflow in C without undefined behaviour?

Since signed and unsigned addition are the same binary operation in 2's complement, you could maybe just cast to unsigned for the add, and cast back for a signed compare. That would make a version of your function that's safe on "normal" implementations: 2's complement, and casting between unsigned and int is just a reinterpret of the same bits.

This can't have UB, it just won't give the right answer on one's complement or sign/magnitude C implementations.

return  (int)((unsigned)x + (unsigned)y) > y;

This compiles (with gcc8.2 -O3 for AArch64) to

    add     w0, w0, w1            // x+y
    cmp     w0, w1                // x+y  cmp  y
    cset    w0, gt
    ret

If you had written int sum = x+y as a separate C statement from return sum < y, this UB wouldn't be visible to gcc with optimization disabled. But as part of the same expression, even gcc with the default -O0 can see it.

Compile-time-visible UB is all kinds of bad. In this case, only certain ranges of inputs would produce UB, so the compiler assumes it doesn't happen. If unconditional UB is seen on a path of execution, an optimizing compiler can assume that path never happens. (In a function with no branching, it could assume the function is never called, and compile it to a single illegal instruction.) See Does the C++ standard allow for an uninitialized bool to crash a program? for more about compile-time-visible UB.

(-O0 doesn't mean "no optimization", it means no extra optimization besides what's already necessary to transform through gcc's internal representations on the way to asm for whatever target platform. @Basile Starynkevitch explains in Disable all optimization options in GCC)

Some other compilers may "turn their brains off" even more with optimization disabled, and do something closer to transliterating C into asm, but gcc is not like that. For example, gcc still uses a multiplicative inverse for integer division by a constant at -O0. (Why does GCC use multiplication by a strange number in implementing integer division?) All 3 other major x86 compilers (clang/ICC/MSVC) use div.

This one pretty much covers everything I would write in my own answer, so it definitely worth the green tick. Detailed and comprehensive! — iBug, Feb 04 '19 at 06:47
With `-fdump-tree-original` (or optimized) you see `return x > 0;`, I think that's easier than showing asm. — Marc Glisse, Feb 04 '19 at 18:45
@MarcGlisse: Thanks, I hadn't realized any of the tree-dump options made output that was that easy to read. Godbolt has can add a tree-dump pane, so I edited the answer to add that. — Peter Cordes, Feb 04 '19 at 22:24

dbush · Answer 2 · 2019-02-04T05:06:56.470

5

Overflow of signed integers invokes undefined behavior. You can't check for an overflow condition by adding two numbers and checking if they wrap around in some way. While you might get away with this on an x86/x64 system, there's no guarantee others will behave the same.

What you can do however is some arithmetic along with INT_MAX or INT_MIN to do the check.

int tadd_ok(int x, int y) {
    if ((x ^ y) >> 31)
        return 1;  // A positive number and a negative integer always add without problem
    if (x < 0)
        return INT_MIN - x < y;
    if (x > 0)
        return INT_MAX - x > y;
    // x == 0
    return 1;
}

The expression INT_MAX - x > y is arithmetically equivalent to INT_MAX > x + y but prevents an overflow from occurring. Similarly, INT_MIN - x < y is arithmetically equivalent to INT_MIN < x + y but prevents overflow.

EDIT:

If you want to force signed integer overflow to be defined, you can use the -fwrapv option to gcc. However, you're better off avoiding overflow altogether.

edited Feb 04 '19 at 05:06

answered Feb 04 '19 at 04:31

dbush

205,898
23
218
273

1

Please respect my compile command: `gcc -g3`, and platform specification (AArch64, Android). – iBug Feb 04 '19 at 04:33
2

@iBug How you're compiling doesn't change the fact that your code has undefined behavior. Avoid going that and you should get consistent behavior. – dbush Feb 04 '19 at 04:35
1

Your answer has completely diverted from my question. Can you check the GDB output, *at least*? – iBug Feb 04 '19 at 04:41
3

@iBug Inconsistent output points to undefined behavior. Get rid of the undefined behavior and your get rid of the inconsistency. – dbush Feb 04 '19 at 04:42
4

GDBs output doesn't matter. GDB evaluates the statement itself. Again, _check the generated assembly_. – tkausl Feb 04 '19 at 04:44
Good job on the last paragraph, `-fwrapv` did the trick. I'll be afk for some time, will come back to check the assembly later. – iBug Feb 04 '19 at 05:33

score 2 · Answer 3 · answered Feb 04 '19 at 23:55

2

I'd like to add that there is a simple way in GCC to deal with signed addition with overflow and make it defined. You can use the builtins documented at https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html to perform signed operations (add, sub, mul) that are defined to wrap around and it will tell you whether or not the operation overflowed.

bool __builtin_add_overflow(type1 a, type2 b, type3 *res)

You can for example rewrite your function like this:

int tadd_ok(int x, int y) {
    int result;
    return !__builtin_add_overflow(x, y, &result);
    // result now contains (int)((unsigned int)x + (unsigned int)y)
}

answered Feb 04 '19 at 23:55

Emil

16,784
2
41
52

1

Cool! It appears this builtin was new in gcc5.x and clang3.8. On x86 https://godbolt.org/z/3h79BF , gcc and clang both use to `add` / `setno` (i.e. does an add and then looks at the hardware signed-overflow flag.) On AArch64 gcc8.2 does an ADD and then a bunch of xor / and :( – Peter Cordes Feb 05 '19 at 01:19
1

Since GCC 9, it uses `adds` / `cset` on AArch64. – Emil Aug 27 '23 at 08:04

Govind Parmar · Answer 4 · 2019-02-04T04:33:41.373

1

Signed integer overflow is undefined behaviour according to the C standard, unlike unsigned overflow which is guaranteed to wrap around.

Try putting your code on Godbolt with latest GCC x86-64 and -O3. It gets optimized to:

mov eax, 1
ret

Which is acceptable. I imagine an equivalent instruction sequence is emitted for ARM64, but I don’t know that architecture and can’t be sure just by looking.

edited Feb 04 '19 at 04:33

answered Feb 04 '19 at 04:32

Govind Parmar

20,656
7
53
85

1

Please respect my compile command: `gcc -g3`, and platform specification (AArch64, Android). – iBug Feb 04 '19 at 04:33
1

@iBug undefined is undefined, regardless of platform or compiler switch. All bets are off once you invoke UB. – Govind Parmar Feb 04 '19 at 04:36
1

But the compile command in the question didn't use optimization, so the function isn't inlined. The UB thus isn't visible to the compiler. – Peter Cordes Feb 04 '19 at 05:11

score 1 · Answer 5 · answered Feb 04 '19 at 05:20

1

As you were already told, you're invoking undefined behavior. Overflow is not defined for signed integers in C. The compiler understands, that the second and third if statements are undefined in terms of signed integers, hence the compiler decides, that whatever branch would be taken, cannot happen in a well defined program. Thus the whole function tadd_ok collapses into a single return 1.

It doesn't matter if you disable optimization: That these if statements invoke undefined behavior is determined long before the optimizer gets to work.

And it also doesn't matter that you enable generation of debug information, because that's not changing the way code it generated (it just adds annotations for tools that interpret binary dumps and process state).

Last but not least, when you make GDB print the result of the statement (x+y)>y it does this outside the scope of C compilation, but in terms of "running on the metal" instructions. After C isn't the only language that's compiled to binary. And while signed integer underflow is undefined in C, it may be perfectly well defined in some different language; and you might want to be able to use GDB on such programs as well. When comparing the output of p (x+y)>y with the C statement (x+y)>y with x and y being signed int, you're comparing oranges with apples; they're very different things.

answered Feb 04 '19 at 05:20

datenwolf

159,371
13
185
298

I don't think GDB's eval of `(x+y)>y` JIT-compiles for the target to anything like "running on the metal". I think it just interprets internally, so it ends up actually doing the add (with truncation to 32-bit `int`, thus wrapping), not making the optimization to `x > 0`. If you use an expression that includes function calls, then GDB will run the call in the target process, using the existing function definitions (machine code) in the executable/library. But otherwise it's just interpreted. – Peter Cordes Feb 04 '19 at 07:12
But anyway, the "metal" that does the `add` is the GDB host, not the target being debugged. (This would matter more in a remote-debug scenario where they're not even the same architecture). Hopefully GDB itself is written in a way that avoids UB inside its interpreter when evaluating expressions, e.g. maybe using `uint32_t`. – Peter Cordes Feb 04 '19 at 07:13
@PeterCordes: The important thing to keep in mind is, that GDB (or any debugger for that matter) can not make any assumptions regarding the semantics of whatever language the program was compiled from into binary form. It doesn't know if the binary was created from C (where signed integer overflow is undefined) or some language X with well defined signed integer overflow semantics. All it sees are symbols with a certain underlying machine datatype, and when asked to perform evaluate some expression it will do it according to ISA semantics. Same for calling functions, which is a matter of ABI. – datenwolf Feb 05 '19 at 05:46
Oh, *that's* what you meant by "running on the metal"? Yeah, agreed then. And good point about not assuming C semantics. I assumed it didn't, and that's an excellent reason why it can't, even though GDB's command language is C-like. – Peter Cordes Feb 05 '19 at 05:50

score 1 · Answer 6 · 2019-02-04T06:14:57.263

I know you asked for something else other than UB, but I am afraid it is what causes the issue in your case, even though you are using -O0. Let's look at generated assembly.

I have simplified your function to this to isolate UB:

int tadd_ok(int x, int y) {
    if (x > 0)
        return (x + y) > y;

    return 1;
}

The output generated for AArch64 (-O0 -x c -march=armv8-a):

tadd_ok:
        sub     sp, sp, #16
        str     w0, [sp, 12]
        str     w1, [sp, 8]
        ldr     w0, [sp, 12]
        cmp     w0, 0
        ble     .L2           ; if (x <= 0) goto return stmt
        ldr     w0, [sp, 12]  ; here we are runnig (x + y) > y branch
        cmp     w0, 0         ; x is compared to zero
        cset    w0, gt        ; return value is set to (x > 0)
        and     w0, w0, 255
        b       .L3
.L2:
        mov     w0, 1
.L3:
        add     sp, sp, 16
        ret

Keep in mind that since signed integers are not allowed to overflow, the expression (x + y) is always greater than y unless x <= 0. GCC knows this before optimizer kicks in, so it replaces (x + y) > y with x > 0.

Even though it just made the same check, it seems to forget about this - side effect of having no optimizations enabled.

You can replace the C code above with this:

int tadd_ok(int x, int y) {
    if (x > 0)
        return x > 0;

    return 1;
}

And the output does not change:

tadd_ok:
        sub     sp, sp, #16
        str     w0, [sp, 12]
        str     w1, [sp, 8]
        ldr     w0, [sp, 12]
        cmp     w0, 0
        ble     .L2
        ldr     w0, [sp, 12]
        cmp     w0, 0
        cset    w0, gt
        and     w0, w0, 255
        b       .L3
.L2:
        mov     w0, 1
.L3:
        add     sp, sp, 16
        ret

With the code above, it is clear what optimizer will do to it:

tadd_ok:
        mov     w0, 1
        ret

Other options you use do not change anything, platform does not matter since there is no addition instructions generated.

As for GDB: it runs complex expressions by executing them in a debuggee process using the same code that was generated by the compiler, so the output will be no different. Hence evaluating tadd_ok(0x7FFFFFFF, 0x7FFFFFFF) runs the same code.

Your last paragraph is not true for expressions that don't involve function calls. `(x + y) > y` compiles to asm like `x > 0` by AArch64 gcc `-O0`, but the OP's GDB output shows that GDB evaluated the expression to `0`. Otherwise, yeah, you found the same thing I did in my answer posted at about the same time >. — Peter Cordes, Feb 04 '19 at 06:06
@PeterCordes I said "complex" expressions. Simple ones are evaluated by debugger for performance reasons. I was talking about OPs attempt to run `tadd_ok(0x7FFFFFFF, 0x7FFFFFFF)` — , Feb 04 '19 at 06:07
Ah right, I missed that they tried that from within GDB. Might want to edit your answer to say "like `tadd_ok(0x7FFFFFFF, 0x7FFFFFFF)`" — Peter Cordes, Feb 04 '19 at 06:08

Why/how does gcc compile the undefined behaviour in this signed-overflow test so it works on x86 but not ARM64?

6 Answers6

GCC `-ftree-dump-original` or `-optimized` will even turn its GIMPLE back into C-like code with this optimization done (from the Godbolt link):

Linked

Why/how does gcc compile the undefined behaviour in this signed-overflow test so it works on x86 but not ARM64?

6 Answers6

GCC -ftree-dump-original or -optimized will even turn its GIMPLE back into C-like code with this optimization done (from the Godbolt link):

Linked

GCC `-ftree-dump-original` or `-optimized` will even turn its GIMPLE back into C-like code with this optimization done (from the Godbolt link):