5

Recently while porting our application from gcc-5.3 to 8.2, we noticed a strange behavior that breaks our application.

In short, it seems gcc-8.2 removed one of our "if branch which compares 2 unsigned integers" without even producing a warning.

We tried g++ 5.3, g++ 7.4 and g++ 8.2 with the same compile options and only g++ 8.2 has this problem. Will show a short example below.

#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <cstring>

using namespace std;

struct myunion {
    myunion(uint32_t x) {
        _data.u32 = x;
    }
    uint16_t hi() const { return _data.u16[1]; }
    uint16_t lo() const { return _data.u16[0]; }
    union {
        uint16_t u16[2];
        uint32_t u32;
    } _data;
};

 __attribute__((noinline)) void printx1x2(uint32_t x1, uint32_t x2) {
    cout << "x1: " << x1 << endl;
    cout << "x2: " << x2 << endl;
}

__attribute__((noinline)) int func(uint32_t a, uint32_t b) {
    const uint32_t x1 = myunion(a).hi() * myunion(b).lo();
    const uint32_t x2 = x1 + myunion(a).lo() * myunion(b).hi();
    printx1x2(x1, x2);
    int ret = 0;
    if ( x2 < x1 ) {
        ret = 0x10000;
    }
    return ret;
}

int main(int argc, char** argv) {
    cout << func(4294967295, 4294917296) << endl;
    return 0;
}

The above code is compiled as below:

$ g++-7 --version
g++-7 (GCC) 7.4.1 20181207
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++-7 -Wall -std=c++14 -O3 a.cxx -o 7.out
$ ./7.out
x1: 1018151760
x2: 1018020689
65536

$ g++ --version
g++ (GCC) 8.2.1 20181127
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -Wall -std=c++14 -O3 a.cxx -o 8.out
$ ./8.out
x1: 1018151760
x2: 1018020689
0

I'm expecting the output of 7.out to be correct.

Is this actually something UB ( undefined behavior ) or it can be a g++ bug?

UPDATE

Looks like removing the union access UB still processes unwanted results:

#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <cstring>

using namespace std;

struct myunion2 {
    myunion2(uint32_t x) {
        _data = x;
    }
    uint16_t hi() const { return (uint16_t)((_data & 0xFFFF0000) >> 16); }
    uint16_t lo() const { return (uint16_t)((_data & 0xFFFF)); }
    uint32_t _data;
};

 __attribute__((noinline)) void printx1x2(uint32_t x1, uint32_t x2) {
    cout << "x1: " << x1 << endl;
    cout << "x2: " << x2 << endl;
}

__attribute__((noinline)) int func(uint32_t a, uint32_t b) {
    const uint32_t x1 = myunion2(a).hi() * myunion2(b).lo();
    const uint32_t x2 = x1 + myunion2(a).lo() * myunion2(b).hi();
    printx1x2(x1, x2);
    int ret = 0;
    if ( x2 < x1 ) {
        ret = 0x10000;
    }
    return ret;
}

int main(int argc, char** argv) {
    cout << func(4294967295, 4294917296) << endl;
    return 0;
}

Output:

$ g++-7 -Wall -std=c++14 -O3 a.cxx -o 7.out
[2019-03-27 22:48:30][wliu@wliu-arch-vm1 ~/tests]
$ ./7.out
x1: 1018151760
x2: 1018020689
65536
[2019-03-27 22:48:32][wliu@wliu-arch-vm1 ~/tests]
$ g++ -Wall -std=c++14 -O3 a.cxx -o 8.out
[2019-03-27 22:49:11][wliu@wliu-arch-vm1 ~/tests]
$ ./8.out
x1: 1018151760
x2: 1018020689
0
Wei LIU
  • 53
  • 4

1 Answers1

7

The problem (besides the union-punning in original example) is this expression:

myunion2(a).lo() * myunion2(b).hi();

The values of the operands are 65535 * 65535. The types of the operands are uint16_t.

Arithmetic operations are not performed on types smaller than int. Smaller types are promoted first. Since uint16_t is smaller than int, and the range of values representable by uint16_t can be represented by int, those operands are promoted to int. But the operation 65535 * 65535 overflows int, which is a signed type. And signed overflow has undefined behaviour.

Solution: Convert to larger unsigned before multiply (or return larger unsigned in the first place):

const uint32_t x1 = (unsigned)myunion2(a).hi() * myunion2(b).lo();
const uint32_t x2 = x1 + (unsigned)myunion2(a).lo() * myunion2(b).hi();
eerorika
  • 232,697
  • 12
  • 197
  • 326
  • I think hi/low should just return a `uint32_t` which would fix this in (possibly? imo?) a more elegant way. – Mike Vine Mar 27 '19 at 15:19
  • @MikeVine Technically, `uint32_t` would still result in promotion to `int` on systems with 64 bit int. But for these calculations, a 64 bit `int` would not overflow, so maybe it would be OK. I did mention that returning the larger type is an option. Whether it is more elegant is subjective. The user of the function might not necessarily always need the larger type for their purposes. – eerorika Mar 27 '19 at 15:21
  • You saved my life! Thanks for the highly compact and informative explanation! – Wei LIU Mar 27 '19 at 15:21