133

I believe I found a bug in GCC while implementing O'Neill's PCG PRNG. (Initial code on Godbolt's Compiler Explorer)

After multiplying oldstate by MULTIPLIER, (result stored in rdi), GCC doesn't add that result to INCREMENT, movabs'ing INCREMENT to rdx instead, which then gets used as the return value of rand32_ret.state

A minimum reproducible example (Compiler Explorer):

#include <stdint.h>

struct retstruct {
    uint32_t a;
    uint64_t b;
};

struct retstruct fn(uint64_t input)
{
    struct retstruct ret;

    ret.a = 0;
    ret.b = input * 11111111111 + 111111111111;

    return ret;
}

Generated assembly (GCC 9.2, x86_64, -O3):

fn:
  movabs rdx, 11111111111     # multiplier constant (doesn't fit in imm32)
  xor eax, eax                # ret.a = 0
  imul rdi, rdx
  movabs rdx, 111111111111    # add constant; one more 1 than multiplier
     # missing   add rdx, rdi   # ret.b=... that we get with clang or older gcc
  ret
# returns RDX:RAX = constant 111111111111 : 0
# independent of input RDI, and not using the imul result it just computed

Interestingly, modifying the struct to have the uint64_t as the first member produces correct code, as does changing both members to be uint64_t

x86-64 System V does return structs smaller than 16 bytes in RDX:RAX, when they're trivially copyable. In this case the 2nd member is in RDX because the high half of RAX is the padding for alignment or .b when .a is a narrower type. (sizeof(retstruct) is 16 either way; we're not using __attribute__((packed)) so it respects alignof(uint64_t) = 8.)

Does this code contain any undefined behaviour that would allow GCC to emit the "incorrect" assembly?

If not, this should get reported on https://gcc.gnu.org/bugzilla/

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
vitorhnn
  • 1,043
  • 1
  • 8
  • 7
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/206554/discussion-on-question-by-vitorhnn-possible-gcc-bug-when-returning-struct-from-a). – Samuel Liew Jan 24 '20 at 05:06

3 Answers3

104

I don't see any UB here; your types are unsigned so signed-overflow UB is impossible, and there's nothing weird. (And even if signed, it would have to produce correct outputs for inputs that don't cause overflow UB, like rdi=1). It's broken with GCC's C++ front-end as well.

Also, GCC8.2 compiles it correctly for AArch64 and RISC-V (to an madd instruction after using movk to construct constants, or RISC-V mul and add after loading the constants). If it was UB that GCC was finding, we'd generally expect it to find it and break your code for other ISAs as well, at least ones that have similar type widths and register widths.

Clang also compiles it correctly.

This appears to be a regression from GCC 5 to 6; GCC5.4 compiles is correctly, 6.1 and later don't. (Godbolt).

You can report this on GCC's bugzilla using the MCVE from your question.

It really looks like it's a bug in x86-64 System V struct-return handling, perhaps of structs containing padding. That would explain why it works when inlining, and when widening a to uint64_t (avoiding padding).

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
22

This has been fixed on trunk/master.

Here is the relevant commit.

And this is a patch to fix the issue.

Based on a comment in the patch, the reload_combine_recognize_pattern function was trying to adjust USE insns.

S.S. Anne
  • 15,171
  • 8
  • 38
  • 76
14

Does this code contain any undefined behaviour that would allow GCC to emit the "incorrect" assembly?

The behavior of the code presented in the question is well defined with respect to the C99 and later C language standards. In particular, C permits functions to return structure values without restriction.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • 2
    GCC does produce a stand-alone definition of the function; that's what we're looking at, regardless of whether that's what runs when you compile it in a translation unit along with other functions. You could just as easily test it without actually using `__attribute__((noinline))` by compiling it in a translation unit by itself and linking without LTO, or by compiling with `-fPIC` which implies all global symbols are (by default) interposable so can't be inlined into callers. But really the problem is detectable just from looking at the generated asm, regardless of callers. – Peter Cordes Jan 23 '20 at 05:03
  • Fair enough, @PeterCordes, though I'm reasonably confident that that detail was changed out from under me in Godbolt. – John Bollinger Jan 23 '20 at 05:15
  • Version 1 of the question linked to Godbolt with just the function by itself in a translation unit, like the state of the question itself when you answered. I didn't check all the revisions or comments you could have been looking at. Water under the bridge, but I don't think there was ever a claim that the stand-alone asm definition was only broken when the source used `__attribute__((noinline))`. (That would be shocking, not just surprising the way a GCC correctness bug is). Probably that got mentioned only for making a test caller that prints the result. – Peter Cordes Jan 23 '20 at 05:27