Optimization levels in gcc changing c program behaviour

Question

I'm seeing behaviour I don't expect when compiling this code with different optimization levels in gcc.

The function test should fill a 64 bit unsigned integer with ones, shift them shift_size bits to the left, and return the 32 low bits as a 32 bit unsigned integer.

When I compile with -O0 I get the results I expect.

When I compile with -O2 I do not, if I try to shift by 32 bits or more.

In fact, I get exactly the results I'd expect if I were shifting a 32 bit integer by shifts greater than or equal to bit width on x86, which is a shift using only the 5 low bits of the shift size.

But I'm shifting a 64 bit number, so shifts < 64 should be legal right?

I assume it's a bug in my understanding and not in the compiler, but I haven't been able to figure it out.

My machine: gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5 i686-linux-gnu

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

uint32_t test(unsigned int shift_size) {
    uint64_t res = 0;
    res = ~res;
    res = res << shift_size; //Shift size < uint64_t width so this should work
    return res; //Implicit cast to uint32_t
}

int main(int argc, char *argv[])
{
    int dst;
    sscanf(argv[1], "%d", &dst); //Get arg from outside so optimizer doesn't eat everything
    printf("%" PRIu32 "l\n", test(dst));
    return 0;
}

Usage:

$ gcc -Wall -O0 test.c 
$ ./a.out 32
0l
$ gcc -Wall -O2 test.c 
$ ./a.out 32
4294967295l

gcc -S -Wall -O0 test.c

gcc -S -Wall -O2 test.c

Could you also post the relevant parts of the assembly code generated by your compiler? (`-S` for gcc) — Greg Hewgill, Jan 11 '12 at 09:50
FWIW it gives the correct result (`0l`) for me on all optimisation levels from `-O0` to `-O3` using gcc 4.2.1, so I suspect you *might* have found a gcc bug. — Paul R, Jan 11 '12 at 09:52
hm, code works fine in both optimization levels for me... (gcc version 4.5.0 20100604, openSUSE 11.3 (x86_64)) — Bort, Jan 11 '12 at 09:54
OK - I take that back - if I compile with `-m32` instead of the default `-m64` then I can reproduce the problem with gcc 4.2.1 - it looks like something 32-bit-specific then. — Paul R, Jan 11 '12 at 09:54
I am running GCC 4.2.1 (i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)) and cannot reproduce what you're seeing, compiling with `-O0` through `-O3` levels, with either `-m32` or `-m64` flags. — Alex Reynolds, Jan 11 '12 at 09:56
@Alex: if you try the *non-LLVM* gcc 4.2.1 with -O2 -m32 then you should see it (that's what I'm using) — Paul R, Jan 11 '12 at 10:02
You're right. `$ gcc-4.2 -Wall -O2 -m32 test4.c; ./a.out 32` returns `4294967295l`. — Alex Reynolds, Jan 11 '12 at 10:05
Thanks for the feedback, printf issue pointed out by pmg is fixed and assembly generated by gcc on my system is linked now — user1142769, Jan 11 '12 at 10:28
Confirmed with 4.6.1, please report a bug on GCC tracker: http://gcc.gnu.org/bugzilla/ — , Jan 11 '12 at 11:33
It's a confirmed bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51821 — nos, Jan 11 '12 at 11:34

pmg · Answer 1 · 2012-01-11T10:23:14.350

5

"%u" (or "%lu") and uint32_t are not necessarily compatible. Try

    #include <inttypes.h>

    //printf("%ul\n", test(dst));
    printf("%" PRIu32 "l\n", test(dst));

Printing a uint32_t value with a "%u" (or "%lu") specifier can invoke Undefined Behaviour.

edited Jan 11 '12 at 10:23

answered Jan 11 '12 at 10:09

pmg

106,608
13
126
198

Good call, uint32_t may not be unsigned int. – dreamlax Jan 11 '12 at 10:18
@dreamlax: I think the point is that the OP used "%ul" by mistake instead of "%lu", which ends up being taken by printf as "%u". Using PRIu32 removes the possibility of using an incorrect specifier. – tinman Jan 11 '12 at 10:21

Greg Hewgill · Answer 2 · 2012-01-11T10:46:11.540

I was able to repro this. Here's the relevant bit of generated code with -O2:

    movl    $-1, %eax
    movl    $-1, %edx
    sall    %cl, %eax
    xorl    %edx, %edx
    testb   $32, %cl
    cmovne  %eax, %edx
    cmovne  %edx, %eax    ; This appears to be the instruction in error.
                          ; It looks as though gcc thought that %edx might still
                          ; be zero at this point, because if the shift count is
                          ; >= 32 then %eax should be zero after this.

and here is the equivalent bit with -O0:

    movl    -16(%ebp), %eax
    movl    -12(%ebp), %edx
    shldl   %cl,%eax, %edx
    sall    %cl, %eax
    testb   $32, %cl
    je      L3
    movl    %eax, %edx
    xorl    %eax, %eax         ; correctly zeros %eax if shift count >= 32
L3:
    movl    %eax, -16(%ebp)
    movl    %edx, -12(%ebp)

Compiler is:

i686-apple-darwin11-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

Thanks for posting your gcc -S output. I had a look and while it's slightly different, the critical part has the same error as what I saw on my machine.

score 3 · Answer 3 · answered Jan 11 '12 at 09:59

3

It looks like it might be a 32-bit-specific compiler bug to me. With the code from the question and gcc 4.2.1 I can reproduce the bug so long as I compile with gcc -m32 -O2 ....

However, if I add a debug printf:

uint32_t test(unsigned int shift_size) {
    uint64_t res = 0;
    res = ~res;
    res = res << shift_size; //Shift size < uint64_t width so this should work
    printf("res = %llx\n", res);
    return res; //Implicit cast to uint32_t
}

then the problem goes away.

The next step would be to look at the generated code to try and identify/confirm the bug.

answered Jan 11 '12 at 09:59

Paul R

208,748
37
389
560

As a temporary workaround, perhaps one could print to `/dev/null`: `FILE *null; null = fopen("/dev/null","w"); fprintf(null,"res = %llx\n", res); fclose(null);` – Alex Reynolds Jan 11 '12 at 10:10
1

As a temporary workaround, making `res` `volatile` works just as well and requires no file operations. – Jan 11 '12 at 11:34

score 1 · Answer 4 · answered Jan 11 '12 at 09:54

1

It looks like a bug. My guess is that the compiler has folded the last two lines from:

res = res << shift_size
return (uint32_t)res;

into:

return ((uint32_t)res) << shift_size;

The latter is now well-defined for 32 or larger.

answered Jan 11 '12 at 09:54

Lindydancer

25,428
4
49
68

score -2 · Answer 5 · answered Jan 11 '12 at 10:16

-2

I don't remember what C99 say, but it seems that in gcc, uint32_t contain at least 32 bits, and may contain more, so when you optimize it, it use 64bit var, which is faster in 64-bit machine.

answered Jan 11 '12 at 10:16

asaelr

5,438
1
16
22

2

No, uint32_t , if available, is exactly 32 bits. (uint_fast32_t or uint_least32_t might be larger, but not uint32_t) – nos Jan 11 '12 at 10:30
uint64_t is not in all cases faster than uint32_t: it is faster for indexing in arrays iff you have a 64bit address space on intel/amd64 machines, because 16bit and 32bit indexes have to be converted while 8bit and 64bit are legal offsets in that assembler language. but multiplication and division are slower iff you use larger bit sizes. – comonad Jan 11 '12 at 14:12

Optimization levels in gcc changing c program behaviour

5 Answers5