20
#include <stdint.h>
#include <iostream>

using namespace std;

uint32_t k[] = {0, 1, 17};

template <typename T>
bool f(T *data, int i) {
    return data[0] < (T)(1 << k[i]);
}

int main() {
    uint8_t v = 0;
    cout << f(&v, 2) << endl;
    cout << (0 < (uint8_t)(1 << 17)) << endl;
    return 0;
}


g++ a.cpp && ./a.out
1
0

Why am I getting these results?

David G
  • 94,763
  • 41
  • 167
  • 253
hello.co
  • 746
  • 3
  • 21

4 Answers4

20

It looks like gcc reverses the shift and applies it to the other side, and I guess this is a bug.

In C (instead of C++) the same thing happens, and C translated to asm is easier to read, so I'm using C here; also I reduced the test cases (dropping templates and the k array). foo() is the original buggy f() function, foo1() is what foo() behaves like with gcc but shouldn't, and bar() shows what foo() should look like apart from the pointer read.

I'm on 64-bit, but 32-bit is the same apart from the parameter handling and finding k.

#include <stdint.h>
#include <stdio.h>

uint32_t k = 17;
char foo(uint8_t *data) {
    return *data < (uint8_t)(1<<k);
/*
with gcc -O3 -S: (gcc version 4.7.2 (Debian 4.7.2-5))
    movzbl  (%rdi), %eax
    movl    k(%rip), %ecx
    shrb    %cl, %al
    testb   %al, %al
    sete    %al
    ret
*/
}
char foo1(uint8_t *data) {
    return (((uint32_t)*data) >> k) < 1;
/*
    movzbl  (%rdi), %eax
    movl    k(%rip), %ecx
    shrl    %cl, %eax
    testl   %eax, %eax
    sete    %al
    ret
*/
}
char bar(uint8_t data) {
    return data < (uint8_t)(1<<k);
/*
    movl    k(%rip), %ecx
    movl    $1, %eax
    sall    %cl, %eax
    cmpb    %al, %dil
    setb    %al
    ret
*/
}

int main() {
    uint8_t v = 0;
    printf("All should be 0: %i %i %i\n", foo(&v), foo1(&v), bar(v));
    return 0;
}
Stefan
  • 5,304
  • 2
  • 25
  • 44
8

If your int is 16-bit long, you're running into undefined behavior and either result is "OK".

Shifting N-bit integers by N or more bit positions left or right results in undefined behavior.

Since this happens with 32-bit ints, this is a bug in the compiler.

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
4

Here are some more data points:

basically, it looks like gcc optimizes (even in when the -O flag is off and -g is on):

    [variable] < (type-cast)(1 << [variable2])

to

    ((type-cast)[variable] >> [variable2]) == 0

and

    [variable] >= (type-cast)(1 << [variable2])

to

    ((type-cast)[variable] >> [variable2]) != 0

where [variable] needs to be an array access.

I guess the advantage here is that it doesn't have to load the literal 1 into a register, which saves 1 register.

So here are the data points:

  • changing 1 to a number > 1 forces it to implement the correct version.
  • changing any of the variables to a literal forces it to implement the correct version
  • changing [variable] to a non array access forces it to implement the correct version
  • [variable] > (type-cast)(1 << [variable2]) implements the correct version.

I suspect this is all trying to save a register. When [variable] is an array access, it needs to also keep an index. Someone probably thought this is so clever, until it's wrong.

Using code from the bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56051

    #include <stdio.h>

    int main(void)
    {
        int a, s = 8;
        unsigned char data[1] = {0};

        a = data[0] < (unsigned char) (1 << s);
        printf("%d\n", a);

        return 0;
    }

compiled with gcc -O2 -S

     .globl main
            .type   main, @function
    main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    $8, %esp
    pushl   $1                ***** seems it already precomputed the result to be 1
    pushl   $.LC0
    pushl   $1
    call    __printf_chk
    xorl    %eax, %eax
    movl    -4(%ebp), %ecx
    leave
    leal    -4(%ecx), %esp
    ret

compile with just gcc -S

    .globl main
            .type   main, @function
    main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ebx
    pushl   %ecx
    subl    $16, %esp
    movl    $8, -12(%ebp)
    movb    $0, -17(%ebp)
    movb    -17(%ebp), %dl
    movl    -12(%ebp), %eax
    movb    %dl, %bl
    movb    %al, %cl
    shrb    %cl, %bl                      ****** (unsigned char)data[0] >> s => %bl
    movb    %bl, %al                              %bl => %al
    testb   %al, %al                              %al = 0?
    sete    %dl
    movl    $0, %eax
    movb    %dl, %al
    movl    %eax, -16(%ebp)
    movl    $.LC0, %eax
    subl    $8, %esp
    pushl   -16(%ebp)
    pushl   %eax
    call    printf
    addl    $16, %esp
    movl    $0, %eax
    leal    -8(%ebp), %esp
    addl    $0, %esp
    popl    %ecx
    popl    %ebx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret

I guess the next step is to dig through gcc's source code.

thang
  • 3,466
  • 1
  • 19
  • 31
  • In case you missed it, the PR gives you the exact line in gcc source code where this optimization happens... It happens for < and >= if the lhs is unsigned and the rhs is (a cast of) 1 << var. Because of the time at which the optimization occurs, it happens only if the syntax tree is exactly this (no going through variables or whatever). – Marc Glisse Jan 20 '13 at 14:29
  • @MarcGlisse, It is so specific that I can't help but wonder if someone wise ass thought this is a clever optimization to save a register when it is needed most. Otherwise, why would it not happen for non-array access? what's PR? I am looking in the bug report, but not seeing it. I tried replacing data[0] with a local variable d, and that actually made it work. Also tried changing variables to literals, and that works, too. – thang Jan 20 '13 at 14:37
  • It does happen for non-arrays, and please refrain from such rude comments. – Marc Glisse Jan 20 '13 at 14:41
  • this piece of code works correctly (d doesn't get optimized out): #include int main(void) { int a, s =8; int d = 0; a = d < (unsigned char)(1 << s); printf("%d\n", a); return 0; } – thang Jan 20 '13 at 14:43
  • also changing int d = 0; to unsigned char d = 0; also works. – thang Jan 20 '13 at 14:51
  • ah i see the source code line. overlooked it. – thang Jan 20 '13 at 14:57
-1

I'm pretty sure we're talking undefined behaviour here - converting a "large" integer to a smaller, of a value that doesn't fit in the size of the new value, is undefined as far as I know. 131072 definitely doesn't fit in a uint_8.

Although looking at the code generated, I'd say that it's probably not quite right, since it does "sete" rather than "setb"??? That does seem very suspicios to me.

If I turn the expression around:

return (T)(1<<k[i])  >  data[0];

then it uses a "seta" instruction, which is what I'd expect. I'll do a bit more digging - but something seems a bit wrong.

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • 4
    According to http://stackoverflow.com/questions/6752567/casting-a-large-number-type-to-a-smaller-type casting to an unsigned smaller type is well defined. (and should do what we expect) – Stefan Jan 20 '13 at 12:53
  • I notice your code says "sete" as well. I haven't delved very deeply into it, but it doesn't seem right to me. – Mats Petersson Jan 20 '13 at 12:58
  • 1
    The generated code is the equivalent of `test eax, eax; setz al`, which looks like a reasonable way of returning `x == 0`, which for `unsigned int` is identical to `x < 1`. Seems okay to me. – DCoder Jan 20 '13 at 13:10
  • Yes, but x < 1 is not the same as x < (1 << k) - and I looked at the UNOPTIMIZED code, so it would not do any constant replacement. That doesn't seem right. – Mats Petersson Jan 20 '13 at 13:27
  • x < (1 << k) is "the same" as (x >> k) < 1 – Stefan Jan 20 '13 at 13:33
  • I'm not sure what code transformations GCC does at which optimization settings, but it seems like it just changed `x < (1 << k)` to `(x >> k) < 1` and missed the truncating `(T)` cast. Interestingly, extracting `1 << k` to a temporary variable fixed the problem for me (MinGW GCC 4.6.2). – DCoder Jan 20 '13 at 13:35
  • Gcc sometimes optimizes things during parsing (even at -O0). It shouldn't, but that's legacy and there will be a lot of work moving all those optimizations to a later stage. – Marc Glisse Jan 20 '13 at 14:05