Is this a compiler optimisation bug, or an undefined behaviour?

Question

We have an annoying bug I can't explain around this piece of code:

unsigned char bitmap[K_BITMAP_SIZE] = {0} ;
SetBit(bitmap, K_18); // Sets the bit #18 to 1

for(size_t i = 0; i < K_END; ++i)
{
    if(TestBit(bitmap, i)) // true for 18
    {
        size_t i2 = getData(i); // for 18, will return 15
        SetBit(bitmap, i2); // BUG: IS SUPPOSED TO set the bit #15 to 1
    }
}

It happens on Visual C++ 2010
It happens both on 32-bit and 64-bit builds
It happens only on Release builds (with "Maximize Speed (/O2)" set
It does not happen only on Release builds with "Minimize Size (/O1)" set
It happens on Visual C++ 2008 only if we __forceinline the getData function (by default, VC++2008 does not inline that function, while VC++2010 does)
It happens on the piece of code given below, probably because massive inlining inside the loop
It doesn't happen if we remove the loop, and directly set the interesting value (18)

Bonus info:

1- BenJ commented the issue does not appear on Visual C++ 2012, meaning this could well be a bug in the compiler

2- If we add a cast to unsigned char in the Test/Set/ResetBit functions, the bug disappears, too

size_t TestBit(const unsigned char * bits, size_t pos) { return (((bits)[(pos) >> 3]) &   (1 << (unsigned char)((pos) & 7))) ; }
size_t SetBit(unsigned char * bits, size_t pos)        { return (((bits)[(pos) >> 3]) |=  (1 << (unsigned char)((pos) & 7))) ; }
size_t ResetBit(unsigned char * bits, size_t pos)      { return (((bits)[(pos) >> 3]) &= ~(1 << (unsigned char)((pos) & 7))) ; }

The question is:

Does this bug happens because our code relies on undefined behaviour, or is there some bug in the VC++2010 compiler?

The following source is self-sufficient, and can be compiled as such on your favorite compiler:

#include <iostream>


const size_t K_UNKNOWN              = (-1) ;
const size_t K_START                = (0) ;
const size_t K_12                   = (K_START + 12) ;
const size_t K_13                   = (K_START + 13) ;
const size_t K_15                   = (K_START + 15) ;
const size_t K_18                   = (K_START + 18) ;
const size_t K_26                   = (K_START + 26) ;
const size_t K_27                   = (K_START + 27) ;
const size_t K_107                  = (K_START + 107) ;
const size_t K_128                  = (K_START + 128) ;
const size_t K_END                  = (K_START + 208) ;
const size_t K_BITMAP_SIZE          = ((K_END/8) + 1) ;


size_t TestBit(const unsigned char * bits, size_t pos) { return (((bits)[(pos) >> 3]) &   (1 << ((pos) & 7))) ; }
size_t SetBit(unsigned char * bits, size_t pos)        { return (((bits)[(pos) >> 3]) |=  (1 << ((pos) & 7))) ; }
size_t ResetBit(unsigned char * bits, size_t pos)      { return (((bits)[(pos) >> 3]) &= ~(1 << ((pos) & 7))) ; }


size_t getData(size_t p_value)
{
    size_t value = K_UNKNOWN;

    switch(p_value)
    {
        case K_13:      value = K_12;        break;
        case K_18:      value = K_15;        break;
        case K_107:     value = K_15;        break;
        case K_27:      value = K_26;        break;
        case K_128:     value = K_12;        break;
        default:        value = p_value;     break;
    }

    return value;
}


void testBug(const unsigned char * p_bitmap)
{
    const size_t byte = p_bitmap[1] ;
    const size_t bit  = 1 << 7 ;
    const size_t value = byte & bit ;

    if(value == 0)
    {
        std::cout << "ERROR : The bit 15 should NOT be 0" << std::endl ;
    }
    else
    {
        std::cout << "Ok : The bit 15 is 1" << std::endl ;
    }
}


int main(int argc, char * argv[])
{
    unsigned char bitmap[K_BITMAP_SIZE] = {0} ;
    SetBit(bitmap, K_18);

    for(size_t i = 0; i < K_END; ++i)
    {
        if(TestBit(bitmap, i))
        {
            size_t i2 = getData(i);
            SetBit(bitmap, i2);
        }
    }

    testBug(bitmap) ;

    return 0;
}

Some background info: Initially:

the Test/Set/ResetBit functions were macros.
the constants were defines
the indices were either long or int (on Windows 32-bit, they have the same size)

If needed, I'll add a few more info (e.g. the generated assembler for both configurations, update on how g++ handle the problem), as soon as possible.

As a point of interest, your code appears to run fine on VS2012 in release mode on 32/64 bit with the optimisation settings you specify. — Benj, Oct 08 '12 at 09:12
@BenJ : Thanks for your input. Have you tried to qualify the `getData` function with `__forceinline`? In VC++2008, I had to do that to force the compiler to aggressively inline and then optimize the resulting code, which triggered the bug. — paercebal, Oct 08 '12 at 09:23
Yes I have, `__forceinline` doesn't appear to produce the issue on 2012 like it does on 2008. — Benj, Oct 08 '12 at 09:28
@BenJ : Excellent, thank you! I'll update the answer with that info. — paercebal, Oct 08 '12 at 09:31

Hans Passant · Accepted Answer · 2012-10-14T07:40:46.660

30

This is a code optimizer bug. It inlines both getData() and SetBit(). The combination appears to be fatal, it loses track of the value of 1 << ((pos) & 7) and always produces zero.

This bug does not occur on VS2012. A workaround is to force one of the functions to not get inlined. Given the code, you probably want to do that for getData():

__declspec(noinline)
size_t getData(size_t p_value)
{ 
    // etc..
}

edited Oct 14 '12 at 07:40

answered Oct 08 '12 at 10:53

Hans Passant

922,412
146
1,693
2,536

3

"This bug is fixed in VS2012" -- out of interest, do you know that because of a bug report you can't/don't link to, that's fixed in VS2012? Or do you know it empirically because (as reported by the questioner) the code works on that? – Steve Jessop Oct 08 '12 at 12:05
1

All old feedback at Connect was deleted. I guess I'll have to rephrase. – Hans Passant Oct 08 '12 at 12:16
5

"All old feedback at Connect was deleted" - that was nice of them. So now people are still using 2010, but no longer have access to information about known issues in it? – Steve Jessop Oct 08 '12 at 12:19
1

Heh I was wondering why about 90% of the links I go to in google that link to Connect give a nice "this content was removed" page – Earlz Oct 08 '12 at 15:42
1

@Steve Jessop : I went to "Microsoft Connect" to report the issue, and not only I was unable to find where to report that issue for VS2010 (only the form for VS2012 was available), but a previous bug I had entered was deleted, too... :-( – paercebal Oct 09 '12 at 16:00

rubber boots · Answer 2 · 2012-10-08T13:27:01.150

Addendum 2 The smallest possible part of the OP's code is given below. This snippet leads to the said optimizer bug in VS2010 - dependend on the contents of inline-expanded GetData(). Even if one combines the two returns in GetData() into one the bug is "gone". Also, it does not lead to a bug if you combine bits in only the first byte (like char bitmap[1]; - you need two bytes).

The problem does not occur under VS2012. This feels horrible because MS fixed that obviously in 2012 but not in 2010. WTF?

BTW:

g++ 4.6.2 x64 (-O3) -- ok
icpc 12.1.0 x64 (-O3) -- ok

VS2010 optimizer bug verification:

#include <iostream>
const size_t B_5=5, B_9=9;

size_t GetBit(unsigned char * b, size_t p) { return b[p>>3]  & (1 << (p & 7)); }
void   SetBit(unsigned char * b, size_t p) {        b[p>>3] |= (1 << (p & 7)); }

size_t GetData(size_t p) {
   if (p == B_5) return B_9;
   return 0;
}
/* SetBit-invocation will fail (write 0) 
   if inline-expanded in the vicinity of the GetData function, VS2010 */

 int main(int argc, char * argv[])
{
 unsigned char bitmap[2] = { 0, 0 };
 SetBit(bitmap, B_5);

 for(size_t i=0; i<2*8; ++i) {
    if( GetBit(bitmap, i) )         // no difference if temporary variable used,
        SetBit(bitmap, GetData(i)); // the optimizer will drop it anyway
 }

 const size_t byte=bitmap[1], bit=1<<1, value=byte & bit;
 std::cout << (value == 0 ? "ERROR: The bit 9 should NOT be 0" 
                          : "Ok: The bit 9 is 1") << std::endl;
 return 0;
}

After some inspection one can see that the initialization/zeroing part is not part of this specific problem.

Looked again after the meal. Seems to be a char/int propagation error. Can be cured by changing the mask functions (as has been already found out by the OP) to:

size_t TestBit  (const unsigned char * bits, size_t pos) { 
 return (bits)[pos >> 3] &   (1 << ( char(pos) & 7) ) ; 
}
size_t SetBit   (unsigned char * bits, size_t pos)       { 
 return (bits)[pos >> 3] |=  (1 << ( char(pos) & 7) ) ; 
}
size_t ResetBit (unsigned char * bits, size_t pos)       { 
 return (bits)[pos >> 3] &= ~(1 << ( char(pos) & 7) ) ; 
}

by casting the int-sized position pos to a char-size. This will lead the optimizer in VS2010 to do the right thing. Maybe somebody can comment.

If that's true in general it's a pretty heinous compiler bug. But I wonder why does force-inlining `getData` affect the initialization of `bitmap`? — Steve Jessop, Oct 08 '12 at 09:45
@MatthieuM: 8.5.1/7: "If there are fewer initializers in the list than there are members in the aggregate, then each member not explicitly initialized shall be value-initialized (8.5)". 8.5 says, "To value-initialize an object of type T means ... [in case of `unsigned char`] the object is zero-initialized" and also says "To zero-initialize an object of type T means: — if T is a scalar type (3.9), the object is set to the value of 0 (zero) converted to T;". So unless there's UB elsewhere in the code, `bitmap` *must* be zeroed to conform. — Steve Jessop, Oct 08 '12 at 09:53
No. The ` = {0}` notation will initialize the whole array to zero. See http://stackoverflow.com/a/201116/14089 . I'll look at the standard draft to find the clause, but I'm quite sure this is not the problem. EDIT: Thanks to Steve Jessop for his simultaneous confirmation... :-) — paercebal, Oct 08 '12 at 09:56
Note that I am currently confirming rubber boots' observation on my compiler. I'm suspecting the compiler is too clever, and knows I'm not using the whole bitmap, so it ignores the unused parts. — paercebal, Oct 08 '12 at 10:01
Yeah, looking at the assembler, the compiler is zero-ing selected parts of the bitmap... I guess the testBug function is the one setting the range of the bitmap being used... Anyway, in the original code, the zeroing is done with a memset, so this is not the source of the problem. — paercebal, Oct 08 '12 at 10:06
Can I convince you to drop `memset` and use `std::fill` instead? — Konrad Rudolph, Oct 08 '12 at 10:54

Is this a compiler optimisation bug, or an undefined behaviour?

2 Answers2