43

I've just done a test with bitfields, and the results surprise me.

class test1 {
public:
    bool test_a:1;
    bool test_b:1;
    bool test_c:1;
    bool test_d:1;
    bool test_e:1;
    bool test_f:1;
    bool test_g:1;
    bool test_h:1;
};

class test2 {
public:
    int test_a:1;
    int test_b:1;
    int test_c:1;
    int test_d:1;
    int test_e:1;
    int test_f:1;
    int test_g:1;
    int test_h:1;
};

class test3 {
public:
    int test_a:1;
    bool test_b:1;
    int test_c:1;
    bool test_d:1;
    int test_e:1;
    bool test_f:1;
    int test_g:1;
    bool test_h:1;
};

The results were:

sizeof(test1) = 1   // This is what I'd expect. 8 bits in a byte
sizeof(test2) = 4   // Reasonable. Maybe padded out to the size of an int.
sizeof(test3) = 16  // What???

Is this what you'd expect, or a compiler bug? (Codegear C++ Builder 2007, btw...)

Roddy
  • 1
    If you want to have more control over the layout of bit field structures in memory, consider using this bit field facility, implemented as a library header file: [link](https://github.com/wkaras/C-plus-plus-library-bit-fields/blob/master/Bitfield.pdf) – WaltK Jan 04 '17 at 18:27
  • Related: This question asks if this behavior is expected or a compiler bug. I made a separate question a while back that instead asks for workarounds, here: https://stackoverflow.com/questions/24765685/packing-bools-with-bit-field-c – Keith M Jan 23 '19 at 18:41
  • Note: This behavior only occurs when mixing `bool` with integer-style types. Mixing e.g. `short` and `char` works fine. I've also only personally seen it with Visual Studio compilers. – Keith M Jan 23 '19 at 18:51

6 Answers

33

Your compiler has arranged all of the members of test3 on integer-size boundaries. Once a block has been used for a given type (integer bit-field or boolean bit-field), the compiler does not allocate any further bit-fields of a different type until the next boundary.

I doubt it is a bug. It probably has something to do with the underlying architecture of your system.

edit:

A C++ compiler like yours appears to allocate bit-fields in memory as follows: several consecutive bit-field members of the same type are allocated sequentially. As soon as a member of a new type needs to be allocated, it is aligned with the beginning of the next logical memory block. The size of that block depends on your processor. Some processors can align to 8-bit boundaries, while others can only align to 16-bit boundaries.

In your test3, each member is of a different type than the one before it, so the memory allocation will be 8 * (the minimum logical block size on your system). In your case, the minimum block size is two bytes (16-bit), so the size of test3 is 8*2 = 16.

On a system that can allocate 8-bit blocks, I would expect the size to be 8.
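To make the rule described above concrete, here is a minimal sketch, assuming the compiler really does open a new allocation unit whenever the declared type changes. The struct and function names are made up for illustration, and the sizes are implementation-defined, so the code prints them rather than promising specific numbers.

```cpp
#include <cstdio>

// Alternating types: a compiler that never shares an allocation unit
// between different declared types may start a new unit per member.
struct Interleaved {
    unsigned int a : 1;
    bool         b : 1;
    unsigned int c : 1;
    bool         d : 1;
};

// Same members grouped by type: at most one type change.
struct Grouped {
    unsigned int a : 1;
    unsigned int c : 1;
    bool         b : 1;
    bool         d : 1;
};

// One declared type throughout: no type change at all.
struct Uniform {
    unsigned int a : 1;
    unsigned int b : 1;
    unsigned int c : 1;
    unsigned int d : 1;
};

// Prints the three sizes so they can be compared on a given compiler.
void print_layout_sizes() {
    std::printf("Interleaved: %zu\n", sizeof(Interleaved));
    std::printf("Grouped:     %zu\n", sizeof(Grouped));
    std::printf("Uniform:     %zu\n", sizeof(Uniform));
}
```

Calling `print_layout_sizes()` from `main` on the compiler in question shows whether grouping or a single type helps. On compilers that pack across types, all three sizes may simply be equal.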

e.James
  • 1
    But, if that's the case, why 16, instead of 20 ((4 + 1) * 4) or 32 ((4 + 4) * 4)? – C. K. Young Nov 21 '08 at 10:45
  • I'm guessing that your system cannot align to anything smaller than 16-bit boundaries. When test_a:1 is allocated, it takes up the first bit of a 16-bit field. When test_b:1 is allocated, it is of a different type, so the compiler starts it on the next 16-bit boundary, for a total of 128 bits. – e.James Nov 21 '08 at 10:58
  • I think it is even easier than that. I think they just chose to align consecutive bitfields of differing type based on the alignment of the underlying type. This forces the first of a consecutive run of int bitfields to start on 4 byte boundaries and the first of consecutive bool bitfields to start on byte boundaries. AFAICT that is a valid way of implementing bitfields while following the standard and seems like a pretty reasonable way to do it. And it makes it clear why it turned out being 16 bytes. – Tim Seguine Nov 02 '18 at 21:59
  • @TimSeguine - according to that logic I'd expect `sizeof(test3) = 20` : (4 int-aligned fields = 4 * 4 = 16) + (4 bool-aligned fields = 4 * 1 = 4). – Guss Apr 11 '19 at 07:48
  • @Guss My logic was that the first of consecutive int bits have to start on a 4 byte boundary and that the first of consecutive bool bits had to start on a 1 byte boundary. In that case the first int bitfield is in the first byte. The bool bitfield is then aligned to a byte boundary: the second byte. Then comes another int field which would need to be aligned on a 4 byte boundary again. It was all speculation of course. My real point though was that there are any number of valid ways to do it, most of which are standards conformant, and many of which would give a 16 byte struct. – Tim Seguine Apr 11 '19 at 10:56
  • So in your implementation, the bool type will be packed into the allocation unit of the previous `int` because its allocation unit would fit inside it (after using some of it for the bit field, but I'm assuming it will start on the 8th bit and not after the only used bit 0), but the next `int`'s allocation unit "can't fit inside the remainder" so it will take a new full-sized 32-bit allocation unit, and so on and so forth? I'm not saying it breaks the spec, but it sure is a weird optimization. If your compiler is packing "values that fit", why not measure the ints and see that they fit also? – Guss Apr 11 '19 at 14:59
20

Be careful with bitfields, as much of their behavior is implementation-defined (that is, decided by the compiler):

From C++03, 9.6 Bitfields (pg. 163):

Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others.]

That is, it is not a bug in the compiler but rather the absence of a standard definition of how it should behave.
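Because the layout is left to the implementation, code that needs a known size can pack the bits by hand instead of relying on bit-fields. The following is only a sketch of that idea: the class name `Flags8` and its members are hypothetical, and it assumes C++11 and a platform that provides `std::uint8_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: eight boolean flags stored in one byte using
// shifts and masks, so the layout does not depend on bit-field rules.
class Flags8 {
public:
    // Returns the flag at position bit (0..7).
    bool get(unsigned bit) const {
        assert(bit < 8);
        return ((bits_ >> bit) & 1u) != 0;
    }

    // Sets or clears the flag at position bit (0..7).
    void set(unsigned bit, bool value) {
        assert(bit < 8);
        const unsigned mask = 1u << bit;
        const unsigned updated = value ? (bits_ | mask) : (bits_ & ~mask);
        bits_ = static_cast<std::uint8_t>(updated);
    }

private:
    std::uint8_t bits_ = 0;
};

// Checked at compile time, unlike the size of a bit-field struct.
static_assert(sizeof(Flags8) == 1, "Flags8 should occupy exactly one byte");
```

The trade-off is more verbose access (`f.set(3, true)` instead of `f.test_d = true`) in exchange for a size that can be verified at compile time.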

David Rodríguez - dribeas
7

Wow, that's surprising. In GCC 4.2.4, the results are 1, 4, and 4, respectively, both in C and C++ modes. Here's the test program I used that works in both C99 and C++.

#ifndef __cplusplus
#include <stdbool.h>
#endif
#include <stdio.h>

struct test1 {
    bool test_a:1;
    bool test_b:1;
    bool test_c:1;
    bool test_d:1;
    bool test_e:1;
    bool test_f:1;
    bool test_g:1;
    bool test_h:1;
};

struct test2 {
    int test_a:1;
    int test_b:1;
    int test_c:1;
    int test_d:1;
    int test_e:1;
    int test_f:1;
    int test_g:1;
    int test_h:1;
};

struct test3 {
    int test_a:1;
    bool test_b:1;
    int test_c:1;
    bool test_d:1;
    int test_e:1;
    bool test_f:1;
    int test_g:1;
    bool test_h:1;
};

int
main()
{
    printf("%zu %zu %zu\n", sizeof (struct test1), sizeof (struct test2),
                            sizeof (struct test3));
    return 0;
}
C. K. Young
5

As a general observation, a signed int of 1 bit doesn't make a lot of sense. Sure, you can probably figure out how to store 0 in it, but then the trouble starts.

One bit must be the sign-bit, even in two's complement, but you only have one bit to play with. So, if you allocate that as the sign-bit, you have no bits left for the actual value. It's true as Steve Jessop points out in a comment that you could probably represent -1 if using two's complement, but I still think that an "integer" datatype that can only represent 0 and -1 is a rather weird thing.

To me, this datatype makes no (or, given Steve's comment, little) sense.

Use `unsigned int small : 1;` to make it unsigned; then you can store the values 0 and 1 unambiguously.
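A small sketch of the difference, assuming C++11 and a two's-complement implementation. The struct and function names are invented for illustration. The signed field is only ever assigned -1 here, because storing 1 into it would be an out-of-range conversion.

```cpp
#include <cassert>

// Hypothetical struct holding one signed and one unsigned 1-bit field.
struct OneBit {
    signed int   s : 1;  // representable values: 0 and -1
    unsigned int u : 1;  // representable values: 0 and 1
};

// Stores the non-zero value each field can represent and reads it back.
void demo_one_bit() {
    OneBit b{};          // both fields start at 0
    assert(b.s == 0);
    assert(b.u == 0u);

    b.s = -1;            // the only non-zero value that fits in s
    b.u = 1;             // the only non-zero value that fits in u
    assert(b.s == -1);
    assert(b.u == 1u);
}
```

The practical consequence is that a test such as `b.s == 1` can never be true for the signed field, whereas `b.u == 1` behaves as expected.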

unwind
  • hehe. If it's one's complement, it can only store plus and minus zero... I'd used int just as an example. My 'real' code when I hit this was using uints. – Roddy Nov 21 '08 at 12:04
  • 16
    If it's a two's-complement signed 1bit value, then a clear bit represents 0 and a set bit represents -1. Where's the problem? ;-) – Steve Jessop Nov 21 '08 at 18:04
1
#include <iostream>
using namespace std;

bool ary_bool4[10];

struct MyStruct {
    bool a1 :1;
    bool a2 :1;
    bool a3 :1;
    bool a4 :1;
    char b1 :2;
    char b2 :2;
    char b3 :2;
    char b4 :6;
    char c1;
};

int main() {
    cout << "char size:\t" << sizeof(char) << endl;
    cout << "short int size:\t" << sizeof(short int) << endl;
    cout << "default int size:\t" << sizeof(int) << endl;
    cout << "long int size:\t" << sizeof(long int) << endl;
    cout << "long long int size:\t" << sizeof(long long int) << endl;
    cout << "ary_bool4 size:\t" << sizeof(ary_bool4) << endl;
    cout << "MyStruct size:\t" << sizeof(MyStruct) << endl;
    // cout << "long long long int size:\t" << sizeof(long long long int) << endl;
    return 0;
}

char size: 1
short int size: 2
default int size: 4
long int size: 4
long long int size: 8
ary_bool4 size: 10
MyStruct size: 3
t1t0
0

From Samuel P. Harbison and Guy L. Steele, "C: A Reference Manual":

The problem:

"Compilers are free to impose constraints on the maximum size of a bit field, and specify certain addressing boundaries that bit field cannot cross."

Manipulations which can be done within standard:

"An unnamed bit field may also be included in a structure to provide padding."

"Specifying a length of 0 for an unnamed bit field has a special meaning - it indicates that no more bit fields should be packed into the area in which the previous bit field...Area here means some implementation-defined storage unit"

Is this what you'd expect, or a compiler bug?

So in C89, C89 with Amendment 1, and C99, it is not a bug. I don't know about C++, but I think the behavior is similar.
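The two unnamed bit-field forms quoted above can be sketched as follows. They are valid in C++ as well as C. The struct and function names are made up, and the resulting sizes depend on the implementation's storage unit, so the code prints them instead of assuming values.

```cpp
#include <cstdio>

// Two adjacent 1-bit fields: the compiler may put both in one unit.
struct Packed {
    unsigned int a : 1;
    unsigned int b : 1;
};

// Zero-width unnamed field: no further bit-fields are packed into
// the unit holding a, so b begins in a new one.
struct Split {
    unsigned int a : 1;
    unsigned int   : 0;
    unsigned int b : 1;
};

// Unnamed field of non-zero width: six bits of padding between a and b.
struct Padded {
    unsigned int a : 1;
    unsigned int   : 6;
    unsigned int b : 1;
};

// Prints the three sizes for comparison on a given compiler.
void print_bitfield_sizes() {
    std::printf("Packed: %zu\n", sizeof(Packed));
    std::printf("Split:  %zu\n", sizeof(Split));
    std::printf("Padded: %zu\n", sizeof(Padded));
}
```

On an implementation whose storage unit for `unsigned int` bit-fields is the `int` itself, `Split` would be expected to be larger than `Packed`, but that is an assumption to check with `print_bitfield_sizes()` rather than a guarantee.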

Ionic
Konstantin Burlachenko