3

For some reason that I can't quite figure out, my union of structs containing bit-fields is taking up twice as many bytes as are necessary for any single struct.

#include <stdio.h>
#include <stdlib.h>

union instructionSet {
    struct Brane{
        unsigned int opcode: 4;
        unsigned int address: 12;
    } brane;
    struct Cmp{
        unsigned int opcode: 4;
        unsigned int blank: 1;
        unsigned int rsvd: 3;
        unsigned char letter: 8;
    } cmp;
    struct {
        unsigned int rsvd: 16;
    } reserved;
};

int main() {

    union instructionSet IR;// = (union instructionSet*)calloc(1, 2);

    printf("size of union %zu\n", sizeof(union instructionSet));
    printf("size of reserved %zu\n", sizeof(IR.reserved));
    printf("size of brane %zu\n", sizeof(IR.brane));
    printf("size of cmp %zu\n", sizeof(IR.cmp));


    return 0;
}

All of the calls to sizeof return 4; however, to my knowledge they should be returning 2.

timrau
M. Church
  • Cannot reproduce: https://ideone.com/NMK3hx (All have 4 byte size). – mch Mar 05 '19 at 15:02
  • `sizeof` returns `size_t` which [must be printed using `%zu`](https://stackoverflow.com/q/940087/995714) – phuclv Mar 05 '19 at 15:12
  • The data type you pack the bit field into determines the size/alignment of the containing entity. So if an `int` is 4 bytes, then `int x:1` will be packed into an aligned 4-byte integer. More than one bit field may be packed into it, but the containing structure size will be a multiple of 4. – Tom Karzes Mar 05 '19 at 15:16
  • @mch: The question states they observe 4 for all cases. Your comment states you observe 4 for all cases. That is reproducing, not not reproducing. – Eric Postpischil Mar 05 '19 at 15:28
  • @TomKarzes: Per C 2018 6.7.2.1 11, quoted in my answer, the C implementation may choose any storage unit that fits; it is not required to select it based on the type. – Eric Postpischil Mar 05 '19 at 15:29
  • @EricPostpischil Interesting. It must have changed at some point. It *used* to work like that, and probably still does in most implementations, but I guess more flexibility is allowed now. – Tom Karzes Mar 05 '19 at 15:33
  • unsigned char bit-fields aren't standard C, so there's no telling what this code will do. In addition, bit-fields are very poorly specified and should be avoided in general. – Lundin Mar 05 '19 at 15:34
  • @Lundin: An implementation may define other types to be allowed. If `unsigned char` is not allowed, the code violates a constraint in 6.7.2.1 5 and therefore would be required to produce a diagnostic message. So, if there is no diagnostic message, the implementation has defined it to be allowed, and we can tell what this code will do. – Eric Postpischil Mar 05 '19 at 15:38
  • It is supposed to be 4 bytes or less; I thought it would be 2. Am I mistaken? – M. Church Mar 05 '19 at 15:42
  • @EricPostpischil It's listed among common language extensions in annex J. Anyway, that doesn't matter since there is no telling what it will do regardless, with no specific compiler and system in mind. Nothing is specified: endianness, padding bits, padding bytes, alignment of storage unit, MSB location etc etc. All we know is that this creates some manner of mostly useless binary blob in memory. – Lundin Mar 05 '19 at 15:44
  • Okay, I will have to find a better way to implement what I need to do, as the bit-field aspect is crucial. Thank you for your help, and sorry for the confusion. – M. Church Mar 05 '19 at 15:58
  • I would like to close this now, since it's obvious from the answers that I need a better way to implement this. – M. Church Mar 05 '19 at 15:59
  • @EricPostpischil The question changed. The last sentence was missing, and I read the first sentence as saying that the OP gets double the size for the union compared with the structs. – mch Mar 06 '19 at 09:10

4 Answers

2

C 2018 6.7.2.1 11 allows the C implementation to choose the size of the container it uses for bit-fields:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined.…

The implementation you are using apparently chooses to use four-byte units. Likely that is also the size of an int in the implementation, suggesting that it is a convenient size for the implementation.

Eric Postpischil
2

There are a couple of problems here. First of all, your bit-field struct Brane uses unsigned int, which is 4 bytes on most platforms.

Even if you use only half of the bits, you still use a full 32-bit-wide unsigned int.

Second, your Cmp struct uses two different field types, so you use 8 bits of the 32-bit unsigned int for your first 3 fields, and then you use an unsigned char for its full 8 bits. Because of data alignment rules, this structure would be at least 6 bytes, but potentially more.

If you want to optimize the size of your union so that it takes only 16 bits, you first need to use unsigned short, and then you need to always use the same field type to keep everything in the same storage unit.

Something like this would fully optimize your union:

union instructionSet {
    struct Brane{
        unsigned short opcode: 4;
        unsigned short address: 12;
    } brane;
    struct Cmp{
        unsigned short opcode: 4;
        unsigned short blank: 1;
        unsigned short rsvd: 3;
        unsigned short letter: 8;
    } cmp;
    struct {
        unsigned short rsvd: 16;
    } reserved;
};

This would give you a size of 2 all around.

smsisko
  • The C standard allows an implementation to use any storage unit that a bit-field fits in; it is not required to use a four-byte `int` for an `int x : 3`, nor is it required to use a two-byte `short` for a `short x : 16`. Further, an implementation may pack consecutive bit-fields; `short x : 1; int y : 8; int z : 23;` may be packed into 32 bits (presuming the implementation accepts `short` for a bit-field, which it is not required to). Although you say `Cmp` uses at least 6 bytes, we know it is just 4 in OP’s implementation. – Eric Postpischil Mar 05 '19 at 15:47
  • "then you use a unsigned char for it's full 8-bit" It is not specified by the standard how large an unsigned char bit-field will be, or if it will differ from int bit-fields. The same applies to short; it doesn't solve anything. "Because of data alignement rules, this structure would be at least 6 bytes" This isn't true, because it isn't specified what will happen if an int bit-field spills over into a char one. The key here is to realize none of this is standardized or guaranteed. You can't even know which bit is the MSB in this. You can't know where the padding is located. Etc. Etc. – Lundin Mar 05 '19 at 15:49
1

Read about structure padding and memory alignment. By default, a 32-bit processor reads memory in 32-bit (4-byte) units because that is faster. So in memory, a char followed by a uint32 occupies 4 + 4 = 8 bytes (1 byte for the char, 3 bytes of padding, 4 bytes for the uint32).

Add these lines at the beginning and end of your program and the result will be 2.

#pragma pack(1)

#pragma pack()

This is a way to tell the compiler to align members to 1 byte (by default 4 on a 32-bit processor).

PS: try this example with different #pragma pack settings:

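#include <stdio.h>

/* Two chars share the space in front of the int. */
struct s1
{
    char a;
    char b;
    int c;
};

/* Each char sits next to padding, so this layout is usually larger. */
struct s2
{
    char b;
    int c;
    char a;
};

int main(void) {
    printf("size of s1 %zu\n", sizeof(struct s1));
    printf("size of s2 %zu\n", sizeof(struct s2));

    return 0;
}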
marc_s
Igor Galczak
  • The pragma stuff has worked for me, thank you so much. I will play around with the other implementations. – M. Church Mar 05 '19 at 16:05
1

It isn't specified what this code will do and it isn't meaningful to reason about it without a specific system and compiler in mind. Bit-fields are simply too poorly specified in the standard to be reliably used for things like memory layouts.

union instructionSet {

    /* any number of padding bits may be inserted here */ 

    /* we don't know if what will follow is MSB or LSB */

    struct Brane{
        unsigned int opcode: 4; 
        unsigned int address: 12;
    } brane;
    struct Cmp{
        unsigned int opcode: 4;
        unsigned int blank: 1;
        unsigned int rsvd: 3;
        /* anything can happen here, "letter" can merge with the previous 
           storage unit or get placed in a new storage unit */
        unsigned char letter: 8; // unsigned char does not need to be supported
    } cmp;
    struct {
        unsigned int rsvd: 16;
    } reserved;

    /* any number of padding bits may be inserted here */ 
};

The standard lets the compiler pick a "storage unit" for any bit-field type, which can be of any size. The standard simply states:

An implementation may allocate any addressable storage unit large enough to hold a bitfield.

Things we can't know:

  • How large the bitfields of type unsigned int are. 32 bits might make sense but no guarantee.
  • If unsigned char is allowed for bit-fields.
  • How large the bitfields of type unsigned char are. Could be any size from 8 to 32.
  • What will happen if the compiler picked a smaller storage unit than the expected 32 bits, and the bits don't fit inside it.
  • What happens if an unsigned int bit-field meets an unsigned char bit-field.
  • If there will be padding in the end of the union or in the beginning (alignment).
  • How individual storage units within the structs are aligned.
  • The location of the MSB.

Things we can know:

  • We have created some sort of binary blob in memory.
  • The first byte of the blob resides on the least significant address in memory. It may contain data or padding.

Further knowledge can be obtained by having a very specific system and compiler in mind.


Instead of bit-fields, we can use 100% portable and deterministic bitwise operations, which yield the same machine code anyway.

Lundin
  • I may end up doing this, as there are many fields that overlap and I could have a function to arbitrarily mask as needed. – M. Church Mar 05 '19 at 16:30