9

I have this C struct: (representing an IP datagram)

struct ip_dgram
{
    unsigned int ver   : 4;
    unsigned int hlen  : 4;
    unsigned int stype : 8;
    unsigned int tlen  : 16;
    unsigned int fid   : 16;
    unsigned int flags : 3;
    unsigned int foff  : 13;
    unsigned int ttl   : 8;
    unsigned int pcol  : 8;
    unsigned int chksm : 16;
    unsigned int src   : 32;
    unsigned int des   : 32;
    unsigned char opt[40];
};

I'm assigning values to it, and then printing its memory layout in 16-bit words like this:

//prints 16 bits at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned short int* ptr = (unsigned short int*)&dgram;
    int i,j;
    //print only 10 words
    for(i=0 ; i<10 ; i++)
    {
        for(j=15 ; j>=0 ; j--)
        {
            if( (*ptr) & (1<<j) ) printf("1");
            else printf("0");
            if(j%8==0)printf(" ");
        }
        ptr++;
        printf("\n");
    }
}

int main()
{
    struct ip_dgram dgram;

    dgram.ver   = 4;
    dgram.hlen  = 5;
    dgram.stype = 0;
    dgram.tlen  = 28;
    dgram.fid   = 1;
    dgram.flags = 0;
    dgram.foff  = 0;
    dgram.ttl   = 4;
    dgram.pcol  = 17;
    dgram.chksm = 0;
    dgram.src   = (unsigned int)htonl(inet_addr("10.12.14.5"));
    dgram.des   = (unsigned int)htonl(inet_addr("12.6.7.9"));

    print_dgram(dgram);

    return 0;
}

I get this output:

00000000 01010100 
00000000 00011100 
00000000 00000001 
00000000 00000000 
00010001 00000100 
00000000 00000000 
00001110 00000101 
00001010 00001100 
00000111 00001001 
00001100 00000110

But I expect this:

(image: the expected memory layout, the standard IPv4 header diagram with the fields in the order declared in the struct above)

The output is partially correct; somewhere, the bytes and nibbles seem to be interchanged. Is there some endianness issue here? Are bit-fields not good for this purpose? I really don't know. Any help? Thanks in advance!

Bruce
  • possible duplicate of [Why bit endianness is an issue in bitfields?](http://stackoverflow.com/questions/6043483/why-bit-endianness-is-an-issue-in-bitfields) – Lundin Feb 28 '13 at 13:30
  • Take a look at how e.g. the Linux or BSD networking stack does this. – vonbrand Feb 28 '13 at 20:05

7 Answers

12

No, bitfields are not good for this purpose. The layout is compiler-dependent.

It's generally not a good idea to use bitfields for data where you want to control the resulting layout, unless you have (compiler-specific) means, such as #pragmas, to do so.

The best way is probably to implement this without bitfields, i.e. by doing the needed bitwise operations yourself. This is annoying, but way easier than somehow digging up a way to fix this. Also, it's platform-independent.

Define the header as just an array of 16-bit words, and then you can compute the checksum easily enough.
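A minimal sketch of that approach (the helper name, the host-order word packing, and the reuse of the question's field values are my own assumptions, not something this answer prescribes): pack the header into ten 16-bit words and fold them with one's-complement addition.

#include <stdint.h>
#include <stdio.h>

/* one's-complement sum of 16-bit words, folded and inverted */
static uint16_t ip_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i < count; i++)
    {
        sum += words[i];
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* fold the carry back in */
    }
    return (uint16_t)~sum;
}

int main(void)
{
    uint16_t hdr[10] = {0};
    hdr[0] = (4u << 12) | (5u << 8);      /* ver=4, hlen=5, stype=0 */
    hdr[1] = 28;                          /* total length */
    hdr[2] = 1;                           /* fragment id */
    hdr[3] = 0;                           /* flags=0, fragment offset=0 */
    hdr[4] = (4u << 8) | 17u;             /* ttl=4, protocol=17 */
    hdr[5] = 0;                           /* checksum, zero while summing */
    hdr[6] = 0x0A0C; hdr[7] = 0x0E05;     /* src 10.12.14.5 */
    hdr[8] = 0x0C06; hdr[9] = 0x0709;     /* des 12.6.7.9 */
    hdr[5] = ip_checksum(hdr, 10);
    printf("checksum = 0x%04X\n", hdr[5]);
    return 0;
}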

unwind
  • So what can I do to control the layout? I need them contiguous, since I have to calculate their checksum, which is the 1's complement sum of all such 16-bit words in the datagram. If I make all fields as 32-bit unsigned integers, I'm unable to see how I'll find their sum.. – Bruce Feb 28 '13 at 13:18
  • `__attribute__((packed))` on gcc and empty zero field `:0` to force alignment to the next word. – Patrick Schlüter Feb 28 '13 at 13:31
  • @tristopia Can you explain more clearly? Or make it an answer? – Bruce Feb 28 '13 at 13:35
  • @unwind Okay thanks, I don't know why that didn't come up in my head. I'll do that only then. – Bruce Feb 28 '13 at 13:38
  • Packed does nothing here and is harmful in general. Don't recommend it. – R.. GitHub STOP HELPING ICE Feb 28 '13 at 14:17
5

The C11 standard says:

An implementation may allocate any addressable storage unit large enough to hold a bitfield. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined.

I'm pretty sure this is undesirable, as it means there might be padding between your fields, and that you can't control the order of your fields. Not just that, but you're at the whim of the implementation in terms of network byte order. Additionally, imagine if an unsigned int is only 16 bits, and you're asking to fit a 32-bit bitfield into it:

The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.

I suggest using an array of unsigned chars instead of a struct. This way you're guaranteed control over padding and network byte order. Start off with the size in bits that you want your structure to be, in total. I'll assume you're declaring this in a constant such as IP_PACKET_BITCOUNT: typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) + (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];

Write a function, void set_bits(ip_packet p, size_t bitfield_offset, size_t bitfield_width, unsigned char *value) { ... } which allows you to set the bits starting at p[bitfield_offset / CHAR_BIT] bit bitfield_offset % CHAR_BIT to the bits found in value, up to bitfield_width bits in length. This will be the most complicated part of your task.

Then you could define identifiers for VER_OFFSET 0 and VER_WIDTH 4, HLEN_OFFSET 4 and HLEN_WIDTH 4, etc to make modification of the array less painful.
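A rough sketch of how set_bits could look (simplified from the signature above: the value is passed as an unsigned long rather than a byte buffer, fields are written most-significant bit first, and IP_PACKET_BITCOUNT is my assumed 480 bits for the 20-byte base header plus 40 option bytes):

#include <limits.h>
#include <stddef.h>

#define IP_PACKET_BITCOUNT 480   /* assumption: 20-byte base header + 40 option bytes */
typedef unsigned char ip_packet[(IP_PACKET_BITCOUNT / CHAR_BIT) +
                                (IP_PACKET_BITCOUNT % CHAR_BIT > 0)];

#define VER_OFFSET  0
#define VER_WIDTH   4
#define HLEN_OFFSET 4
#define HLEN_WIDTH  4

/* write `width` bits of `value` into `p`, starting at absolute bit `offset`,
   most significant bit of the field first */
static void set_bits(ip_packet p, size_t offset, size_t width, unsigned long value)
{
    size_t i;
    for (i = 0; i < width; i++)
    {
        size_t bit = offset + i;
        unsigned char mask = (unsigned char)(1u << (CHAR_BIT - 1 - bit % CHAR_BIT));
        if ((value >> (width - 1 - i)) & 1u)
            p[bit / CHAR_BIT] |= mask;
        else
            p[bit / CHAR_BIT] &= (unsigned char)~mask;
    }
}

/* usage: set_bits(pkt, VER_OFFSET, VER_WIDTH, 4);
          set_bits(pkt, HLEN_OFFSET, HLEN_WIDTH, 5);   -> first byte becomes 0x45 */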

autistic
  • I'm somewhat puzzled as to why bitfields are specified as they are in the standard. If there were a portable means of explicitly laying out bitfields, that could be very useful; since there isn't, I'm not sure what is gained by e.g. specifying the rules in a way that would effectively forbid a 32-bit compiler from allowing e.g. two twelve-bit numbers to fit into a three-byte structure. I can understand not wanting to *require* compilers to perform such fine packing, but my understanding is the compiler would not be allowed to have fields overlap both byte boundaries. – supercat Mar 11 '13 at 15:36
  • @supercat Yes, bitfields are a rather useless and boring feature, unlike the magical `UInt4` pointers in [Kat's answer](http://stackoverflow.com/a/15139924/1989425) which seem to be able to point to *half bytes*. It'd be nice to be able to declare arrays of them, eg. an array of `sizeof (int) * CHAR_BIT` bitfields of size 1, to manipulate `bitfield[0]` through to `bitfield[sizeof (int) * CHAR_BIT - 1]`. – autistic Mar 12 '13 at 00:03
  • Can you think of anything practical which a practical existing or prospective well-defined program can do with bit-fields defined the way they are which a well-defined program would not be able to do if implementations were free to arrange or pad bitfields in any convenient fashion subject only to the constraint that a bitfield should be no larger than a structure with the same types, and that a zero-bit "item" or a non-bitfield item will force alignment appropriate for the declared type? – supercat Mar 12 '13 at 14:45
  • @supercat As far as I'm concerned, bitfields have no practical use. I imagine that ought to answer your question. Say, I wonder why my gcc doesn't implement a `uint_least0_t`... – autistic Mar 12 '13 at 15:08
  • ... and whether the `sizeof` such an object could be 0, if it were to exist. :P – autistic Mar 12 '13 at 15:10
  • Some compilers do support a `bit` type, which does indeed take one bit of storage, but do not allow pointers to such a type. Such types are particularly useful on embedded processors which include instructions which can directly test bits of RAM; some such processors can access any part of RAM with bit-test instructions, while others have a 16-byte area that can be used to hold up to 128 bit flags. – supercat Jan 08 '14 at 18:14
1

Although the question was asked a long time back, there's no answer explaining your result. I'll answer it; hopefully it'll be useful to someone.

I'll illustrate the bug using the first 16 bits of your data structure.

Please note: this explanation is guaranteed to be true only for your particular processor and compiler. If either of these changes, the behaviour may change.

Fields:

unsigned int ver   : 4;
unsigned int hlen  : 4;
unsigned int stype : 8;

Assigned to:

dgram.ver   = 4;
dgram.hlen  = 5;
dgram.stype = 0;

The compiler starts allocating bit-fields at bit offset 0. This means the first byte of your data structure is stored in memory as:

Bit offset: 7     4     0
            -------------
            |  5  |  4  |
            -------------

The first 16 bits after assignment look like this:

Bit offset:     15   12    8     4     0
                -------------------------
                |  5  |  4  |  0  |  0  |
                -------------------------
Memory Address: 100          101

You are using an unsigned 16-bit pointer to dereference memory address 100. As a result, address 100 is treated as the LSB of a 16-bit number, and 101 as the MSB.

If you print *ptr in hex you'll see this:

*ptr = 0x0054

Your loop is running on this 16-bit value and hence you get:

00000000 0101 0100
-------- ---- ----
   0       5    4

Solution: change the order of the elements to

unsigned int hlen  : 4;
unsigned int ver   : 4;
unsigned int stype : 8;

And use an unsigned char * pointer to traverse and print the values. It should work.
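A sketch of that byte-wise printing (assuming the reordered struct above and an 8-bit char, as on the asker's platform):

//prints the first 20 bytes (10 16-bit words), one byte at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned char *ptr = (unsigned char *)&dgram;
    int i, j;
    for(i = 0; i < 20; i++)
    {
        for(j = 7; j >= 0; j--)
            printf("%d", ((*ptr) >> j) & 1);
        printf(" ");
        if(i % 2 == 1) printf("\n");   /* two bytes per output line */
        ptr++;
    }
}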

Please note, as others have said, this behaviour is platform- and compiler-specific. If either of these changes, you need to verify that the memory layout of your data structure is still correct.

0

For Chinese readers, there is a blog post that covers this in more detail; it is really good.

In summary, because of endianness there is a byte order as well as a bit order. Bit order is the order in which the individual bits of a byte are laid out in memory, and it follows the same endianness rule as byte order.

Your picture is drawn in network order, which is big-endian, so your struct definition is really a big-endian layout. Per your output, your PC is little-endian, so you need to change the struct field order when you use it.

The way you show each bit is also incorrect: when you fetch a byte and print its bits, the bit order has to be translated from machine order (little-endian in your case) to the normal order we humans read. You may change it as follows, per the referenced blog.

void
dump_native_bits_storage_layout(unsigned char *p, int bytes_num)
{

    union flag_t {
        unsigned char c;
        struct base_flag_t {
            unsigned int p7:1,
                        p6:1,
                        p5:1,
                        p4:1,
                        p3:1,
                        p2:1,
                        p1:1,
                        p0:1;
        } base;
    } f;

    for (int i = 0; i < bytes_num; i++) {
        f.c = *(p + i);
        printf("%d%d%d%d %d%d%d%d ",
                        f.base.p7,
                        f.base.p6, 
                        f.base.p5, 
                        f.base.p4, 
                        f.base.p3,
                        f.base.p2, 
                        f.base.p1, 
                        f.base.p0);
    }
    printf("\n");
}

//prints 16 bits (two bytes) at a time
void print_dgram(struct ip_dgram dgram)
{
    unsigned char* ptr = (unsigned char*)&dgram;
    int i;
    //print only 10 words
    for(i=0 ; i<10 ; i++)
    {
        dump_native_bits_storage_layout(ptr, 2);
        ptr += 2;
    }
}
Evan
-1

@unwind

A typical use case for bit-fields is interpretation/emulation of byte code or CPU instructions with a given layout. "Don't use it, because you cannot control it" is the answer for children.

@Bruce

For Intel/GCC I see a packed LITTLE ENDIAN bit layout, i.e. in struct ip_dgram field ver is represented by bits 0..3, field hlen is represented by bits 4..7 ...

For correct operation, it is required to verify the memory layout against your design at runtime.

#include <iostream>

struct ModelIndicator
{
    int a:4;
    int b:4;
    int c:4;
};

union UModelIndicator
{
    ModelIndicator i;
    int v;
};

// test packed little endian
static bool verifyLayoutModel()
{
    UModelIndicator um;
    um.v = 0;
    um.i.a = 2; // 0..3
    um.i.b = 3; // 4..7
    um.i.c = 9; // 8..11
    return um.v == (9 << 8) + (3 << 4) + 2;
}

int main()
{
    if (!verifyLayoutModel())
    {
        std::cerr << "Invalid memory layout" << std::endl; 
        return -1;
    }
    // ...
}

Only when the above test fails do you need to consider compiler pragmas, or to adjust your structures and verifyLayoutModel() accordingly.

Sam Ginrich
  • *"Don't use it, because you cannot control it" is the answer for children.* is the ignorantly arrogant answer from someone who fails to account for things not under control and therefore needlessly opens the door to future bugs. Have higher standards than that. – Andrew Henle Aug 09 '23 at 11:00
-2

I agree with what unwind said. Bit fields are compiler dependent.

If you need the bits to be in a specific order, pack the data into a character array through a pointer. Increment the buffer by the size of the element being packed, then pack the next element.

void pack( char** buffer )
{
   if ( buffer && *buffer )
   {
     //pack ver
     //assign first 4 bits to 4. 
     *((UInt4*) *buffer ) = 4;
     *buffer += sizeof(UInt4); 

     //assign next 4 bits to 5
     *((UInt4*) *buffer ) = 5;
     *buffer += sizeof(UInt4); 

     ... continue packing
   }
}
Kat
  • Are you trying to suggest that `sizeof(UInt4)` could be somewhere between 0 and 1, like 0.5, because a UInt4 is "half of a byte"? Roffle. I wish I could up-vote you for that! – autistic Mar 11 '13 at 23:54
  • Supposing `CHAR_BIT == 9`, does this imply that the *integer value* `sizeof(UInt4)` is an *irrational fraction*? – autistic Mar 12 '13 at 00:08
  • Individual bits are not addressable. The minimum addressable entity in C is char (aka byte), which usually has 8 bits. – WGH Apr 03 '14 at 09:23
-2

Compiler-dependent or not, it depends on whether you want to write a very fast program or one that works with different compilers. To write a fast, compact application in C, use a struct with bit fields. If you want a slow, general-purpose program, code it the long way.

Larry
  • Hi, this isn't a very specific answer and you don't back up your arguments with anything. Can you add some examples/proof in? – Ben Jan 04 '14 at 23:08
  • An optimisation can't be an optimisation if it's incorrect. You first need to define the correct behaviour. If you think you can optimise it, create a layer of abstraction so that you can substitute the more optimal version in favour of the correct *"slower"* version... Speaking of that, there's nothing in the standard that says "portable and correct is slow", or "non-portable and undefined is fast" for that matter. A compiler somewhere out there might produce *the same code* for both versions, or optimise it away entirely if the compiler can deduce that the logic is unused or unnecessary. – autistic Jun 02 '18 at 18:35