
So, per the C standard (committee draft N1124) here:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

We find that it fails to pin down exactly how a C compiler must lay out bit-fields. Apparently, as long as bit-fields behave like any other scalar field, anything goes. Section 6.7.2.1 paragraph 10 says:

"An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified."

This looming freedom for the compiler seems to be a full stop for many who claim "you cannot trust bit-fields" or "bit-fields are not portable." The alarm suggests a whole herd of compiler writers and CPU makers conspiring in a star chamber, grinning and eager to do some exotic bit-field sizing and alignment simply because the standard permits it.

WHERE IS THE EVIDENCE for these crazy bohemian compiler/CPU designers who are dedicated to guaranteeing that bit-fields remain forever undependable and unportable? I want to see actual hard evidence of the green men on Mars.

I've attached straightforward C++ source code to tell the bit-field truth about any system with a C++ compiler. I am asking the community NOT for opinion, but for hard output evidence from your system and compiler if it diverges from the results posted. If I could poll the entire C/C++ community for a same/not-same vote against the posted results, I wonder what the percentages would be?

#include <stdio.h>
#include <string.h> // for memcpy() in get_cast()

/*
 A simple program to illustrate a bit-field struct's actual compiled
 internal layout. Results depend on machine architecture, compiler
 implementation, and compiler flags.
*/

typedef unsigned long long int ulli;

struct bitf
{
    //   field      bits  offset
    ulli f0         : 1; // 0
    ulli f1         : 2; // 1
    ulli f3         : 3; // 3
    ulli f7         : 4; // 6
    ulli f15        : 5; // 10
    ulli f31        : 6; // 15
    ulli f63        : 7; // 21
    ulli f127       : 8; // 28
    ulli f255       : 9; // 36
    ulli f511       :10; // 45
    ulli end        : 9; // 55
                         // 64

    bitf():
         f0         ( 0 )
        ,f1         ( 1 )
        ,f3         ( 3 )
        ,f7         ( 7 )
        ,f15        ( 15 )
        ,f31        ( 31 )
        ,f63        ( 63 )
        ,f127       ( 127 )
        ,f255       ( 255 )
        ,f511       ( 511 )
        ,end        ( 0 )
    {}

    ulli get_shft() const
    {
        ulli bits=0;
        bits <<= 9; bits |=   0;
        bits <<=10; bits |= 511;
        bits <<= 9; bits |= 255;
        bits <<= 8; bits |= 127;
        bits <<= 7; bits |=  63;
        bits <<= 6; bits |=  31;
        bits <<= 5; bits |=  15;
        bits <<= 4; bits |=   7;
        bits <<= 3; bits |=   3;
        bits <<= 2; bits |=   1;
        bits <<= 1; bits |=   0;
        return bits;
    }

    ulli get_cast() const
    {
        // memcpy() yields the same raw bits as *(ulli*)this would,
        // without that cast's strict-aliasing violation
        ulli bits;
        memcpy(&bits, this, sizeof(bits));
        return bits;
    }
};

int main()
{
    bitf bf;
    ulli shft = bf.get_shft();
    ulli cast = bf.get_cast();

    printf("sizeof(ulli) is %zu\n\n",sizeof(ulli));
    printf("shft%scast\n\n",(shft==cast)?"==":"!=");
    printf("BITS from MSB 63 (left) down to LSB 0 (right)\n");
    printf("    : "); for(int i=63; i>=0; i--) printf("%c",(i%10)==0 ? i/10 +'0' : ' '); printf("\n");
    printf("    : "); for(int i=63; i>=0; i--) printf("%d",i%10); printf("\n");
    printf("shft: "); for(int i=63; i>=0; i--) printf("%llu",(shft>>i)&1); printf("\n");
    printf("cast: "); for(int i=63; i>=0; i--) printf("%llu",(cast>>i)&1); printf("\n");
    printf("    : ====----====----====----====----====----====----====----====----\n");
    printf("shft: "); for(int i=15;i>=0;i--) printf("%4llx",(shft>>(i*4)&0xf)); printf("\n");
    printf("cast: "); for(int i=15;i>=0;i--) printf("%4llx",(cast>>(i*4)&0xf)); printf("\n");
    printf("    : ====----====----====----====----====----====----====----====----\n");
    unsigned char *pb;
    pb = (unsigned char*)(&shft);
    printf("shft: "); for(int i=sizeof(shft)-1; i>=0; i--) printf("%8x", pb[i]); printf("\n");
    pb = (unsigned char*)(&cast);
    printf("cast: "); for(int i=sizeof(cast)-1; i>=0; i--) printf("%8x", pb[i]); printf("\n");
    printf("\n");

    printf("<ENTER>"); getchar();
    return 0;
}

Results for Intel Core i7, Win10, VS2015, 64-bit build

sizeof(ulli) is 8

shft==cast

BITS from MSB 63 (left) down to LSB 0 (right)
    :    6         5         4         3         2         1         0
    : 3210987654321098765432109876543210987654321098765432109876543210
shft: 0000000000111111111011111111011111110111111011111011110111011010
cast: 0000000000111111111011111111011111110111111011111011110111011010
    : ====----====----====----====----====----====----====----====----
shft:    0   0   3   f   e   f   f   7   f   7   e   f   b   d   d   a
cast:    0   0   3   f   e   f   f   7   f   7   e   f   b   d   d   a
    : ====----====----====----====----====----====----====----====----
shft:        0      3f      ef      f7      f7      ef      bd      da
cast:        0      3f      ef      f7      f7      ef      bd      da

<ENTER>
RBornert
  • The code does not compile in C (MSVC). – Weather Vane Jul 17 '19 at 17:29
  • @WeatherVane: It's not supposed to. The OP said it's C++. Now, why the OP feels the need to put C++ in a question about C is a different matter. – Nicol Bolas Jul 17 '19 at 17:33
  • @NicolBolas it has the C tag. There is no C/C++ language as mentioned in the post, or entire C/C++ community. – Weather Vane Jul 17 '19 at 17:35
  • Compiler Explorer (gcc.godbolt.org) has a nice selection of compilers/platforms. Why not test it there and post the results? I would upvote. – Petr Skocik Jul 17 '19 at 17:35
  • :) Well it is C++ code, so you'd need to convert it to C. – RBornert Jul 17 '19 at 17:36
  • *I've attached straightforward C++ source code to tell the bit-field truth* How many different compilers did you use? On how many different systems? How many of those systems were big-endian? How many of those systems were little-endian? From what you've posted, your "truth" applies to using one compiler on one operating system running on little-endian hardware. That's no "truth". If you want to make an assertion, do the work to support it. – Andrew Henle Jul 17 '19 at 17:37
  • In general the OP question applies to both C and C++, that is, unless somebody can point to a C++ spec where all of the ambiguity cited in the C standard is settled. – RBornert Jul 17 '19 at 17:38
  • @RBornert: "*This looming freedom for the compiler seems to be a full-stop for many who claim "you cannot trust bit-fields", or "bit-fields are not portable."*" I'm not sure I understand your point. Or more specifically, it's not clear exactly what these objections are referring to. You're basically claiming that "some people" have said "some thing", in some context you have not provided, and then dared people to find practical examples of that "some thing" being the case. Bitfields certainly are portable, in that *valid* code written against them will work across platforms. – Nicol Bolas Jul 17 '19 at 17:39
  • The (non) portability of bitfields is often misunderstood. Bitfields are just about exactly as portable as ordinary ints. If you're describing data structures to be used within a program, they're quite portable and can be used perfectly portably. But if you're describing data formats to be used for interchange, read/written to files or network streams, there are significant portability difficulties. These can generally be overcome (for bitfields as well as for ordinary ints), although ugly ifdefs may be required. – Steve Summit Jul 17 '19 at 17:54
  • @AndrewHenle You're more or less making my point. You're trying to burden me with questing to find a bunch of exotic architectures. I can't, and not for lack of trying. Big endian is not as common now as the caution suggests. In the early days of computing, big endian was more common. How many members here will ACTUALLY post output results that diverge from what I posted? That any of us can consider the possibility of Big Foot is not the same as actually finding one right now. :) Opinions about Big Foot are easy. Finding one not so much these days. – RBornert Jul 17 '19 at 18:06
  • @RBornert "I don't think that will happen, so it can be ignored"? How's that working for Boeing? You're also assuming bit-field layout is only influenced by endianness. That's wrong, too. "Implementation-defined" means the compiler gets to choose - different compilers on the same platform can do it differently. Writing robust, reliable code means you discount **nothing**. – Andrew Henle Jul 17 '19 at 18:17
  • I'm someone who has said "[don't use bitfields](https://stackoverflow.com/questions/53118858/bit-manipulations-good-practices/53119001#53119001)". But you're probably right: there's probably a vast majority of little-endian machines out there that all do bitfields the same way. If you want to assume that architecture, you probably won't get in too much trouble, and no one here is going to stop you. (It sounds like your mind's basically made up already, anyway.) – Steve Summit Jul 17 '19 at 18:22
  • @SteveSummit *Bitfields are just about exactly as portable as ordinary ints* I'll have to disagree with that. Compilers aren't free to change the ordering of integral values in a structure. They can with bit-fields. – Andrew Henle Jul 17 '19 at 18:27
  • @AndrewHenle I agree with you in the case where the developer does not have any control over the nature of the software at a network endpoint. If you control the compiler and CPU architecture for network endpoints, then bit-field endianness is NOT an issue. I'd like to point out that Microsoft did a decent job of removing the ambiguity for bit-field sizing and alignment. The nature of the output I posted is evidence of such. Also, GNU has compiler flags that conform to Microsoft's normalization. – RBornert Jul 17 '19 at 18:29
  • @RBornert You're also assuming that what you have control over now, you'll remain in control of in the future. The output currently posted doesn't even establish that Microsoft treats bit-fields consistently. Barring bugs, every compiler pretty much **must** do that - you just don't prove it here, as you compile your code *once*. You didn't document testing it with different compiler options, 32-bit vs. 64-bit, different versions of the compiler, different versions of the OS, yet you're claiming you've removed the ambiguity? – Andrew Henle Jul 17 '19 at 18:38
  • @AndrewHenle If you use any recent Microsoft compiler on Intel or AMD systems, you will get the posted output for both 32bit and 64bit builds. And in fact I've been writing C source code since 1987, C++ since 1990. Not once have I seen any C/C++ compiler NOT pack bit-fields exactly like the posted output. Where is the evidence for any C/C++ compiler that is packing bit-fields differently? Show me the evidence please. – RBornert Jul 17 '19 at 18:48
  • @RBornert *If you use any recent Microsoft compiler on Intel or AMD systems, you will get the posted output for both 32bit and 64bit builds.* If that's the entirety of your universe, fine. But you're trying to imply what works there will work everywhere, and it's up to us to actually prove it won't. I'd say you have that backwards. I'd rather write code based on assurances that things **will** work no matter what and therefore prefer strictly-compliant C code that doesn't rely on implementation-defined behavior. – Andrew Henle Jul 17 '19 at 19:05
  • https://learn.microsoft.com/en-us/cpp/cpp/cpp-bit-fields?view=vs-2019 : "The ordering of data declared as bit fields is from low to high bit, as shown in the figure above." https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Variable-Attributes.html : the "-mms-bitfields" flag will have the compiler produce bitfields per Microsoft's low-to-high packing. – RBornert Jul 17 '19 at 19:18
  • @AndrewHenle Ok, let us say "my world" is limited to Windows systems and Linux systems. Where is the evidence that I need to worry about anything? MS has removed all ambiguity about bit-packing, and the GNU compiler will conform to what MS is doing. So these 2 environments cover what percentage of computers right now? I'm not claiming there are zero Big Foot. I'm claiming that the evidence for needing to care about Big Foot is disappearing. – RBornert Jul 17 '19 at 19:25
  • @RBornert I don't think we're going to hear a collective forehead slap, from all of the commenters here, saying "Zounds! You're right! Bitfields are no longer a portability issue! We should all use them (with assumptions about their ordering) with wild abandon now!" Equally, I don't think we're going to convince you that bitfield portability is still something to be concerned about. So I'm not seeing where this discussion is going. – Steve Summit Jul 17 '19 at 20:01
  • @SteveSummit Let's agree that for OS source code and compiler source code and open network packet exchange source code, no forehead "Zounds" will be happening. However, for application layer developers who have a lot of control over target run environments including hardware and compilers, I cannot find any reason to not use bit-fields as a matter of safe predictable convenience. I will let the source code I posted stand as evidence until somebody posts a valid instance of it failing to work as expected. At a minimum, a Win/Linux developer can probably say "Zounds!" – RBornert Jul 17 '19 at 20:20

1 Answer

One common way that bit-field layouts differ is in bit endianness: little-endian machines typically allocate the low-order bits first, while big-endian machines allocate the high-order bits first.

As an example, here is the definition of struct iphdr, which models an IP header, taken from /usr/include/netinet/ip.h on a CentOS 7.2 system:

struct iphdr
  {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ihl:4;
    unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
    unsigned int version:4;
    unsigned int ihl:4;
#else
# error "Please fix <bits/endian.h>"
#endif
    u_int8_t tos;
    u_int16_t tot_len;
    u_int16_t id;
    u_int16_t frag_off;
    u_int8_t ttl;
    u_int8_t protocol;
    u_int16_t check;
    u_int32_t saddr;
    u_int32_t daddr;
    /*The options start here. */
  };

This struct is meant to be layered directly on top of a buffer containing a raw IP datagram at the point where the IP header starts. Note that the ordering of the version and ihl fields differs depending on the endianness.
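
For instance, receiving code can overlay the struct directly on the packet bytes. Here is a minimal sketch of my own (dump_header and buf are hypothetical; a real packet buffer must also be suitably aligned for the cast):

#include <netinet/ip.h>
#include <stdio.h>

// Hypothetical helper: buf points at the first byte of a raw IPv4
// datagram, i.e. where the IP header starts.
void dump_header(const unsigned char *buf)
{
    const struct iphdr *ip = (const struct iphdr *)buf;
    // Because of the #if __BYTE_ORDER blocks above, version reads back
    // as 4 on both little- and big-endian hosts; ihl is the header
    // length in 32-bit words.
    printf("version=%u, header length=%u bytes\n",
           (unsigned)ip->version, (unsigned)(ip->ihl * 4));
}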

And in reference to this:

a whole herd of compiler writers and CPU makers conspiring in a star chamber, grinning and eager to do some exotic bit-field sizing and alignment simply because the standard permits it.

Compiler writers are indeed quick to take advantage of any behavior left undefined or unspecified by the standard in order to perform a wide variety of optimizations that might surprise those who think C always behaves as a thin wrapper around assembly language.
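
As one example of that (a sketch of my own, not specific to bit-fields): signed integer overflow is undefined behavior, so an optimizer is entitled to assume it never happens and fold this check away entirely.

// If x == INT_MAX, evaluating x + 1 is undefined behavior, so an
// optimizing compiler may assume it never occurs and compile this
// whole function down to "return 0;".
int will_overflow(int x)
{
    return x + 1 < x;
}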

dbush
  • I'll note that the code you posted implicitly assumes the compiler used will place the bit-fields in the proper locations per relevant IP standards based merely on the endianness of the machine. There's nothing in the C standard that requires that. – Andrew Henle Jul 17 '19 at 17:43
  • I hear you. Have been aware of examples like this my entire career. I understand endianness issues. This struct is evidence that the looming endianness issue is solved. The nature of my post has to do with the "actual number of Big Foot" in the forest versus "omg there might be a Big Foot" in the forest. – RBornert Jul 17 '19 at 17:43
  • @RBornert Ignoring the Big Foot that might be in the forest is how you wind up with sloppy code that gets stomped with bugs. My standards are higher than that. – Andrew Henle Jul 17 '19 at 17:45
  • @AndrewHenle On an open network with no expectation of any normalcy other than network byte order, I agree with you. However, for client-server systems where the developer controls both sides, there is no reason to be scared of bit-fields. I've used them safely for years in a controlled, predictable context. – RBornert Jul 17 '19 at 18:13
  • @RBornert Your use of ridicule (Big Foot?!? really?) and straw men ("scared of bit-fields") is based on what? – Andrew Henle Jul 17 '19 at 18:30
  • https://learn.microsoft.com/en-us/cpp/cpp/cpp-bit-fields?view=vs-2019 : "The ordering of data declared as bit fields is from low to high bit, as shown in the figure above." https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Variable-Attributes.html : the "-mms-bitfields" flag will have the compiler produce bitfields per Microsoft's low-to-high packing. – RBornert Jul 17 '19 at 19:17
  • Just found this, and to some degree, there is an attempt to say the same thing. https://softwareengineering.stackexchange.com/questions/165899/has-little-endian-won – RBornert Jul 17 '19 at 21:58
  • Wait, why is that header swapping the placement of nibbles within a byte? The specified pragma is BYTE_ORDER -- the placement of bits within a byte remains unchanged, right? – Jonathan Mayer Jul 20 '22 at 19:27
  • @JonathanMayer ... the user who posted that iphdr example did not actually understand the nature of the OP - which was not a question about why network traffic depends on NBO. The OP has to do with undermining the false idea that client data inside a network packet (which sits below the header that user posted) cannot safely use bitfields, for the reasons cited in the OP. Bitfields are perfectly safe given a modern compiler able to #pragma pack(0) and able to align bits in the order encountered in the struct. Not saying all compilers will, but most do. – RBornert Aug 02 '23 at 20:08
  • @JonathanMayer I've seen cases where beginner programmers who know they are in a little endian client server world (windows and linux), believe they must convert all their native little endian structs to big endian before sending the struct over a socket. Somewhere they learned what NBO was and decided to falsely believe that all data for all time must be NBO when sent over the network. It's false. But that belief is out there more than you realize. I decided to make the OP and challenge the idea. Notice that not a single user met the request for hard evidence. Not one. – RBornert Aug 02 '23 at 20:09
  • Correction: #pragma pack(1) – RBornert Aug 03 '23 at 15:58