25

There are many situations (especially in low-level programming), where the binary layout of the data is important. For example: hardware/driver manipulation, network protocols, etc.

In C++ I can read/write arbitrary binary structures using char* and bitwise operations (masks and shifts), but that's tedious and error-prone. Obviously, I try to limit the scope of these operations and encapsulate them in higher-level APIs, but it's still a pain.

C++ bitfields seem to offer a developer-friendly solution to this problem, but unfortunately their storage is implementation specific.

NathanOliver mentionned std::bitset which basically allows you to access individual bits of an integer with a nice operator[] but lacks accessors for multi-bit fields.

Using meta-programming and/or macros, it's possible to abstract the bitwise operations in a library. Since I don't want to reinvent the wheel, I'm looking for a (preferably STL or boost) library that does that.

For the record, I'm looking into this for a DNS resolver, but the problem and its solution should be generic.

Edit: short answer: it turns out bitfield's storage is reliable in practice (even if it's not mandated by the standard) since system/network libraries use them and yeild well behaved programs when compiled with mainstream compilers.

Community
  • 1
  • 1
Antoine
  • 13,494
  • 6
  • 40
  • 52
  • I think you need to write your own functions to deal with it (and, or, ...) –  Jul 30 '15 at 14:17
  • 2
    Sometimes I approach this by abstracting the bit manipulations at compile time using some metaprogramming. – shuttle87 Jul 30 '15 at 14:18
  • @Kilanny: yeah, that's what I usually do, but just wondering if there's a way to make this kind of code more maintainable – Antoine Jul 30 '15 at 14:19
  • @shuttle87: interesting, I'm currently playing with this idea, but I'm trying to avoid reinventing the wheel ;) – Antoine Jul 30 '15 at 14:23
  • 2
    Have you looked at a [`std::bitset`](http://en.cppreference.com/w/cpp/utility/bitset) – NathanOliver Jul 30 '15 at 14:27
  • @NathanOliver: +1 seems better than bitfields. Convenient access to individual bits, but lacks accessors for fields of more than one bit (for example a 4-bit int positioned at bits 3..7) – Antoine Jul 30 '15 at 14:31
  • 2
    Was going to answer this but it got closed, maybe I should write a blog entry about how I actually did this in an embedded project I worked on. – shuttle87 Jul 30 '15 at 14:31
  • @Antoine I don't think it would be that hard to write a function to return a range of bits. I know it doesn't do any good to you right now but you could suggest it to the standard committee. – NathanOliver Jul 30 '15 at 14:33
  • @NathanOliver: thanks, I'll write my own. I was just asking for best practices, but well ~ – Antoine Jul 30 '15 at 14:36
  • 3
    Off-topic ? Since when are "how do I do that better" questions off-topic ? – Quentin Jul 30 '15 at 14:56
  • @Quentin I took `So this left me wondering if there was a nice C++ language-feature or library (preferably STL or boost) for reading/writing bitfields ?` as asking for `recommend or find a book, tool, software library, tutorial or other off-site resource` – NathanOliver Jul 30 '15 at 15:06
  • 1
    @NathanOliver (and others) (IMHO) Questions receiving this VTC reason are usually very broad questions ("Where do I learn X ?"), that can't be answered in a self-contained way. This question is tightly focused, and could be answered with ad-hoc code, standard functionality, Boost, or another library. Given the relatively small problem at hand, I'd say that self-contained ad-hoc answers are likely, which is good. In fact, I have one such answer. – Quentin Jul 30 '15 at 15:16
  • It shouldn't be difficult to implement it using C++... for example something like `BitField` being a bit field allocated in bit 3...14 of an `unsigned long` (where the constructor should provide the address of the unsigned long) and supporting assignment from integers and implicit conversion to integer. – 6502 Jul 30 '15 at 15:21
  • @shuttle87 -- question was reopened -- maybe you could put your answer in as well. – Soren Jul 30 '15 at 19:50

6 Answers6

9

From the C++14 standard (N3797 draft), section 9.6 [class.bit], paragraph 1:

Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [ Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. — end note ]

Although notes are non-normative, every implementation I'm aware of uses one of two layouts: either big-endian or little endian bit order.

Note that:

  • You must specify padding manually. This implies that you must know the size of your types (e.g. by using <cstdint>).
  • You must use unsigned types.
  • The preprocessor macros for detecting the bit order are implementation-dependent.
  • Usually the bit order endianness is the same as the byte order endianness. I believe there is a compiler flag to override it, though, but I can't find it.

For examples, look in netinet/tcp.h and other nearby headers.

Edit by OP: for example tcp.h defines

struct
{
    u_int16_t th_sport;     /* source port */
    u_int16_t th_dport;     /* destination port */
    tcp_seq th_seq;     /* sequence number */
    tcp_seq th_ack;     /* acknowledgement number */
# if __BYTE_ORDER == __LITTLE_ENDIAN
    u_int8_t th_x2:4;       /* (unused) */
    u_int8_t th_off:4;      /* data offset */
# endif
# if __BYTE_ORDER == __BIG_ENDIAN
    u_int8_t th_off:4;      /* data offset */
    u_int8_t th_x2:4;       /* (unused) */
# endif
    // ...
}

And since it works with mainstream compilers, it means bitset's memory layout is reliable in practice.

Edit:

This is portable within one endianness:

struct Foo {
    uint16_t x: 10;
    uint16_t y: 6;
};

But this may not be because it straddles a 16-bit unit:

struct Foo {
    uint16_t x: 10;
    uint16_t y: 12;
    uint16_t z: 10;
};

And this may not be because it has implicit padding:

struct Foo {
    uint16_t x: 10;
};
ismail
  • 46,010
  • 9
  • 86
  • 95
o11c
  • 15,265
  • 4
  • 50
  • 75
  • Are you saying all mainstream compilers pack bitfields tightly (and fields may straddle bytes), and the optimization level has no impact on the binary layout of bitfields ? Have you checked yourself or can you link to some information pointing your way ? – Antoine Jul 30 '15 at 15:25
  • 2
    Yes, provided you don't straddle the unit boundary, and pad out to that size. If optimization affected struct layout, the whole universe would burn. – o11c Jul 30 '15 at 15:27
  • 2
    Indeed, the `tcp.h` header convinced me, actually bitfields are usable even if there are many overly cautious posts about them here on SO. – Antoine Jul 30 '15 at 15:37
  • I believe the standard is talking about the order of bitfields not the order of bits. – HCSF Mar 28 '19 at 12:25
  • @HCSF *All* implementations choose additional restrictions beyond what the standard requires. This is one of the things you can rely on. – o11c Mar 28 '19 at 17:40
  • "You must use unsigned types" is incorrect since C++14. Prior to then, it was up to the implementation to determine the whether it treated signed types as unsigned; since then, an implementation must respect the specified signedness. – cosimo193 Dec 14 '22 at 15:05
5

It's simple to implement bit fields with known positions with C++:

template<typename T, int POS, int SIZE>
struct BitField {
    T *data;

    BitField(T *data) : data(data) {}

    operator int() const {
        return ((*data) >> POS) & ((1ULL << SIZE)-1);
    }

    BitField& operator=(int x) {
        T mask( ((1ULL << SIZE)-1) << POS );
        *data = (*data & ~mask) | ((x << POS) & mask);
        return *this;
    }
};

The above toy implementation allows for example to define a 12-bit field in a unsigned long long variable with

unsigned long long var;

BitField<unsigned long long, 7, 12> muxno(&var);

and the generated code to access the field value is just

0000000000000020 <_Z6getMuxv>:
  20:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax  ; Get &var
  27:   48 8b 00                mov    (%rax),%rax     ; Get content
  2a:   48 c1 e8 07             shr    $0x7,%rax       ; >> 7
  2e:   25 ff 0f 00 00          and    $0xfff,%eax     ; keep 12 bits
  33:   c3                      retq   

Basically what you'd have to write by hand

6502
  • 112,025
  • 15
  • 165
  • 265
  • 1
    You can avoid the pointer/memory overhead by putting them in a union. Also, you should use `size_t` and `T` instead of `int` in places, and `static_assert(std::is_unsigned::value, "bitfields must be unsigned to be sane"). – o11c Jul 30 '15 at 18:34
5

We have this in production code where we had to port MIPS code to x86-64

https://codereview.stackexchange.com/questions/54342/template-for-endianness-free-code-data-always-packed-as-big-endian

Works well for us.

It's basically a template without any storage, the template arguments specify the position of the relevant bits.

If you need multiple fields, you put multiple specializations of the template together in a union, together with an array of bytes to provide storage.

The template has overloads for assignment of value and a conversion operator to unsigned for reading the value.

In addition, if the fields are larger than a byte, they are stored in big-endian byte order, which is sometimes useful when implementing cross-platform protocols.

here's a usage example:

union header
{
    unsigned char arr[2];       // space allocation, 2 bytes (16 bits)

    BitFieldMember<0, 4> m1;     // first 4 bits
    BitFieldMember<4, 5> m2;     // The following 5 bits
    BitFieldMember<9, 6> m3;     // The following 6 bits, total 16 bits
};

int main()
{
    header a;
    memset(a.arr, 0, sizeof(a.arr));
    a.m1 = rand();
    a.m3 = a.m1;
    a.m2 = ~a.m1;
    return 0;
}
CplusPuzzle
  • 298
  • 2
  • 9
  • While this is a clever solution, which I have seen at a client site. The client used an integral type different from unsigned char as the first union member. Which actually results in undefined behavior, because only standard-layout-struct members are compatible. I would need to look up the details if overlaying with a char-array is not undefined behavior. – PeterSom Mar 09 '22 at 13:23
  • @PeterSom In addition to your comment, it is also undefined behaviour to read a field from a union if it was not the last field written, hence, on the line "a.m2 = ~a.m1;", the read of a.m1 is UB since the last field written was a.m3. Using this method relies on undefined behaviour, using bit-fields relies on implementation defined behaviour. – cosimo193 Dec 14 '22 at 15:09
  • @cosimo193, technically it only reads and writes to the char array so it's not a problem. the assignment is overloaded and the reading is through an overloaded conversion operator to unsigned. – CplusPuzzle Mar 02 '23 at 16:01
1

I have written an implementation of bit fields in C++ as a library header file. An example I give in the documentation is that, instead of writing this:

struct A
  {
    union
      {
        struct
          {
            unsigned x : 5;
            unsigned a0 : 2;
            unsigned a1 : 2;
            unsigned a2 : 2;
          }
        u;
        struct
          {
            unsigned x : 5;
            unsigned all_a : 6;
          }
        v;
      };
  };

// …

A x;
x.v.all_a = 0x3f;
x.u.a1 = 0;

you can write:

typedef Bitfield<Bitfield_traits_default<> > Bf;

struct A : private Bitfield_fmt
  {
    F<5> x;
    F<2> a[3];
  };

typedef Bitfield_w_fmt<Bf, A> Bwf;

// …

Bwf::Format::Define::T x;
BITF(Bwf, x, a) = 0x3f;
BITF(Bwf, x, a[1]) = 0;

There's an alternative interface, under which the last two lines of the above would change to:

#define BITF_U_X_BWF Bwf
#define BITF_U_X_BASE x
BITF(X, a) = 0x3f;
BITF(X, a[1]) = 0;

Using this implementation of bit fields, the traits template parameter gives the programmer a lot of flexibility. Memory is just processor memory by default, or it can be an abstraction, with the programmer providing functions to perform "memory" reads and writes. The abstracted memory is a sequence of elements of any unsigned integral type (chosen by the programmer). Fields can be laid out either from least-to-most or most-to-least significance. The layout of fields in memory can be the reverse of what they are in the format structure.

The implementation is located at: https://github.com/wkaras/C-plus-plus-library-bit-fields

(As you can see, I unfortunately was not able to fully avoid use of macros.)

WaltK
  • 724
  • 4
  • 13
0

I have created a library for that:

Portable Bitfields

It works similar to the solution provided by @CpusPuzzle.

Basic example:

enum class Id
{
    f1, f2, f3
};

using namespace jungles;

using Register = Bitfields<
    uint16_t, 
    Field{.id = Id::f1, .size = 3},
    Field{.id = Id::f2, .size = 9}, 
    Field{.id = Id::f3, .size = 4}>; 

r.at<Id::f1>() = 0b101;
r.at<Id::f2>() = 0b001111100;
r.at<Id::f3>() = 0b0110;

ASSERT(r.extract<Id::f1>() == 0b1010000000000000);
ASSERT(r.extract<Id::f2>() == 0b0000011111000000);
ASSERT(r.extract<Id::f3>() == 0b0000000000000110);

ASSERT(r.serialize() == 0b1010011111000110);

Deserialization:

Register r{0b0101110001110110};
//           XXXYYYYYYYYYZZZZ
ASSERT(r.at<Id::f1>() == 0b010);
ASSERT(r.at<Id::f2>() == 0b111000111);
ASSERT(r.at<Id::f3>() == 0b0110);
K. Koovalsky
  • 596
  • 4
  • 17
-2

C is designed for low-level bit manipulation. It's easy enough to declare a buffer of unsigned chars, and set it to any bit pattern you want. Especially if your bit strings are very short so fit into one of the integral types.

One potential problem is byte endiannness. C can't "see" this at all, but just as integers have an endianness, so too do bytes, when serialised. Another is the very small number of machines that don't use octets for bytes. C guarantees a byte shall be at least an octet, but 32 and 9 are real-world implementations. In those circumstances, you have to take a decision whether to simply ignore the uper bits (in which case naive code should work), or treat them as part of the bitstream (in which case you've got to be careful to fold CHAR_BIT into your calculations). It's also hard to test the code as you unlikely to find it easy to get your hands on a CHAR+BIT 32 machine.

Malcolm McLean
  • 6,258
  • 1
  • 17
  • 18