Is a 2-member struct a safe replacement for a bit-packed int?

Question

I have some existing C++ code that sends and receives an array of uint32_t over the network. Due to a change in my protocol, I want to replace each entry in this array with a pair of two uint16_ts, and if possible I would like to do this without changing the number of bits I send over the network. An obvious way to combine two uint16_t values into a single 32-bit-wide value is to do low-level bit-packing into a uint32_t, and leave the array definition unchanged. So the sender's code would look like this:

uint32_t items[ARR_SIZE];
for(std::size_t i = 0; i < ARR_SIZE; ++i) {
    //get uint16_t field1 and field2 somehow
    items[i] = (static_cast<uint32_t>(field2) << 16)
                   | static_cast<uint32_t>(field1);
}

And the receiver's code would look like this:

//receive items
for(std::size_t i = 0; i < ARR_SIZE; ++i) {
    uint16_t field1 = static_cast<uint16_t>(items[i] & 0xffff);
    uint16_t field2 = static_cast<uint16_t>(items[i] >> 16);
    //do something with field1 and field2
}

However, this is ugly, type-unsafe, and relies on hard-coded magic numbers. I wonder if it is possible to accomplish the same thing by defining a 2-member struct that "should" be exactly the same size as a uint32_t:

struct data_item_t {
    uint16_t field1;
    uint16_t field2;
};

Then, the sender's code would look like this:

data_item_t items[ARR_SIZE];
for(std::size_t i = 0; i < ARR_SIZE; ++i) {
    //get uint16_t field1 and field2 somehow
    items[i] = {field1, field2};
}

And the receiver's code would look like this:

//receive items
for(std::size_t i = 0; i < ARR_SIZE; ++i) {
    uint16_t curr_field1 = items[i].field1;
    uint16_t curr_field2 = items[i].field2;
    //do something with curr_field1 and curr_field2
}

Will this work equivalently to the bit-packed uint32_ts? In other words, will the items array contain the same bits when I use struct data_item_t as when I use uint32_t and bit-packing? Based on the rules of structure padding, I think a struct containing two uint16_ts will never need any internal padding to be properly aligned. Or is that actually up to my compiler, and I need something like __attribute__((__packed__)) to guarantee it?

While I would throw out any implementation that doesn't make `data_item_t` the same size as a `uint32_t`, they are legally allowed to add padding. — NathanOliver, May 20 '19 at 18:58
Interestingly, you use some sort of big-endian ordering in the bit shifting and little-endian in the struct. Make sure sender and receiver use the same ordering. — CAF, May 21 '19 at 05:53
@CAF Oh, you're right, if I want the struct and bit shift to be equivalent layout I should put field2 in the high-order bits of the uint32, not the low-order bits. I fixed my example. — Edward, May 23 '19 at 17:58

score 3 · Answer 1 · answered May 20 '19 at 19:02

There shouldn't be any issue with implementation-defined padding; however, depending on endianness, the two representations will differ. Note also that the alignment will be different, which becomes relevant, for example, if you embed your values in another struct.
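To make the endianness point concrete, here is a minimal sketch that compares the in-memory bytes of the struct with those of the shift-packed uint32_t. The helper names `pack`, `host_is_little_endian`, and `same_bytes_as_packed` are made up for this illustration, and the sketch assumes the struct has no padding (enforced with a static_assert).

```cpp
#include <cstdint>
#include <cstring>

struct data_item_t {
    std::uint16_t field1;
    std::uint16_t field2;
};

// Pack as in the question: field2 in the high 16 bits, field1 in the low 16 bits.
inline std::uint32_t pack(std::uint16_t field1, std::uint16_t field2) {
    return (static_cast<std::uint32_t>(field2) << 16) | field1;
}

// Hypothetical helper: true if the host stores the low-order byte first.
inline bool host_is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: compares the raw bytes of both representations.
inline bool same_bytes_as_packed(std::uint16_t field1, std::uint16_t field2) {
    static_assert(sizeof(data_item_t) == sizeof(std::uint32_t),
                  "unexpected padding in data_item_t");
    const data_item_t item{field1, field2};
    const std::uint32_t packed = pack(field1, field2);
    return std::memcmp(&item, &packed, sizeof packed) == 0;
}
```

On a little-endian host the two layouts coincide; on a big-endian host the struct puts field1 in the first two bytes while the packed integer puts field2 there, so the comparison fails unless both fields happen to be equal.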

More generally, it's not clear what level of protocol compatibility you're attempting to achieve. I suggest you either decide that you are going to allow breaking protocol compatibility from version to version, or you very explicitly lay down your protocol in a way that's extensible and versioned such that different versions of the software can communicate. In this case you should design the protocol so that it's well-defined independently of your C++ implementation, and write your send/receive code in a byte-by-byte style to avoid endianness problems.
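As a rough sketch of the byte-by-byte style, assuming a wire layout of field1 followed by field2, each sent low byte first (that layout, `WIRE_ITEM_SIZE`, `write_item`, and `read_item` are all choices made up for this example, not something from the question):

```cpp
#include <cstddef>
#include <cstdint>

struct data_item_t {
    std::uint16_t field1;
    std::uint16_t field2;
};

// Assumed wire size: four bytes per item, regardless of how the struct is laid out in memory.
constexpr std::size_t WIRE_ITEM_SIZE = 4;

// Serialize one item using only shifts and masks, so host endianness does not matter.
inline void write_item(const data_item_t& item, unsigned char* out) {
    out[0] = static_cast<unsigned char>(item.field1 & 0xff);
    out[1] = static_cast<unsigned char>(item.field1 >> 8);
    out[2] = static_cast<unsigned char>(item.field2 & 0xff);
    out[3] = static_cast<unsigned char>(item.field2 >> 8);
}

// Deserialize one item from four bytes in the same assumed layout.
inline data_item_t read_item(const unsigned char* in) {
    data_item_t item;
    item.field1 = static_cast<std::uint16_t>(in[0] | (in[1] << 8));
    item.field2 = static_cast<std::uint16_t>(in[2] | (in[3] << 8));
    return item;
}
```

With this approach the number of bytes on the wire is fixed by the protocol definition, so any padding the compiler adds to the struct never reaches the network.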

I don't see what trying to maintain equal data size while changing representation achieves at all.

I see, I could have been more clear about my goals. I'm not looking for protocol backwards compatibility in terms of letting an earlier version of the software communicate with a later version. I just want the new version of my protocol to send the same number of bytes as the old version, rather than introduce unnecessary padding. — Edward, May 23 '19 at 18:01

score 1 · Answer 2 · answered May 20 '19 at 19:24

This is ugly, type-unsafe, and relies on hard-coded magic numbers.

It is a well-known idiom, and it is one of the reasons we got bit manipulation operators since C. There is no "magic" in those numbers.

Another option is to call std::memcpy as long as you know your endianness. This is also easier to generalize, if that is your concern.
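A minimal sketch of the memcpy variant, assuming sender and receiver share the same byte order (the function names `pack_with_memcpy` and `unpack_with_memcpy` are hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Copy two 16-bit fields into one 32-bit value without any shifts or masks.
inline std::uint32_t pack_with_memcpy(std::uint16_t field1, std::uint16_t field2) {
    const std::uint16_t fields[2] = {field1, field2};
    static_assert(sizeof fields == sizeof(std::uint32_t),
                  "two uint16_t must fill a uint32_t exactly");
    std::uint32_t packed;
    std::memcpy(&packed, fields, sizeof packed);
    return packed;
}

// Reverse of pack_with_memcpy; only valid with the same byte order as the packer.
inline void unpack_with_memcpy(std::uint32_t packed,
                               std::uint16_t& field1, std::uint16_t& field2) {
    std::uint16_t fields[2];
    std::memcpy(fields, &packed, sizeof packed);
    field1 = fields[0];
    field2 = fields[1];
}
```

The numeric value of the packed integer depends on the host: on a little-endian machine it matches the shift-based packing from the question, while on a big-endian machine field1 ends up in the high half.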

I wonder if it is possible to accomplish the same thing by defining a 2-member struct that "should" be exactly the same size as a uint32_t.

Not with a 2-member struct, but you may do it using an array of 2 uint16_ts -- that will guarantee no padding between them.
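For instance, a sketch of the array approach, where the index names `FIELD1` and `FIELD2`, the placeholder `ARR_SIZE` value, and the `fill_items` helper are invented for illustration:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical index names so call sites avoid bare 0 and 1.
enum field_index : std::size_t { FIELD1 = 0, FIELD2 = 1 };

// A raw array type: sizeof(T[N]) is always N * sizeof(T), so no padding is possible.
using data_item_t = std::uint16_t[2];
static_assert(sizeof(data_item_t) == 2 * sizeof(std::uint16_t), "arrays are never padded");

constexpr std::size_t ARR_SIZE = 8; // placeholder size for this example

// Fill every item with dummy values to show how the fields are addressed.
inline void fill_items(data_item_t (&items)[ARR_SIZE]) {
    for (std::size_t i = 0; i < ARR_SIZE; ++i) {
        items[i][FIELD1] = static_cast<std::uint16_t>(i);
        items[i][FIELD2] = static_cast<std::uint16_t>(i * 2);
    }
}
```

The trade-off is that the fields lose their member names, which is why the sketch adds named indices.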

You can also use two members as you want, but assert that the size is the minimum. That way you are guaranteed it will work if it compiles (which it will on most platforms nowadays):

static_assert(sizeof(data_item_t) == 2 * sizeof(std::uint16_t), "data_item_t must not contain padding");

Will this work equivalently to the bit-packed uint32_ts? In other words, will the items array contain the same bits when I use struct data_item_t as when I use uint32_t and bit-packing?

No, the compiler may add padding.

Or is that actually up to my compiler, and I need something like __attribute__((__packed__)) to guarantee it?

That is the raison d'être of that attribute (for different types, in particular). :-)

score 1 · Accepted Answer · answered May 23 '19 at 18:22

Just write proper accessors:

struct data_item_t {
    uint32_t field;
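    uint16_t get_field1() const { return static_cast<uint16_t>(field & 0xffff); }
    uint16_t get_field2() const { return static_cast<uint16_t>(field >> 16); }
    void set_field1(uint16_t v) { field = (field & 0xffff0000) | v; }
    // Widen before shifting: v would otherwise promote to int, and v << 16 could overflow it.
    void set_field2(uint16_t v) { field = (field & 0x0000ffff) | (static_cast<uint32_t>(v) << 16); }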
};
static_assert(std::is_trivially_copyable<data_item_t>::value == true, "");
static_assert(sizeof(data_item_t) == sizeof(uint32_t), "");
static_assert(alignof(data_item_t) == alignof(uint32_t), "");

The is_trivially_copyable assertion is in place so that you can memcpy or memmove the class as much as you want. Receiving it via an API that uses pointers to char, unsigned char, or std::byte is therefore valid.
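As a sketch of what that looks like on the receiving side, assuming the bytes arrive in a plain unsigned char buffer and both ends use the same byte order (the `load_items` helper is hypothetical, and the struct is trimmed to its getters):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct data_item_t {
    std::uint32_t field;
    std::uint16_t get_field1() const { return static_cast<std::uint16_t>(field & 0xffff); }
    std::uint16_t get_field2() const { return static_cast<std::uint16_t>(field >> 16); }
};
static_assert(std::is_trivially_copyable<data_item_t>::value, "");
static_assert(sizeof(data_item_t) == sizeof(std::uint32_t), "");

// Hypothetical helper: copy `count` items out of a raw receive buffer.
// The buffer must hold at least count * sizeof(data_item_t) bytes.
inline void load_items(const unsigned char* buffer, data_item_t* items, std::size_t count) {
    std::memcpy(items, buffer, count * sizeof(data_item_t));
}
```

Because the copy goes through memcpy rather than a pointer cast, the receive buffer does not need to be aligned for uint32_t.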

The compiler can insert padding anywhere except in front of the first member. So even with one field, it can insert padding at the end of the struct, and we could probably find a strange implementation where sizeof(data_item_t) == sizeof(uint64_t). The proper way to deal with this is to write proper static_assertions.
