I want to format a packet in an existing binary protocol format (I'm writing a memcached client) in C++. In C, I can do this:

    #include <stdint.h>  /* fixed-width integer types */

    typedef struct {
        uint8_t magic;
        uint8_t opcode;
        uint16_t keylen;
        uint8_t extlen;
        uint8_t datatype;
        uint16_t reserved;
        uint32_t bodylen;
        uint32_t opaque;
        uint64_t cas;
    } request_header;

In C++, in general, the compiler can add padding between the fields. However, the above struct is carefully laid out so that everything can be aligned with no padding, assuming n-bit types only need to be aligned on n-bit boundaries. So in C++, according to the standard, am I safe? Or could a conforming C++ compiler add extra padding, thwarting my ability to use this to lay out my bits?

Martin C. Martin
  • Padding (alignment) is done to optimize access for specific cpu architectures. So you can't (imagine how padding could be done differently to optimize for 16bit or 32bit addresses). – πάντα ῥεῖ Jan 14 '15 at 22:31
  • How would you expect that struct to be laid out on a 36-bit computer? – Barmar Jan 14 '15 at 22:32
  • @Barmar a 36-bit computer wouldn't define the `uintXX_t` types so it's an irrelevant question. – Mark Ransom Jan 14 '15 at 22:35
  • @Mark Ransom I think you mean `uint8_t`, `uint16_t`, etc. (OP's types) `uint36_t` would likely be fine on a 36-bit machine. – chux - Reinstate Monica Jan 14 '15 at 22:37
  • do not use `typedef struct { ... } foobar;` structure, especially in C++, it's evil – Slava Jan 14 '15 at 22:37
  • Writing some functional tests for it is best if this is in production. You can make sure your function is working regardless of compilers. – qqibrow Jan 14 '15 at 22:41

3 Answers

This isn't worth fretting about, just let the compiler tell you that it is weird:

    static_assert(sizeof(request_header) == 24, "Unexpected packet size");
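
The size check can be extended to each field's offset, which also catches padding that happens to leave the total size unchanged. A minimal sketch, assuming the struct from the question (redeclared here so the snippet compiles on its own) and offsets obtained by summing the field sizes in order:

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // fixed-width integer types

// Same fields as the question's request_header, repeated for self-containment.
struct request_header {
    std::uint8_t  magic;
    std::uint8_t  opcode;
    std::uint16_t keylen;
    std::uint8_t  extlen;
    std::uint8_t  datatype;
    std::uint16_t reserved;
    std::uint32_t bodylen;
    std::uint32_t opaque;
    std::uint64_t cas;
};

// Each expected offset is the sum of the sizes of the preceding fields.
static_assert(offsetof(request_header, magic)    == 0,  "magic moved");
static_assert(offsetof(request_header, opcode)   == 1,  "opcode moved");
static_assert(offsetof(request_header, keylen)   == 2,  "keylen moved");
static_assert(offsetof(request_header, extlen)   == 4,  "extlen moved");
static_assert(offsetof(request_header, datatype) == 5,  "datatype moved");
static_assert(offsetof(request_header, reserved) == 6,  "reserved moved");
static_assert(offsetof(request_header, bodylen)  == 8,  "bodylen moved");
static_assert(offsetof(request_header, opaque)   == 12, "opaque moved");
static_assert(offsetof(request_header, cas)      == 16, "cas moved");
static_assert(sizeof(request_header) == 24, "Unexpected packet size");
```

If any of these fire, the build fails on that platform rather than producing malformed packets at run time. Note that this only checks layout; byte order still has to be handled separately.
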
Hans Passant

You are correct that C++ may pad arbitrarily. From C++11 §9.2¶14 (emphasis is mine):

Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. The order of allocation of non-static data members with different access control is unspecified (11). Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; so might requirements for space for managing virtual functions (10.3) and virtual base classes (10.1).

C is also permitted to add padding bytes, so this is not peculiar to C++. From C11 §6.7.2.1¶15 (emphasis is mine):

Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

If you want to avoid padding, the only maximally portable way is to pack the data structure yourself into contiguous memory (e.g., a vector) when sending, and unpack the serialized data into your data structure when receiving. Your compiler may provide extensions to allow you to keep all members within your struct contiguous (e.g., GCC's packed attribute, or VC++'s pack pragma, as described here).
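
The manual pack/unpack approach might look roughly like the sketch below. It assumes the struct from the question (redeclared so the snippet stands alone) and big-endian (network) byte order on the wire, which is what the memcached binary protocol uses as far as I know. The names `put_be`, `get_be`, `serialize`, `deserialize` and `kHeaderSize` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Same fields as the question's request_header, repeated for self-containment.
struct request_header {
    std::uint8_t  magic;
    std::uint8_t  opcode;
    std::uint16_t keylen;
    std::uint8_t  extlen;
    std::uint8_t  datatype;
    std::uint16_t reserved;
    std::uint32_t bodylen;
    std::uint32_t opaque;
    std::uint64_t cas;
};

constexpr std::size_t kHeaderSize = 24;  // wire size, independent of sizeof(request_header)

// Write `value` most-significant byte first; returns the next write position.
template <typename T>
std::uint8_t* put_be(std::uint8_t* out, T value) {
    for (std::size_t i = sizeof(T); i-- > 0;) {
        *out++ = static_cast<std::uint8_t>(value >> (8 * i));
    }
    return out;
}

// Read a big-endian value into `value`; returns the next read position.
template <typename T>
const std::uint8_t* get_be(const std::uint8_t* in, T& value) {
    value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        value = static_cast<T>((value << 8) | *in++);
    }
    return in;
}

// Pack field by field, so struct padding and host endianness never matter.
std::array<std::uint8_t, kHeaderSize> serialize(const request_header& h) {
    std::array<std::uint8_t, kHeaderSize> buf{};
    std::uint8_t* p = buf.data();
    p = put_be(p, h.magic);
    p = put_be(p, h.opcode);
    p = put_be(p, h.keylen);
    p = put_be(p, h.extlen);
    p = put_be(p, h.datatype);
    p = put_be(p, h.reserved);
    p = put_be(p, h.bodylen);
    p = put_be(p, h.opaque);
    put_be(p, h.cas);
    return buf;
}

// Unpack in the same field order.
request_header deserialize(const std::array<std::uint8_t, kHeaderSize>& buf) {
    request_header h{};
    const std::uint8_t* p = buf.data();
    p = get_be(p, h.magic);
    p = get_be(p, h.opcode);
    p = get_be(p, h.keylen);
    p = get_be(p, h.extlen);
    p = get_be(p, h.datatype);
    p = get_be(p, h.reserved);
    p = get_be(p, h.bodylen);
    p = get_be(p, h.opaque);
    get_be(p, h.cas);
    return h;
}
```

The buffer returned by `serialize` is what you would hand to your socket write call. Because every byte is placed explicitly, the result does not depend on how the compiler lays out `request_header` in memory.
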

jxh

C++ has the notion of a POD (plain-old-data) type. If certain restrictions are met, a C++ struct is POD and is laid out the same way as the equivalent struct defined in C code.

Roughly, to be POD a struct must have no virtual functions or virtual base classes, no user-provided default constructor, copy/move operations or destructor, the same access control for all of its non-static data members, and only POD members and bases. Ordinary non-virtual member functions do not affect this.
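
You can ask the compiler whether a given struct qualifies. A small sketch, assuming C++11 or later and the struct from the question (redeclared so the snippet compiles on its own); it uses `std::is_standard_layout` and `std::is_trivial` from `<type_traits>`, which together correspond to the C++11 definition of POD:

```cpp
#include <cstdint>
#include <type_traits>

// Same fields as the question's request_header, repeated for self-containment.
struct request_header {
    std::uint8_t  magic;
    std::uint8_t  opcode;
    std::uint16_t keylen;
    std::uint8_t  extlen;
    std::uint8_t  datatype;
    std::uint16_t reserved;
    std::uint32_t bodylen;
    std::uint32_t opaque;
    std::uint64_t cas;
};

// Standard layout: member order and layout follow the same rules as a C struct.
static_assert(std::is_standard_layout<request_header>::value,
              "request_header must be standard-layout");

// Trivial: safe to copy with memcpy and to leave without running constructors.
static_assert(std::is_trivial<request_header>::value,
              "request_header must be trivial");
```

These checks only confirm that the C++ struct matches what a C compiler for the same platform would produce. They say nothing about whether that layout contains padding, so a size or offset check is still needed for that.
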

Nick Zavaritsky
  • *Standard layout* is relevant here (this is a superset of PODs) – M.M Jan 15 '15 at 00:02
  • Although it's byte-to-byte compatible with the same structure defined in C code, the C code version suffers from the same problem: arbitrary padding. – Martin C. Martin Jan 18 '15 at 16:55
  • @MartinC.Martin so the alignment of fixed width integer types is not spelled out in the standard? Weird. Now I am going to rewrite all my code that makes a lame assumption that the number of bits in a byte equals 8, must use CHAR_BIT instead. – Nick Zavaritsky Jan 19 '15 at 00:12
  • @NickZavaritsky: The authors of C89 didn't want to prevent C from being useful on platforms with really weird alignment requirements, and didn't think it necessary to say "Don't be stupid or obtuse" to make compiler writers refrain from inserting padding that serves no purpose on the intended target. Unfortunately, a lot of things that would have been properly recognized as obtuse 25 years ago are fashionable today. – supercat Aug 18 '16 at 20:04