
I have been researching efficient methods for serializing and deserializing objects in C++. I am writing for an embedded system, so I am wondering whether the following is valid:

static const uint8_t data[] = {0xAA, 0xBB, 0x10, 0b00000101, 0x0A, 0x00, 0xDE, 0xAD, 0xBE, 0xEF, 0x04, 0x00, 0x00, 0x11, 0x22, 0x33};

struct __attribute__((__packed__)) Message {
    uint8_t std[2];
    uint8_t id;
    union {
        struct {
            bool optionsReply : 1;
            bool optionsFoo : 1;
            bool optionsBar : 1;
        };

        uint8_t optionsU8;
    };

    uint16_t payloadLength;

    uint8_t payload[0];
};

struct __attribute__((__packed__)) TestMessage : public Message {
    uint8_t a0;
    uint8_t a1;
    uint8_t a2;
    uint8_t a3;

    uint16_t length;
    uint8_t data[0];
};

const Message* msg = reinterpret_cast<const Message*>(data);
const TestMessage* testMsg = static_cast<const TestMessage*>(msg);

I tested this with my compiler, and everything appears to work as I would expect. My question is: given the zero-length arrays that I am treating as larger at runtime via the associated length fields, is this the preferred/safest method of doing this?

Thanks

Patrick Wright
  • Most likely this will work, but it's undefined behavior as far as the standard is concerned. The legal way to do this is to `memcpy` the buffer into an object of the appropriate type. – NathanOliver Mar 17 '21 at 12:34
  • @NathanOliver This is essentially what I am doing at the moment. The problem being, since this is an embedded system, memory and CPU cycles are more precious so I'd like to reuse the same buffer (since that buffer is supposed to be a serialized representation of a struct anyways). – Patrick Wright Mar 17 '21 at 12:36
  • Compilers are smart enough to notice those extra `memcpy`. If you use it instead of `reinterpret_cast` it will most likely generate the same assembly, but it is safe because the compiler knows what you are doing with the pointers. – Quimby Mar 17 '21 at 12:39
  • The compiler might turn `memcpy` into something quite similar to your cast, the only difference being that it is legal. Did you look at the generated assembly? – 463035818_is_not_an_ai Mar 17 '21 at 12:39
  • If you look at the possible implementation of [`std::bit_cast`](https://en.cppreference.com/w/cpp/numeric/bit_cast) you can see that even this function creates an object and uses `memcpy`. The problem with `const Message* msg = reinterpret_cast<const Message*>(data)` is that you never actually create a `Message` for `msg` to point to. The language expects a `Message*` to point to a `Message` or a derived type; no other types are allowed. – NathanOliver Mar 17 '21 at 12:42
  • You may be more concerned with whether your embedded compiler _chooses to make this behavior defined_, than with what the C++ standard allows. – Drew Dormann Mar 17 '21 at 12:42
  • @NathanOliver Nonsense, `memcpy` is completely legal to reimplement in user code, by casting operands into pointers to character arrays and copying characters in a loop. Character types are exempt from strict aliasing rules specifically to allow this. The problem with the asker’s code is that `uint8_t` need not be a character type (even though no compiler dare define it otherwise). – user3840170 Mar 17 '21 at 12:43
  • `uint8_t data[0];` is probably going to be used like a flexible array member in C. It is, however, not portable: standard C++ does not allow arrays of size zero, and the out-of-bounds accesses make it UB. – Ted Lyngmo Mar 17 '21 at 12:43
  • @user3840170 You are allowed to go from `T*` to `char*`, and be able to go back to `T*` if you want. What you can't do is just go from a `char*` to a `T*`. C++ requires that the `char*` already point to a `T` when you go from `char*` to `T*`. – NathanOliver Mar 17 '21 at 12:45
  • Isn't this what `std::launder` is for? [example](https://stackoverflow.com/questions/66176720/why-introduce-stdlaunder-rather-than-have-the-compiler-take-care-of-it) –  Mar 17 '21 at 12:50
  • @dratenik No, `std::launder` exists to solve a rather esoteric problem that arises from lifetime tracking, which affects the optimizer; it doesn't allow type-punning to types that were never actually created. Even when laundering, you are just telling the C++ abstract machine that the memory it points to is valid, but may contain a different value than what it previously knew about (to prevent optimizing the wrong values). There is formally more to it, but that's the short version. In OP's example, a `Message` has never been created, so `std::launder` here would be UB. – Human-Compiler Mar 17 '21 at 13:17
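
For reference, the `memcpy` approach suggested in the comments might look roughly like the sketch below. It is only an illustration under stated assumptions, not a definitive implementation: `MessageHeader`, `ParsedMessage`, and `parseMessage` are hypothetical names that do not appear in the question, the `std` field is renamed `stdBytes`, the option bits are kept as a plain `uint8_t`, and the host byte order is assumed to match the wire format (little-endian for the sample buffer). Only the fixed-size header is copied; the variable-length payload is exposed as a pointer into the original buffer instead of a zero-length array.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fixed-size header mirroring the leading fields of Message.
// With these field types there is normally no padding, so no packed
// attribute is used; the static_assert checks that assumption.
struct MessageHeader {
    std::uint8_t  stdBytes[2];   // "std" in the question, renamed here
    std::uint8_t  id;
    std::uint8_t  optionsU8;     // option bits kept as a raw byte
    std::uint16_t payloadLength; // assumes host byte order == wire byte order
};
static_assert(sizeof(MessageHeader) == 6, "unexpected padding in MessageHeader");

// Hypothetical result type: a copied header plus a view of the payload.
struct ParsedMessage {
    MessageHeader header;
    const std::uint8_t* payload; // points into the caller's buffer, not copied
};

// Returns false if the buffer is too short for the header or for the
// payload length that the header declares.
bool parseMessage(const std::uint8_t* buf, std::size_t size, ParsedMessage& out) {
    if (size < sizeof(MessageHeader)) {
        return false;
    }
    // memcpy into a real MessageHeader object, so no pointer to a
    // never-created object is ever dereferenced.
    std::memcpy(&out.header, buf, sizeof(MessageHeader));
    if (size - sizeof(MessageHeader) < out.header.payloadLength) {
        return false;
    }
    out.payload = buf + sizeof(MessageHeader);
    return true;
}
```

Calling `parseMessage(data, sizeof(data), parsed)` on the buffer from the question copies six bytes for the header and leaves the ten payload bytes in place. As the comments note, compilers will often optimize a small fixed-size `memcpy` like this into plain loads, but checking the generated assembly for the actual target is the only way to be sure.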
