
My question is about a typical way of parsing a byte array, as in the following code:

struct Header { /* ... */ };
struct Entry1 { /* ... */ };
struct Entry2 { /* ... */ };

void Parse(char* p) {
  const auto header = reinterpret_cast<Header*>(p);
  p += sizeof(Header);
  const auto ent1 = reinterpret_cast<Entry1*>(p);
  p += sizeof(Entry1);
  const auto ent2 = reinterpret_cast<Entry2*>(p);
}

First of all, the spec says that char* can alias any other pointer type, so reinterpret_cast<Header*> is safe. However, what about the other reinterpret_cast statements? Do they violate the strict aliasing rule because p, whose type is char*, has already been aliased with Header*? Or are they safe because p has been incremented by sizeof(Header)?

Thank you.

blackbrandt
wanwan
  • 1
    You can use `std::memcpy` to copy the bytes into your struct, assuming the struct is trivially copyable. If the struct is not trivially copyable, you'll need to parse the bytes field-by-field into the struct. – Eljay Jul 21 '21 at 13:17
  • 1
    What version of C++ are you using? The rules have changed in C++20. – NathanOliver Jul 21 '21 at 13:18
  • 2
    *the spec says that `char*` can alias any other pointer type, so `reinterpret_cast<Header*>` is safe.* You have that backward. The spec says `reinterpret_cast<char*>(pointer_to_some_type);` is safe. The other way around may or may not be safe. – NathanOliver Jul 21 '21 at 13:19
  • 1
    Seems like you are reinventing the wheel. What you are doing is called serialization. Take a look at the [Boost.Serialization](https://www.boost.org/doc/libs/1_76_0/libs/serialization/doc/) library, which can help you with this. – Victor Gubin Jul 21 '21 at 13:30

1 Answer


First of all, the spec says that char* can alias any other pointer type, so reinterpret_cast<Header*> is safe

Although the reinterpret cast itself never has UB, I wouldn't describe it as "safe", because accessing the object through the reinterpreted pointer may have undefined behaviour. Reinterpreting any object as an array of char is OK; going the other way and reinterpreting a char array as something else is NOT OK.
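To illustrate the permitted direction, here is a minimal sketch that views an existing object's storage through `unsigned char*`. The helper name `ObjectBytes` is hypothetical and only serves the example; it assumes `T` is a type whose object representation you actually want to inspect.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns a copy of the object representation of `value`.
// Viewing an existing object through unsigned char* is the permitted direction;
// the reverse (treating a char buffer as a T) is what the answer warns about.
template <typename T>
std::vector<unsigned char> ObjectBytes(const T& value) {
  const auto* first = reinterpret_cast<const unsigned char*>(&value);
  return std::vector<unsigned char>(first, first + sizeof(T));
}
```

Calling `ObjectBytes(some_header)` yields `sizeof(Header)` bytes, including any padding bytes, whose values are unspecified.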

Assuming your classes are trivially copyable, you could use std::memcpy:

Header h0;
std::size_t offset = sizeof h0;
std::memcpy(&h0, p, offset);
p += offset;

Entry1 e1;
offset = sizeof e1;
std::memcpy(&e1, p, offset);
p += offset;

Entry2 e2;
offset = sizeof e2;
std::memcpy(&e2, p, offset);
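The repeated copy-and-advance steps could be wrapped in a small helper. The following is only a sketch under assumptions: the name `ReadObject` and the `remaining` byte counter are hypothetical additions, not part of the question's code, and the bounds check presumes the caller knows how many bytes the buffer holds.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes from the front of the buffer
// into `out`, then advances the cursor `p` and shrinks `remaining`.
// Returns false, leaving everything untouched, if too few bytes remain.
template <typename T>
bool ReadObject(const char*& p, std::size_t& remaining, T& out) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy is only valid for trivially copyable types");
  if (remaining < sizeof(T)) return false;
  std::memcpy(&out, p, sizeof(T));
  p += sizeof(T);
  remaining -= sizeof(T);
  return true;
}
```

With this, parsing becomes a chain such as `ReadObject(p, n, h0) && ReadObject(p, n, e1) && ReadObject(p, n, e2)`, which stops at the first object that does not fit.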

However, keep in mind that different systems may have different alignment requirements, and thus different amounts of padding in their classes, as well as differently sized integers and a different order of bytes within integers (and different sizes of bytes, but I suppose that's not a typical issue with IPC). As such, this simple approach only works within the same system and isn't suitable for exchanging data between different systems (for example over the internet or through a filesystem). To do that correctly, you must explicitly place each byte according to the protocol.
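As a rough sketch of what placing each byte explicitly can look like, the code below decodes a made-up 6-byte header. The wire layout, the struct `WireHeader`, and the function names are all assumptions chosen for illustration; a real protocol defines its own field sizes and byte order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wire format, assumed for illustration only:
//   bytes 0..3  length, unsigned 32-bit, little-endian
//   bytes 4..5  kind,   unsigned 16-bit, little-endian
struct WireHeader {
  std::uint32_t length;
  std::uint16_t kind;
};

constexpr std::size_t kWireHeaderSize = 6;

// Assembles a 32-bit value from four bytes, least significant byte first.
inline std::uint32_t LoadLe32(const unsigned char* b) {
  return static_cast<std::uint32_t>(b[0]) |
         (static_cast<std::uint32_t>(b[1]) << 8) |
         (static_cast<std::uint32_t>(b[2]) << 16) |
         (static_cast<std::uint32_t>(b[3]) << 24);
}

// Assembles a 16-bit value from two bytes, least significant byte first.
inline std::uint16_t LoadLe16(const unsigned char* b) {
  return static_cast<std::uint16_t>(b[0] | (b[1] << 8));
}

// The caller must guarantee that at least kWireHeaderSize bytes are readable.
inline WireHeader ParseWireHeader(const unsigned char* b) {
  WireHeader h;
  h.length = LoadLe32(b);
  h.kind = LoadLe16(b + 4);
  return h;
}
```

Because each field is built from individual bytes with shifts, the result does not depend on the host's padding or byte order. The sketch takes `unsigned char` to avoid sign extension; a `char*` buffer can be passed as `reinterpret_cast<const unsigned char*>(p)`.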

eerorika