
I have the following structs:

#pragma pack(push)
struct COMMON {
    uint16_t type;
    uint16_t m;
    uint16_t o;
    uint16_t f;
    uint64_t s;
};
#pragma pack(pop)

#pragma pack(push)
struct A_HEADER : public COMMON {
    uint16_t g;
    uint16_t c;
};
#pragma pack(pop)

#pragma pack(push)
struct B_HEADER : public COMMON {
    uint32_t r;
    uint32_t h;
};
#pragma pack(pop)

I know that A_HEADER and B_HEADER are trivially copyable but not standard-layout. As far as I know, the former property means that they can be safely copied with memcpy, while the latter means that there is no guaranteed byte representation for those two structs.
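Both properties can be checked at compile time with `<type_traits>`. The sketch below repeats the structs (without the `#pragma pack` lines, for brevity) and asserts what is claimed above; it shows only how to verify the claims, not whether the resulting layout is portable.

```cpp
#include <cstdint>
#include <type_traits>

struct COMMON {
    uint16_t type;
    uint16_t m;
    uint16_t o;
    uint16_t f;
    uint64_t s;
};

struct A_HEADER : public COMMON {
    uint16_t g;
    uint16_t c;
};

struct B_HEADER : public COMMON {
    uint32_t r;
    uint32_t h;
};

// The base alone is both trivially copyable and standard-layout.
static_assert(std::is_trivially_copyable<COMMON>::value, "COMMON must be memcpy-able");
static_assert(std::is_standard_layout<COMMON>::value, "COMMON is standard-layout");

// The derived structs stay trivially copyable...
static_assert(std::is_trivially_copyable<A_HEADER>::value, "A_HEADER must be memcpy-able");
static_assert(std::is_trivially_copyable<B_HEADER>::value, "B_HEADER must be memcpy-able");

// ...but lose standard-layout, because non-static data members are
// declared in both the base class and the derived class.
static_assert(!std::is_standard_layout<A_HEADER>::value, "A_HEADER is not standard-layout");
static_assert(!std::is_standard_layout<B_HEADER>::value, "B_HEADER is not standard-layout");
```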

Now I need to send both A_HEADER and B_HEADER to a remote machine via UDP. When the receiver gets the message in some buffer, it does not know whether it is an A_HEADER or a B_HEADER, and has to decide how to decode it based on the type field contained in COMMON.

On the receiving machine I was thinking of something like

void print_header(const uint8_t* buf)
{
   const COMMON* com = reinterpret_cast<const COMMON*>(buf);

   switch (com->type) {
   case A_TYPE:
        print_acq_details(reinterpret_cast<const A_HEADER*>(com));
        break;
   case B_TYPE:
        print_cal_details(reinterpret_cast<const B_HEADER*>(com));
        break;
   }
}

but I have a sort of dissonance in my mind: since both structs are trivially copyable, and since a base pointer can point to an object of a derived class, it seems to me that what I wrote is safe; on the other hand, the fact that the derived structs above are not standard-layout worries me. Since the layout is compiler-dependent, perhaps everything works as long as I use exactly the same compiler on both machines (which have the same architecture).
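For comparison, here is a minimal sketch of a receiver that copies the bytes into properly typed objects with `memcpy` instead of reinterpreting the buffer. It assumes the sender was built with the same compiler and ABI, so that the bytes on the wire match the local layout. The values of `A_TYPE` and `B_TYPE`, the `Kind` and `Decoded` types, and the helpers `copy_from` and `decode` are all hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct COMMON { uint16_t type; uint16_t m; uint16_t o; uint16_t f; uint64_t s; };
struct A_HEADER : public COMMON { uint16_t g; uint16_t c; };
struct B_HEADER : public COMMON { uint32_t r; uint32_t h; };

// Hypothetical tag values; the real ones are not shown in the question.
constexpr uint16_t A_TYPE = 1;
constexpr uint16_t B_TYPE = 2;

enum class Kind { Unknown, A, B };

// Result of decoding: only the member matching `kind` is meaningful.
struct Decoded {
    Kind kind = Kind::Unknown;
    A_HEADER a{};
    B_HEADER b{};
};

// Copy sizeof(T) bytes from the buffer into an existing object of type T.
// Returns false if the buffer is too short.
template <class T>
bool copy_from(const uint8_t* buf, std::size_t len, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    if (len < sizeof(T)) return false;
    std::memcpy(&out, buf, sizeof(T));
    return true;
}

Decoded decode(const uint8_t* buf, std::size_t len) {
    Decoded result;
    // Read the common prefix first to learn which header was sent.
    // This relies on the COMMON base being stored first in the derived
    // structs, which holds on common ABIs but is an assumption.
    COMMON com{};
    if (!copy_from(buf, len, com)) return result;

    switch (com.type) {
    case A_TYPE:
        if (copy_from(buf, len, result.a)) result.kind = Kind::A;
        break;
    case B_TYPE:
        if (copy_from(buf, len, result.b)) result.kind = Kind::B;
        break;
    default:
        break;
    }
    return result;
}
```

The extra copy is small (a few dozen bytes per datagram), and the length check also rejects truncated datagrams, which the pointer-cast version does not do.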

What is the correct reasoning (if any)?

NOTICE

I know one solution would be to encapsulate COMMON inside A_HEADER and B_HEADER as a member, so as to obtain POD types, but I cannot modify that part of the code.

MaPo
  • You are breaking the strict aliasing rule by `reinterpret_cast`, you cannot just cast raw bytes to a type, even a simple one. Just `memcpy` them into new objects. – Quimby Aug 03 '22 at 10:00
  • @Quimby AFAIK this is no longer true since acceptance of [P0593R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html). – n. m. could be an AI Aug 03 '22 at 10:11
  • @n.1.8e9-where's-my-sharem. Well I asked about this [in my question](https://stackoverflow.com/questions/72511754/safely-type-punning-pod-like-structures-in-place-in-c20) and I am not convinced that is really the case. Coupled with the following sentence from the paper "Symmetrically, when the float object is created, the object has an indeterminate value ([dcl.init]p12), and therefore any attempt to load its value results in undefined behavior." – Quimby Aug 03 '22 at 10:26
  • No byte representation of anything, including standard-layout types, is guaranteed by the standard. There is no guarantee that there is no padding between two uint16_t fields for example. `#pragma pack` is not guaranteed to have any effect either. – n. m. could be an AI Aug 03 '22 at 10:27
  • @Quimby, thank you for the point, but I cannot see where the undefined behaviour may occur, can you please point it out? – MaPo Aug 03 '22 at 10:27
  • @Quimby The intent of P0593R6 is to make reinterpreting buffers obtained by `read` etc. legal. There is no wiggle room here. If you think it fails to achieve that, submit a defect report. – n. m. could be an AI Aug 03 '22 at 10:29
  • @n.1.8e9-where's-my-sharem. I understand, but since, by inspection, it is not impossible to send data via socket, which requirement should I look at in doing this? – MaPo Aug 03 '22 at 10:30
  • @MaPo `buf` doesn't point to a `COMMON` object, thus casting to `com` and dereferencing `com->type` breaks the strict aliasing rule, at least prior to P0593R6. – Quimby Aug 03 '22 at 10:31
  • @n.1.8e9-where's-my-sharem. That is my understanding too, just saying that I asked about it here and did not get very clear answer. I do not think I have the expertise to submit a defect report. – Quimby Aug 03 '22 at 10:32
  • @Quimby, I beg your pardon for my ignorance, but I thought that the undefined behaviours coming from violation of strict aliasing rules would ensue when both the original pointer and the cast pointer were dereferenced and acted upon by some function. Here I do not see how this could happen. – MaPo Aug 03 '22 at 10:43
  • @Quimby I don't think a defect report is justified. If we postulate that we can `memcpy` a buffer obtained by `read` to a proper object, and the value will not be indeterminate, how come it is indeterminate when reinterpreting the buffer itself? Does `memcpy` know to breathe life into bytes, in a way no other function can? I don't think so. The value obtained by reinterpreting and then accessing the original will be exactly the same as the value obtained by `memcpy`'ing and then accessing the copy. That's what `memcpy` does. The only thing that was missing is the legality of reinterpreting. – n. m. could be an AI Aug 03 '22 at 10:47
  • @MaPo Technically speaking, writes to `buf`, even the original ones, do not have to be propagated to `com`; the compiler is free to assume they are unrelated. Of course in practice it tends to work, but there are no guarantees; it depends on the optimization and on the capabilities of the compiler's alias analysis. – Quimby Aug 03 '22 at 10:56
  • @MaPo The possibility of sending data via sockets (or pipes or regular files) to another program does not necessarily derive from the written C++ standard. Does it matter what it derives from? Call it common sense. You can send data between common platforms if you are careful enough (i.e. you should avoid unsized integral types, `long double`, bit fields, ...) – n. m. could be an AI Aug 03 '22 at 10:59
  • @n.1.8e9-where's-my-sharem. `memcpy` doesn't have to if the destination is already `COMMON com;`, it just overwrites the current representation, which has been okay for a long time for POD-like types. "how come it is indeterminate when reinterpreting the buffer itself?" because the Standard says so? I am not arguing about the reasonable implementation but about the rules. As I said, I asked about this and I am not sure about the Standard-conformant answer/solution, I just reinterpret them too and it works for me... – Quimby Aug 03 '22 at 10:59
  • @Quimby The standard doesn't say anything like that anywhere I can find. *When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value*. OK that's obvious. Where does it say that the object has indeterminate value *after* you `memcpy` or `read` or whatever *into* it? *until that value is replaced* If `memcpy` doesn't qualify, then you can't use `memcpy` *at all*. – n. m. could be an AI Aug 03 '22 at 11:03
  • @Quimby, do you mean that when dereferencing `com` there is no guarantee that I'm accessing the same memory pointed to by `buf`? Then I do not understand how it is possible to deserialize a chunk of data. – MaPo Aug 03 '22 at 11:15
  • @n.1.8e9-where's-my-sharem. Are there some *careful enough* requirements that I can assert? – MaPo Aug 03 '22 at 11:19
  • @MaPo I don't know of any. – n. m. could be an AI Aug 03 '22 at 12:04
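As a partial illustration of the "careful enough" idea discussed in the comments, the sketch below pins down the layout that both machines are expected to share, so that a build with a different layout fails loudly instead of misreading packets. The expected sizes and offsets (16 and 24 bytes, derived members starting at offset 16) are assumptions that match a typical 64-bit ABI, not values given in the question, and `member_offset` is a hypothetical helper. Passing these checks does not prove portability; it only detects some mismatches, and byte order still has to agree separately.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct COMMON { uint16_t type; uint16_t m; uint16_t o; uint16_t f; uint64_t s; };
struct A_HEADER : public COMMON { uint16_t g; uint16_t c; };
struct B_HEADER : public COMMON { uint32_t r; uint32_t h; };

// Compile-time checks: fail the build if this compiler lays the structs
// out differently from what the wire format is assumed to be.
static_assert(std::is_trivially_copyable<A_HEADER>::value, "A_HEADER must be memcpy-able");
static_assert(std::is_trivially_copyable<B_HEADER>::value, "B_HEADER must be memcpy-able");
static_assert(sizeof(COMMON) == 16, "unexpected size of COMMON (assumed 16)");
static_assert(sizeof(A_HEADER) == 24, "unexpected size of A_HEADER (assumed 24)");
static_assert(sizeof(B_HEADER) == 24, "unexpected size of B_HEADER (assumed 24)");

// Byte offset of a member inside an object, measured on a live object.
// offsetof is avoided because it is only conditionally supported for
// types that are not standard-layout.
template <class T, class M>
std::size_t member_offset(const T& obj, const M& member) {
    return static_cast<std::size_t>(
        reinterpret_cast<const unsigned char*>(&member) -
        reinterpret_cast<const unsigned char*>(&obj));
}

// Run-time check, intended to be called once at start-up on each machine.
inline bool layout_is_as_expected() {
    A_HEADER a{};
    B_HEADER b{};
    return member_offset(a, a.type) == 0 &&  // base subobject comes first
           member_offset(a, a.s) == 8 &&
           member_offset(a, a.g) == 16 &&
           member_offset(a, a.c) == 18 &&
           member_offset(b, b.r) == 16 &&
           member_offset(b, b.h) == 20;
}
```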
