2

There are many uses of reinterpret_cast that will compile, but are UB. It's only well defined in a handful of cases. The ones that currently matter to me, are casting to:

  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union)
  • a char or unsigned char type

Say I have a buffer which contains a binary structure that I want to parse. I use a struct which only contains char a[100] and uses methods to, for example, extract the uint32_t which resides at a[3]. Is casting the buffer to the struct and then accessing the struct's array this way well defined?

If it is, I expect that's because of the two rules above; however, I'm also half-expecting that these rules don't apply to a char[], or that I may have alignment issues of some sort because it's an array and not just a char*.

A small code sample to make the use case more clear:

struct Overlay {
    char a[100];

    uint32_t parseSomeField() const {
        return my_char_to_uint32_t_function(&a[3]);
    }
};

int main() {
    std::vector<char> buffer(1024);
    // fill buffer
    auto overlay = reinterpret_cast<const Overlay*>(&buffer[0]);
    std::cout << overlay->parseSomeField() << std::endl;
}

I assume replacing char a[100] with simply char *a would be OK for sure, but by giving Overlay the size of the structure I want to parse, it allows me to do the following as well:

Overlay overlay;
// fill overlay by writing into it directly
std::cout << overlay.parseSomeField() << std::endl;

which saves some lines of code.


Edit:

Thanks to the answer and comments, it's become clear to me that this use of reinterpret_cast is UB. The following supports both working with an existing buffer and copying into the struct directly. You can do sizeof as well, which is nice. Also, this should be well defined:

struct VersatileOverlay {
    char a[100];

    static uint32_t parseSomeField(const char *a) {
        return some_char_to_uint32_t_function(a + 3);
    }

    uint32_t parseSomeField() const {
        return parseSomeField(&a[0]);
    }
};

int main() {
    std::vector<char> buffer(1024);
    // fill buffer
    std::cout << VersatileOverlay::parseSomeField(&buffer[0]) << std::endl;

    VersatileOverlay vo;
    memcpy(&vo, /*source ptr*/, sizeof(VersatileOverlay));
    std::cout << vo.parseSomeField() << std::endl;
}

parseSomeField() and its siblings will simply call their static counterparts, passing them the internal buffer.

delins
  • 125
  • 9
  • 2
    If `Overlay` has at least one `virtual` method, it definitely will not work, as each object will store also a pointer to virtual methods table. – Renat Jul 17 '19 at 00:06
  • How is defined `my_char_to_uint32_t_function`? If it does a `reinterpret_cast` ou use an `union` then it might not work because of alignment issues. – Phil1970 Jul 17 '19 at 01:00
  • I think that the **related** question might be useful for you: https://stackoverflow.com/questions/50383187/is-it-a-strict-aliasing-violation-to-alias-a-struct-as-its-first-member?rq=1. – Phil1970 Jul 17 '19 at 01:06
  • @Phil1970: Using bit shifts to move the bytes into the right positions. So that shouldn't cause any problems. – delins Jul 17 '19 at 13:16
  • If `Overlay` is intended to wrap a pre-existing buffer, it could alternatively contain a reference-to-array. This would make it significantly less general-purpose, though. – Justin Time - Reinstate Monica Jul 17 '19 at 18:34
  • 1
    I don't see the purpose of writing such code. **If you want a design with a class** `Overlay` then why not keep the buffer inside the class and provide a way to fill the buffer? **Otherwise, why not just define utility fonctions** that do the conversion from a pointer to the start of the data to convert. **A third option might be a wrapper class** that has a pointer to the raw buffer and function to decode the data – Phil1970 Jul 19 '19 at 01:41
  • Casting to `Overlay*` would save you the trouble of passing a pointer to the data with every call for a field as opposed to using utility functions. Also, it's not always known beforehand where the binary structure resides in the input buffer. You may need to try every byte offset and if `Overlay` instances own the data, this results in many small `memcpy`'s. The edit in my question supports your first two proposals. The third one I hadn't thought of. You could even make that capable of both owning and non-owning the data it uses. – delins Jul 20 '19 at 11:22

1 Answers1

2

Is casting to a struct that consists solely of a char[] and reading from that array well defined?

It is not well-defined as per following rule:

[basic.lval]

If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • a char, unsigned char, or std::byte type.

None of those listed types are Overlay, when the underlying object is a dynamic array of char.


What appears like a sensible solution here, is simply a free (or static member) function:

std::uint32_t
parseSomeField(const char* a) const {
    return my_char_to_uint32_t_function(a + 3);
}

You can use it to parse a vector:

parseSomeField(buffer->data());

or, if you do have a class similar to Overlay:

parseSomeField(overlay.a);
Community
  • 1
  • 1
eerorika
  • 232,697
  • 12
  • 197
  • 326
  • I think that according to this related question (https://stackoverflow.com/questions/50383187/is-it-a-strict-aliasing-violation-to-alias-a-struct-as-its-first-member?rq=1), the above code would be well defined (assuming `my_char_to_uint32_t_function` is properly implemented) – Phil1970 Jul 17 '19 at 01:09
  • @Phil1970 How is that related? `&buffer[0]` points to neither `Overlay`, nor to the first member of `Overlay`. – eerorika Jul 17 '19 at 01:18
  • So the fact that `char[]` degrades to `char*` when passed as a function argument doesn't play a role here? Also, if `char a[100]` was replaced with `const char a` and `&a[3]` is replaced with `&a+3`, would that make it valid? `a` would have the same address as `Overlay`, which has the same address as `buffer[0]`. It remains, however, a `char`, pointing to the start of the dynamic array of `char` that `buffer` is. – delins Jul 17 '19 at 13:49
  • `So the fact that char[] degrades to char* when passed as a function argument doesn't play a role here?` It does not. `if char a[100] was replaced with const char a and &a[3] is replaced with &a+3, would that make it valid?` It would not make it valid. `a would have the same address as Overlay` What `Overlay`? No `Overlay` object exists in the example. – eerorika Jul 17 '19 at 13:53
  • Overlay -> overlay, which is a pointer to.. nvm. Looking at all the No's that doesn't matter anymore :) Thanks for the answer. Could you tell me what would be well defined? It seems to me now that as soon as you type `&buffer[0]` you're attacked by dragons. – delins Jul 17 '19 at 13:59
  • @delins You can use `&buffer[0]` as an array of characters (it is a pointer to a character in an array of characters). You can use an instance of `Overlay` as an instance of `Overlay` (that's what it is), or you can treat it as an array of characters (because everything can be treated as array of characters - at least for reading). – eerorika Jul 17 '19 at 14:03
  • So in order to use the approach with `Overlay` as shown in the example in the op, I should always create an `Overlay` object and `memcpy` the 100 bytes into it, possibly from `buffer`? There's no cheaper way (except for writing functions that access the buffer directly)? – delins Jul 17 '19 at 14:09
  • @delins `memcpy`ing as well as writing to `overlay.a` directly would both be OK. But yes, the `Overlay` object must be created in order to call the member function. – eerorika Jul 17 '19 at 14:11
  • Thanks for the update. I've edited op and combined everything into a new code sample. – delins Jul 17 '19 at 15:30
  • @delins: There are a number of places where the Standard simultaneously describes the behavior of an action and categorizes it as Undefined; the authors of the Standard expected that implementations would give priority to the former when useful and practical, but regarded the question of when to do so as a Quality of Implementation issue outside the Standard's jurisdiction. Unfortunately, the language has been taken over by people who think UB needs to be given absolute priority in the name of "optimization" without regard for usefulness or practicality. – supercat Jul 26 '19 at 16:59