What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?


As we learned from this question, the following is undefined behavior:

alignas(int) char data[sizeof(int)];
int *myInt = new (data) int;           // OK
*myInt = 34;                           // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder

But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:

  1. No new, just reinterpret_cast:

    alignas(int) char data[sizeof(int)];
    *reinterpret_cast<int*>(data) = 42;    // is the first cast write UB?
    int i = *reinterpret_cast<int*>(data); // how about a read?
    *reinterpret_cast<int*>(data) = 4;     // how about the second write?
    int j = *reinterpret_cast<int*>(data); // or the second read?
    

    When does the lifetime for the int start? Is it with the declaration of data? If so, when does the lifetime of data end?

  2. What if data were a pointer?

    char* data_ptr = new char[sizeof(int)];
    *reinterpret_cast<int*>(data_ptr) = 4;     // is this UB?
    int i = *reinterpret_cast<int*>(data_ptr); // how about the read?
    
  3. What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?

    // bunch of handle functions that do stuff with the members of these types
    void handle(MsgType1 const& );
    void handle(MsgTypeF const& );
    
    char buffer[100]; 
    ::recv(some_socket, buffer, 100, 0);
    
    switch (buffer[0]) {
    case '1':
        handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
        break;
    case 'F':
        handle(*reinterpret_cast<MsgTypeF*>(buffer));
        break;
    // ...
    }
    

Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 and C++1z?

Barry
  • **(1)** looks valid to me. In both statements, a new `int` object is conjured up and assigned a value. *Reading* that value is where things start getting hairy. Same with **(2)** (assuming `sizeof(int)==4`). **(3)** looks like UB to me. – Igor Tandetnik Sep 10 '16 at 19:19
  • @IgorTandetnik Fleshed out the questions with some reading too, and got rid of the assumption about `sizeof(int)`, thanks. – Barry Sep 10 '16 at 19:27
  • Now **(1)** and **(2)** seem to exhibit UB, on the same grounds as the linked question. It would be easy to salvage by saving the pointer from the first cast, and using it for all subsequent writes and reads. – Igor Tandetnik Sep 10 '16 at 20:08
  • It seems that most compilers behave how you expect them to, even if it is not exactly defined. Look here for some more information: http://stackoverflow.com/questions/39381726/is-it-safe-to-cast-to-a-class-that-has-the-same-data-member-layout-but-a-differ – user2296177 Sep 10 '16 at 21:30
  • @user2296177 : Irrelevant for a question tagged `language-lawyer`. ;-] – ildjarn Sep 10 '16 at 23:42
  • With P0137, [\[intro.object\]/1](https://timsong-cpp.github.io/cppwp/intro.object#1) makes it crystal clear when objects are created. There is no living `int` object at `data` or `data_ptr` in either of the first two examples. – T.C. Sep 12 '16 at 04:03

1 Answer


There are two rules at play here:

  1. [basic.lval]/8, aka, the strict aliasing rule: simply put, you can't access an object through a pointer/reference to the wrong type.

  2. [basic.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.

These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".
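As a minimal sketch of how the two rules interact, assuming C++17 (for `std::launder`): placement new starts the lifetime of an `int` in the array's storage, and from then on the `int` has to be reached either through the pointer that placement new returned or through a laundered pointer. The function names below are made up for illustration.

```cpp
#include <cassert>  // assert, for the usage check mentioned below
#include <new>      // placement new and std::launder (C++17)

// Hypothetical helper: keep the pointer returned by placement new.
int read_via_saved_pointer() {
    alignas(int) char data[sizeof(int)];
    int* p = new (data) int(34);  // starts the lifetime of an int in data's storage
    return *p;                    // OK: p points to the int object
}

// Hypothetical helper: recover a usable pointer from the array's address.
int read_via_launder() {
    alignas(int) char data[sizeof(int)];
    new (data) int(34);           // an int now lives in the storage
    // data itself still names the char array, so the cast result is laundered
    // to obtain a pointer to the int that occupies that storage.
    return *std::launder(reinterpret_cast<int*>(data));
}
```

A check such as `assert(read_via_saved_pointer() == 34);` exercises the first helper. Note that without the placement new, neither a cast nor `std::launder` would help, because no `int` object would exist in the storage at all.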

All of your code examples fall prey to the same problem: the storage does not hold an object of the type you cast it to:

alignas(int) char data[sizeof(int)];

That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were. It doesn't matter if it is a read or a write; you still provoke UB.

Similarly:

char* data_ptr = new char[sizeof(int)];

That also creates an object of type char[sizeof(int)].
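For the first two cases, one way to stay within defined behavior without ever pretending the `char` array is an `int` is to copy the bytes in and out. The sketch below relies only on `int` being trivially copyable; the function name is hypothetical.

```cpp
#include <cassert>  // assert, for the usage check mentioned below
#include <cstring>  // std::memcpy

// Hypothetical helper: store an int's bytes in a char array and read them back.
int round_trip_through_chars(int value) {
    char* data_ptr = new char[sizeof(int)];

    // Write: copy the object representation of value into the char array.
    std::memcpy(data_ptr, &value, sizeof(int));

    // Read: copy the bytes into a real int object instead of casting the pointer.
    int result;
    std::memcpy(&result, data_ptr, sizeof(int));

    delete[] data_ptr;
    return result;
}
```

Here the array is only ever accessed as `char`, and the `int` is only ever accessed as `int`, so neither rule above is violated. For example, `assert(round_trip_through_chars(4) == 4);` should hold.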

char buffer[100];

This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.

Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.

switch (buffer[0]) {
case '1':
    {
        MsgType1 msg;
        memcpy(&msg, buffer, sizeof(MsgType1));
        handle(msg);
    }
    break;
case 'F':
    {
        MsgTypeF msg;
        memcpy(&msg, buffer, sizeof(MsgTypeF));
        handle(msg);
    }
    break;
// ...
}
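A more complete sketch of the same idea follows. The layouts of `MsgType1` and `MsgTypeF`, the `last_value` variable, and the `copy_and_handle` and `dispatch` helpers are all assumptions for illustration; the question does not define the message types. The `static_assert`s make the trivially-copyable requirement explicit, and the length check guards against a short read.

```cpp
#include <cassert>      // assert, for the usage check mentioned below
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical wire formats; the real ones are not given in the question.
struct MsgType1 { char tag; char pad[3]; int value; };
struct MsgTypeF { char tag; char text[7]; };

static_assert(std::is_trivially_copyable<MsgType1>::value,
              "memcpy into MsgType1 requires a trivially copyable type");
static_assert(std::is_trivially_copyable<MsgTypeF>::value,
              "memcpy into MsgTypeF requires a trivially copyable type");

// Records what the stand-in handlers saw, so the effect is observable.
int last_value = 0;

void handle(MsgType1 const& m) { last_value = m.value; }
void handle(MsgTypeF const& m) { last_value = m.text[0]; }

// Copies the bytes into a real object of type T, then passes it to handle().
template <class T>
bool copy_and_handle(char const* buffer, std::size_t len) {
    if (len < sizeof(T)) return false;  // not enough bytes for a whole T
    T msg;
    std::memcpy(&msg, buffer, sizeof(T));
    handle(msg);
    return true;
}

// Inspects the first byte as a char (always allowed), then dispatches.
bool dispatch(char const* buffer, std::size_t len) {
    if (len == 0) return false;
    switch (buffer[0]) {
    case '1': return copy_and_handle<MsgType1>(buffer, len);
    case 'F': return copy_and_handle<MsgTypeF>(buffer, len);
    default:  return false;             // unknown message tag
    }
}
```

With a buffer whose first bytes are a copied-in `MsgType1` holding `value == 7`, `dispatch(buffer, sizeof buffer)` should return `true` and leave `last_value` at 7. Compilers typically optimize a fixed-size `memcpy` like this into plain loads and stores, so the copy is rarely a measurable cost.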

Note that we're talking about what the language standard states is undefined behavior. In practice, odds are good that a compiler will generate the code you expect for any of these.

Does the answer to this question change between C++11 and C++1z?

There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.

Aconcagua
Nicol Bolas
  • Doesn't declaring my `char` array not potentially constitute obtaining storage for some yet-to-be vacuously-initialized type `T`? In that sense, wouldn't a healthy sprinkling of `launder` make everything well defined? – Barry Sep 11 '16 at 03:08
  • @Barry: [That's not what `std::launder` is for](http://stackoverflow.com/a/39382728/734069). If you start the lifetime of an object in the storage of an older one, it lets you get a pointer to the new object from a pointer to the old one. It doesn't start the lifetime of anything. "*Doesn't declaring my char array not potentially constitute obtaining storage for some yet-to-be vacuously-initialized type T?*" By that logic, *any object* could be a "yet-to-be vacuously-initialized type T". After all, an object has storage. `char[X]` is as much an object as any other object. – Nicol Bolas Sep 11 '16 at 03:54
  • But that's when [basic.life] says the lifetime of an object begins - when storage is acquired. Given `char buf[4]; int* i = new (buf) int;`, when does the lifetime of the `int` pointed to by `i` begin? – Barry Sep 11 '16 at 15:37
  • @Barry: Placement new begins the lifetime of an object, even if there was already an object in that storage. The first statement puts a `char[4]` in that storage. The second statement ends the lifetime of the `char[4]` and begins the lifetime of the `int`. – Nicol Bolas Sep 11 '16 at 16:06
  • Does placement new "obtain storage"? The storage is already there. – Barry Sep 11 '16 at 16:15
  • @Barry: "*Does placement new "obtain storage"?*" Yes. Placement new syntax simply supplies additional arguments to the allocation functions. In this case, the `void*` argument provokes a call to an `operator new` overload that just returns the passed parameter. But it is *still* an allocation function, and it is still obtaining storage. Nothing was ever said about *new* storage. – Nicol Bolas Sep 11 '16 at 16:27
  • Did this change in C++20? (at least for some types?) I seem to remember at least a proposal to define a lot of previously undefined behavior around pointers to trivial types/uninitialized memory. – aij Mar 15 '22 at 19:57
  • @aij: Yes, but the question was specifically tagged as C++17, so it's irrelevant to the question. You can find out more by looking for "implicit lifetime" or "implicit object construction" or something to that effect. – Nicol Bolas Mar 15 '22 at 20:18