
There have been multiple questions asking about mmap and accessing structures in the shared memory while not invoking UB by breaking the strict aliasing rule or violating object lifetimes.

The consensus is that this is not generally possible without copying the data.

More generally, over the years here, I have seen countless code snippets involving reinterpret_cast (or worse) for (de)serialization, breaking those rules. I always recommended std::memcpy while claiming that the compiler will elide those copies. Can we do better now?
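For reference, the `std::memcpy` approach mentioned above looks roughly like this. It is a minimal sketch: `read_as` is a hypothetical helper name and `Foo` is only an illustrative type. Whether the copy is actually elided is up to the optimizer; the standard gives no such guarantee.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Illustrative type only.
struct Foo {
    int x;
    float y;
};

// Hypothetical helper: copies sizeof(T) bytes from `src` into a fresh T.
// The caller must guarantee that `src` points to at least sizeof(T)
// readable bytes holding a valid object representation of T.
template <class T>
T read_as(const void* src) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy-based reads require a trivially copyable type");
    static_assert(std::is_default_constructible_v<T>,
                  "this sketch default-constructs the destination");
    assert(src != nullptr);
    T value;
    // No alignment requirement on `src`: memcpy works bytewise.
    std::memcpy(&value, src, sizeof(T));
    return value;
}
```

This returns a copy rather than a pointer into the buffer, so it sidesteps both the aliasing and the lifetime questions, at the cost of relying on the optimizer to remove the copy.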

I would like to clarify: in C++20, what is the correct way to interpret a bunch of bytes as another POD type without copying the data?

There was a proposal P0593R6 which to my knowledge got accepted into C++20.

Based on reading that, I believe the following code is safe:

template <class T>
T* pune(void* ptr) {
    // Guaranteed O(1) initialization without overwriting the data.
    auto* dest = new (ptr) std::byte[sizeof(T)];
    // There is an implicitly created T, so cast is valid.
    return reinterpret_cast<T*>(dest);
}
#include <array>
#include <cstddef>  // std::byte
#include <cstring>  // std::memcpy
#include <new>      // placement new
#include <fmt/core.h>

struct Foo {
    int x;
    float y;
};

auto get_buffer() {
    Foo foo{.x = 10, .y = 5.0};
    std::array<std::byte, sizeof(Foo)> buff;
    std::memcpy(buff.data(), &foo, sizeof(foo));
    return buff;
}

int main() {
    // Imagine the buffer came from a file or mmapped memory;
    // the compiler does not see the memcpy above.
    auto buff = get_buffer();

    // A Foo is alive as long as buff lives and
    // no new objects are created in there.
    auto* new_foo = pune<Foo>(buff.data());

    fmt::print("Foo::x={}\n", new_foo->x);
    fmt::print("Foo::y={}\n", new_foo->y);
}

Live demo godbolt.

Is this really safe?

Is pune really O(1)?

EDIT

Okay, how about using memmove and still relying on the compiler to do this in O(1)?

template <class T>
T* pune(void* ptr) {
    void* dest = new (ptr) std::byte[sizeof(T)];
    auto* p = reinterpret_cast<T*>(std::memmove(dest,ptr,sizeof(T)));
    return std::launder(p);
}
Quimby
  • for type safe punning from existing complete data type (no `void*`) there is [std::bit_cast](https://en.cppreference.com/w/cpp/numeric/bit_cast) – bolov Jun 05 '22 at 23:32
  • Since you didn't tag this as `language-lawyer`, do you want to know what the guarantees are practically speaking on implementations, or do you want to know what the standard says? From the latter point-of-view, the implicitly created `Foo` object will have an indeterminate value, not the value corresponding to the original underlying bytes. And if you look at the previous revision of the mentioned paper, even there the proposed wording for `std::start_lifetime_as` was in terms of `memcpy` and so without any `O(1)` guarantee (as far as I can tell). – user17732522 Jun 05 '22 at 23:55
  • There is also a minor issue with `pune` in that it is missing a `std::launder` call after the `reinterpret_cast`. I think that was overlooked in the paper where a similar example was given. – user17732522 Jun 05 '22 at 23:56
  • @user17732522: Well, the language surrounding implicit object creation says that the pointer returned by `new` should point to a "suitable created object". – Nicol Bolas Jun 06 '22 at 00:31
  • @NicolBolas That's for `operator new`, but I think it is very clear about the pointer value of the result of a `new` expression: https://eel.is/c++draft/expr.new#10.sentence-2 – user17732522 Jun 06 '22 at 00:56
  • @user17732522: But "the object created" can be the object implicitly created, yes? – Nicol Bolas Jun 06 '22 at 01:02
  • @NicolBolas I won't claim to know the intended meaning for sure, but "the object" to me seems to clearly refer to the object which is explicitly created by the `new` expression. Otherwise I would expect "an object". But I should have linked to the sentence before that anyway, which is what applies here and is clear that the first element of the created array is referred to. – user17732522 Jun 06 '22 at 01:05
  • @user17732522 A practical approach would suffice. That sounds unfortunate. I was toying with `T* dest = ptr; std::memmove(dest,ptr,sizeof(T))`, would that work? Maybe an extra `std::launder(dest);` for good measure? I know it doesn't have an `O(1)` guarantee but that should be quite easy for the compiler to optimize. – Quimby Jun 06 '22 at 08:01
  • @Quimby C says `memmove` is equivalent to a `memcpy` to a temporary buffer and then to the destination. C++ specifies that it implicitly creates objects ["_immediately prior to copying the sequence of characters to the destination._"](https://www.eel.is/c++draft/strings#cstring.syn-3), which I guess means in between these two steps. So it looks fine to me. There is technically no difference from the `memcpy` variant. `std::launder` would still be required. Such a `memmove` was also mentioned as an implementation in an earlier revision of the proposal for `std::start_lifetime_as`/`std::bless`. – user17732522 Jun 06 '22 at 13:41
  • (But there might be a reason I am missing for why the proposal then went on to define it explicitly via `memcpy` instead of such a `memmove`.) – user17732522 Jun 06 '22 at 13:45
  • In the edit in your question you need to remove `void* dest = new (ptr) std::byte[sizeof(T)];` though. That also resets all values to indeterminate. All object creation would be handled implicitly by `memmove`. – user17732522 Jun 06 '22 at 13:48
  • [Here](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349#c20) you have a comment from Richard Smith mentioning that such a `memmove` implementation is valid for a specific example. But as you can see in the discussion of the bug report there, at least for GCC even some `memcpy` variants that should be correct don't work and if you take the test case from there and modify it to use the `memmove` variant for `s1`, it still has this bug: https://godbolt.org/z/GjrYE8nPY – user17732522 Jun 06 '22 at 14:17
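Putting the comments above together, the `memmove` variant without the placement new might look like the following. This is only a sketch of what was discussed, not a vetted implementation: `pune_in_place` is a hypothetical name, `Foo` is an illustrative type, and as the linked GCC bug report shows, compilers have miscompiled similar code, so treat it with caution.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Illustrative type only.
struct Foo {
    int x;
    float y;
};

// Hypothetical helper: starts the lifetime of a T over the bytes at `ptr`
// while keeping their values. `ptr` must be suitably aligned for T and
// point to at least sizeof(T) writable bytes.
template <class T>
T* pune_in_place(void* ptr) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "only sensible for trivially copyable types");
    assert(ptr != nullptr);
    assert(reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0);
    // memmove implicitly creates objects in the destination immediately
    // before copying; copying the bytes onto themselves keeps the values.
    // No placement new here: that would reset the values to indeterminate.
    void* dest = std::memmove(ptr, ptr, sizeof(T));
    // launder, as suggested in the comments, to obtain a pointer to the
    // implicitly created T.
    return std::launder(static_cast<T*>(dest));
}
```

There is still no O(1) guarantee here; the hope is only that the optimizer recognizes the self-copy as a no-op.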

1 Answer


Guaranteed O(1) initialization without overwriting the data.

Not so much.

Indeed, P0593 explicitly mentions that this will not work:

Symmetrically, when the float object is created, the object has an indeterminate value, and therefore any attempt to load its value results in undefined behavior.

[basic.indet] contains the details:

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced

Placement new "obtains storage" for the object. Since no initialization is performed, it has an "indeterminate value". Attempting to read that will yield undefined behavior.

As of yet, there is no mechanism in C++ that allows you to take memory which had one object in it and read its values as another object.

However, mmap could be considered (by the implementation) to implicitly create objects in the storage it returns. As such, you could just cast such a pointer to an implicit-lifetime type (not POD, which no longer exists as a category) and read the values from there.
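As a rough illustration of that last point, here is a POSIX-only sketch. It rests entirely on the assumption that the implementation treats `mmap` as implicitly creating objects, which neither the C++ standard nor POSIX promises. `map_as` and `Foo` are hypothetical names, and the file is assumed to hold at least `sizeof(T)` bytes written on the same platform.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/mman.h>
#include <type_traits>

// Illustrative implicit-lifetime type.
struct Foo {
    int x;
    float y;
};

// Hypothetical helper: maps the first sizeof(T) bytes of `fd` read-only and
// views them as a T. Returns nullptr if the mapping fails. The caller is
// responsible for calling munmap on the result.
template <class T>
const T* map_as(int fd) {
    // Stand-in check; C++20 has no trait for "implicit-lifetime type".
    static_assert(std::is_trivially_copyable_v<T>,
                  "expected a trivially copyable type");
    assert(fd >= 0);
    void* p = mmap(nullptr, sizeof(T), PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
    // ASSUMPTION: the implementation considers mmap to implicitly create a T
    // in the returned storage. The mapping is page-aligned, so alignment is
    // not a concern for ordinary types.
    return static_cast<const T*>(p);
}
```

No bytes are copied by the program itself here, which is the whole point, but the validity of the cast depends on the implementation rather than on the standard.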

Nicol Bolas
  • Thank you for the answer! That's unfortunate but I think I understand the error. Do you know about g++,clang++ guarantees on `mmap`? I did not find any documentation regarding this. I edited my question with `std::memmove`, would that work please? – Quimby Jun 06 '22 at 08:10
  • Also, what would be your personal opinion on the danger of using this in the real world? – Quimby Jun 06 '22 at 08:11