Question inspired by Dealing with large data binary files

Link to Object

Program (1) creates a memory-mapped file and writes some Objects (C++ Standard definition) to it, closes the file and exits.

Program (2) maps the above file into memory and tries to access the Objects via reinterpret_cast.

Is this legal according to the Standard, given that the object representations have not changed and the Objects still exist in the file?

If this were attempted between two processes, using shared process memory instead of a file, would it be legal?

Note: this question is not about storing or sharing local virtual addresses, which is obviously a bad idea.

yuri kilochek
Richard Critten
  • Object lifetimes are scoped within the C++ program being executed. The object as seen by Program (1) is not the object that would be seen by Program (2). In fact, the object does not exist at all in Program (2), as its lifetime never started. – Oct 18 '21 at 15:23
  • If `Object` isn't too big and is a standard layout object, you can `memcpy` the representation from the memory mapped file to an actual instance of your object. – François Andrieux Oct 18 '21 at 15:24
  • @FrançoisAndrieux Which limits the types you can use to types that are trivially copyable https://en.cppreference.com/w/cpp/types/is_trivially_copyable – Pepijn Kramer Oct 18 '21 at 15:27
  • @PepijnKramer I'm not sure trivially copyable is sufficient, since the data comes from another process that may be compiled using a different implementation. Standard layout types are compatible across programming languages, so that should be fine. Trivially copyable types might not have identical layouts across implementations. – François Andrieux Oct 18 '21 at 15:31
  • Yup, serialization is tough like that (I assumed both processes to be compiled with the same settings, though). @Richard Critten have a look at IPC (inter-process communication) protocols; they do the hard work of sending types from one process to another in standardized formats (e.g. protobuf). – Pepijn Kramer Oct 18 '21 at 15:32
  • You could write code in C++ or in C to persist your data in files or databases. For example, see http://refpersys.org/ and read also Jacques Pitrat's book *Artificial Beings: The Conscience of a Conscious Machine*. – Basile Starynkevitch Nov 09 '21 at 18:41

1 Answer

No, objects do not persist this way.

C++ objects are defined primarily by their lifetime, which is scoped to the program.

So if you want to recycle an object from raw storage, there has to be a brand new object in program (2) with its own lifetime. reinterpret_cast'ing memory does not create a new object, so that doesn't work.

Now, you might think that placement-newing an object with a trivial constructor at that memory location could do the trick:

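#include <new>  // placement new

struct MyObj {
  int x;
  int y;
  float z;
};

void foo(char* raw_data) {
  // Default-initialization of a trivial type writes nothing, but the previous
  // content of raw_data must still be treated as indeterminate afterwards.
  MyObj* obj = new (raw_data) MyObj;
}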

But you can't do that either. The compiler is allowed to assume (and demonstrably does sometimes) that such a construction leaves the memory contents indeterminate. See "C++ placement new after memset" for more details, as well as a demonstration.

If you want to initialize an object from a given storage representation, you must use memcpy() or an equivalent:

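#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable_v

void foo(char* raw_data) {
  // memcpy into an object is only defined for trivially copyable types.
  static_assert(std::is_trivially_copyable_v<MyObj>);
  // Standard layout additionally gives a C-compatible member layout.
  static_assert(std::is_standard_layout_v<MyObj>);

  MyObj obj;
  std::memcpy(&obj, raw_data, sizeof(MyObj));
}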
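
As a minimal sketch of the memcpy approach, the helper below copies the stored bytes into a freshly created object and returns it by value. The names load_object and round_trip_demo are hypothetical and do not come from the answer, and a plain byte buffer stands in for the memory-mapped file. It assumes both sides are built with the same compiler and settings, so that the layout of MyObj matches.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct MyObj {
  int x;
  int y;
  float z;
};

// Hypothetical helper: build a real T from sizeof(T) bytes of raw storage.
// raw_data may have any alignment, because the bytes are copied out.
template <typename T>
T load_object(const void* raw_data) {
  static_assert(std::is_trivially_copyable_v<T>,
                "memcpy is only defined for trivially copyable types");
  static_assert(std::is_default_constructible_v<T>,
                "a T must exist before its bytes can be overwritten");
  T obj;
  std::memcpy(&obj, raw_data, sizeof(T));
  return obj;
}

// Simulates both programs in one process. The buffer "mapped" is an
// assumption standing in for the mapped file contents.
inline MyObj round_trip_demo() {
  unsigned char mapped[sizeof(MyObj)];

  // "Program (1)": store the object representation.
  const MyObj original{1, 2, 3.5f};
  std::memcpy(mapped, &original, sizeof(MyObj));

  // "Program (2)": create a new object and fill it from the stored bytes.
  return load_object<MyObj>(mapped);
}
```

The object returned here has its own lifetime in the reading program, which is what the reinterpret_cast version lacks.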

Addendum: It is possible to do the equivalent of the desired reinterpret_cast<> by overwriting the memory with its original content after creating the object (inspired by the implicit object creation proposal, P0593).

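#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <memory>       // std::assume_aligned (C++20)
#include <new>          // placement new, std::launder (C++17)
#include <type_traits>

template<typename T>
T* start_lifetime_as(void* p)
  requires std::is_trivially_copyable_v<T> {

  constexpr std::size_t size = sizeof(T);
  constexpr std::size_t align = alignof(T);

  // Precondition: p is suitably aligned for T.
  auto aligned_p = std::assume_aligned<align>(p);

  // Save the original bytes before the new-expression can clobber them.
  alignas(T) unsigned char tmp[size];
  std::memcpy(tmp, aligned_p, size);

  // Start the lifetime of a T, then restore the saved representation.
  T* t_ptr = new (aligned_p) T{};
  std::memcpy(t_ptr, tmp, size);

  return std::launder(t_ptr);
}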


void foo(char* raw_data) {
  MyObj* obj = start_lifetime_as<MyObj>(raw_data);
}

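The technique should be well-defined from C++11 onward, as long as that memory location contains only raw data and no prior object. Note that the code as written uses newer features: std::launder is C++17, and requires clauses and std::assume_aligned are C++20. Also, from cursory testing, compilers seem to do a good job of optimizing the two copies away.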
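
One precondition is worth checking explicitly: std::assume_aligned has undefined behavior if the pointer is not actually aligned for T, and an arbitrary offset into a mapped file may not be. The sketch below shows one conventional way to test this before calling start_lifetime_as. The names is_aligned_for and alignment_demo are hypothetical, and the check relies on the usual (implementation-defined) mapping of pointers to std::uintptr_t.

```cpp
#include <cassert>
#include <cstdint>

struct MyObj {
  int x;
  int y;
  float z;
};

// Hypothetical helper: true if p is suitably aligned to hold a T.
template <typename T>
bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Assumes alignof(MyObj) > 1, which holds on mainstream platforms.
inline bool alignment_demo() {
  alignas(MyObj) unsigned char buffer[2 * sizeof(MyObj)] = {};
  // The start of the buffer is aligned; one byte further in is not.
  return is_aligned_for<MyObj>(buffer) && !is_aligned_for<MyObj>(buffer + 1);
}
```

If the check fails, copying the bytes out with memcpy remains an option, since it places no alignment requirement on the source.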
