
Consider a POSIX.1-2008 compliant operating system, and let fd be a valid file descriptor (to an open file, opened for reading, containing enough data...). The following code adheres to the C++11 standard* (ignore error checking):

```cpp
void* map = mmap(NULL, sizeof(int)*10, PROT_READ, MAP_PRIVATE, fd, 0);
int* foo = static_cast<int*>(map);
```
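
For anyone reproducing this, here is a self-contained sketch of the same setup with the error checking the snippet omits. It assumes a POSIX system with a writable `/tmp`; `make_int_file` and `map_ints` are hypothetical helpers, and the test file is created with `mkstemp`. It deliberately does not read through `int*`, so it sidesteps the question being asked.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>    // mkstemp (POSIX)
#include <sys/mman.h>  // mmap, munmap, MAP_FAILED
#include <unistd.h>    // write, close, unlink, ssize_t

// Hypothetical test helper: writes `count` ints to a fresh temporary file
// and returns a descriptor that is open for reading, or -1 on failure.
int make_int_file(const int* values, std::size_t count) {
    char path[] = "/tmp/mmap_alias_XXXXXX";
    int fd = mkstemp(path);  // opened O_RDWR
    if (fd == -1) return -1;
    unlink(path);            // data stays alive until fd and mappings are gone
    const std::size_t bytes = sizeof(int) * count;
    if (write(fd, values, bytes) != static_cast<ssize_t>(bytes)) {
        close(fd);
        return -1;
    }
    return fd;
}

// The question's mapping, with error checking: returns nullptr on failure.
const void* map_ints(int fd, std::size_t count) {
    assert(fd >= 0);
    void* map = mmap(NULL, sizeof(int) * count, PROT_READ, MAP_PRIVATE, fd, 0);
    return map == MAP_FAILED ? nullptr : map;
}
```

The descriptor can be closed right after `mmap`; POSIX specifies that the mapping keeps its own reference to the file.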

Now, does the following statement break strict aliasing rules?

```cpp
int bar = *foo;
```

According to the standard:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.
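
The last bullet is what makes byte-wise inspection legal for any object, and it is why byte-wise access to the mapping (for example with `std::memcmp`) raises no aliasing question; only the read through `int*` does. A minimal sketch, where `same_bytes` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: compares the object representations of a and b.
// Both objects are examined through unsigned char lvalues, which the last
// bullet above permits whatever the dynamic type of the object is.
template <class A, class B>
bool same_bytes(const A& a, const B& b) {
    static_assert(sizeof(A) == sizeof(B), "objects must have the same size");
    const unsigned char* pa = reinterpret_cast<const unsigned char*>(&a);
    const unsigned char* pb = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof(A); ++i)
        if (pa[i] != pb[i])
            return false;
    return true;
}
```

Reading an `int` through an `unsigned*` would also be allowed (the signed/unsigned bullets), but reading it through a `float*` would not.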

What is the dynamic type of the object pointed to by `map`/`foo`? Is there even an object? The standard says:

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

Does this mean that the mapped memory contains 10 int objects (supposing the initial address is suitably aligned)? But if so, wouldn't this also apply to the following code (which clearly breaks strict aliasing)?

```cpp
char baz[sizeof(int)];
int* p=reinterpret_cast<int*>(&baz);
*p=5;
```

Even more oddly, does that mean that declaring baz starts the lifetime of any (properly aligned) object of size 4?
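
For contrast, a sketch of the variant generally accepted as well defined in C++11 (the struct and function names are hypothetical): the buffer is given `int`'s alignment with `alignas`, and an `int` object is explicitly created in it with placement new before any access, so the access goes through the dynamic type of the object.

```cpp
#include <cassert>
#include <cstdint>
#include <new>  // placement new

// Hypothetical wrapper: storage with the size *and* alignment of an int.
// (baz in the question only has char alignment.)
struct IntStorage {
    alignas(int) unsigned char bytes[sizeof(int)];
};

// Starts the lifetime of an int inside s via a placement new-expression
// and returns a pointer to that object, so *result refers to a real int.
int* emplace_int(IntStorage& s, int value) {
    assert(reinterpret_cast<std::uintptr_t>(s.bytes) % alignof(int) == 0);
    return ::new (static_cast<void*>(s.bytes)) int(value);
}
```

Note that the returned pointer, not a cast of `s.bytes`, is what should be used to access the `int`.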


Some context: I am mmap-ing a file that contains a chunk of data I wish to access directly. Since this chunk is large, I'd like to avoid memcpy-ing it into a temporary object.


*Can `nullptr` be used instead of `NULL` here? Is it implicitly converted to `NULL`? Any reference from the standard?
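
On the footnote: in C++11 `nullptr` is a null pointer constant of type `std::nullptr_t`, and the pointer conversion rules (4.10 [conv.ptr]) let any null pointer constant convert implicitly to any pointer type, yielding that type's null pointer value. So `mmap(nullptr, ...)` passes the same null `void*` as `mmap(NULL, ...)`; it is not converted "to `NULL`" but to a null `void*`, exactly as `NULL` is. A small check, using a hypothetical stand-in for `mmap`'s address-hint parameter:

```cpp
#include <cassert>
#include <cstddef>  // NULL, std::nullptr_t

// Hypothetical stand-in for a C function taking an address hint, like the
// first parameter of mmap. Both nullptr and NULL arrive as a null void*.
bool is_null_hint(void* addr) { return addr == NULL; }
```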

Steven
  • You mean `int* foo = static_cast<int*>(map);` (not `fd`) – Basile Starynkevitch Jul 25 '17 at 18:21
  • @BasileStarynkevitch indeed, thanks! – Steven Jul 25 '17 at 18:27
  • It unfortunately does break strict aliasing, but on the other hand, the compiler has no way of knowing what is stored in that memory region, so it has to assume it could be an object of any type and hence you should not see any strange forms of UB. – MikeMB Jul 25 '17 at 18:51
  • @MikeMB Pass the `-fstrict-aliasing` flag, and it will result in undefined behavior, potentially even optimized into a NOP. Either a memcpy, or at least a helper function performing an unaligned read will be needed to ensure defined behavior. – Ext3h Jul 25 '17 at 19:10
  • @Ext3h: It will always be undefined behavior regardless of the flags you pass, but I can't think of any way that this could actually manifest in such crazy optimizations. The mmap returns an opaque pointer - the compiler knows nothing about the memory there - for all it knows, the correct objects have already been created in the memory. – MikeMB Jul 25 '17 at 20:28
  • @MikeMB The compiler could know that `mmap` doesn't create any C++ objects. I could easily go through the C API of a system and mark up each function that returns a memory block containing no C++ objects. This would enable many optimizations of C++ code using C code when `-fstrict-aliasing` was active! Entire branches of code could be eliminated as containing undefined behavior, and hence as-if can be treated as not reachable! Coming in your next version of gcc. I'll start with `malloc`. – Yakk - Adam Nevraumont Jul 25 '17 at 20:41
  • @Yakk: "The compiler could know that `mmap` doesn't create any C++ objects." How? Unlike malloc or memcpy, mmap is not part of the C++ standard library. And there is no reason to believe a C function couldn't return a pointer to memory containing C++ objects. – MikeMB Jul 25 '17 at 20:53
  • Btw, you don't have to specify `-fstrict-aliasing` explicitly; gcc enables it at `-O2` anyway. – MikeMB Jul 25 '17 at 21:00
  • @mike because I can modify gcc to support a `[[noobjectsreturned]]` attribute and mark up every C function returning a `void*` to raw memory, like `malloc` or `mmap`. Then teach gcc about it. Think of the optimization opportunities! My point is, relying on your compiler not catching your UB is a bad medium-term plan. – Yakk - Adam Nevraumont Jul 25 '17 at 21:11
  • @Yakk: Afaik, nothing in the POSIX spec says that mmap returns a pointer to raw memory. Unlike memory from malloc, it is guaranteed to be initialized, and in this case the memory even contains actual data that might previously have been written via a different mapping. So by what right would you attach such an attribute? In the end, interactions between C and C++ functions are, afaik, implementation-defined anyway (not to mention that mmap is a POSIX function), so the C++ standard alone will probably not give a final answer here. – MikeMB Jul 25 '17 at 21:59
  • "_does that mean that declaring baz starts the lifetime of any (properly aligned) object of size 4?_" Although that hurts the intellectual aesthetic feelings of some, doing that would be logically sound. – curiousguy May 31 '18 at 01:33
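
Ext3h's suggested alternative can be sketched as follows; `load_as` is a hypothetical name. `std::memcpy` copies the bytes into a genuine local object, so no `int` object needs to exist in the mapping, the source need not be aligned, and the source is only read, which suits a `PROT_READ` mapping. GCC and Clang typically reduce a fixed-size `memcpy` like this to a single load at `-O2`, but that is an optimization, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reads the index-th T from raw memory by copying its
// bytes into a local object. It only reads from base, so it works on
// read-only mappings, and base does not have to be aligned for T.
template <class T>
T load_as(const void* base, std::size_t index = 0) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy into an object requires a trivially copyable type");
    assert(base != nullptr);
    T value;
    std::memcpy(&value,
                static_cast<const unsigned char*>(base) + index * sizeof(T),
                sizeof(T));
    return value;
}
```

With the question's mapping this would read `int bar = load_as<int>(map);`, at the cost of one small copy per element rather than a copy of the whole chunk.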

1 Answer


I believe simply casting does violate strict aliasing. Arguing that convincingly is above my paygrade, so here is an attempt at a workaround:

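```cpp
#include <cstring>      // std::memcpy
#include <new>          // placement new
#include <type_traits>  // std::is_pod

template<class T>
T* launder_raw_pod_at( void* ptr ) {
  static_assert( std::is_pod<T>::value, "this only works with plain old data" );
  char buff[sizeof(T)];
  std::memcpy( buff, ptr, sizeof(T) );  // save the existing bytes
  T* r = ::new(ptr) T;                  // start the lifetime of a T there
  std::memcpy( ptr, buff, sizeof(T) );  // restore the saved bytes into the T
  return r;
}
```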

I believe the above code has zero observable side effects on memory and returns a pointer to a legal T* at location ptr.

Check if your compiler optimizes the above code to a noop. To do so, it has to understand memcpy at a really fundamental level, and constructing a T has to do nothing to the memory there.

At least clang 4.0.0 can optimize this operation away.

What we do is we first copy the bytes away. Then we use placement new to create a T there. Finally, we copy the bytes back.

We have a legally created T with exactly the bytes we want in it.

But the copies away and back go through a local buffer, so they have no observable effect.

The construction of the object, if it is a POD, doesn't have to touch the bytes either; technically the bytes become indeterminate. But smart compilers simply do nothing.

So the compiler can work out that all this manipulation can be skipped at runtime. At the same time, we have in the abstract machine properly created an object with the proper bytes at that location. (assuming it has valid alignment! But that isn't this code's problem.)

Yakk - Adam Nevraumont
  • FYI : All gcc versions since 4.6 also optimize this away. – Daniel Kamil Kozar Jul 25 '17 at 18:28
  • Neat, but in my case I am not allowed to write to the memory pointed to by ptr. It is probably fine if the code gets optimized away, but it would cause trouble if it doesn't. – Steven Jul 25 '17 at 18:28
  • I think it is enough if the object is trivially copyable and trivially destructible – MikeMB Jul 25 '17 at 18:39
  • @MikeMB trivially constructible, trivially copyable, and trivially destructible? – Yakk - Adam Nevraumont Jul 25 '17 at 19:06
  • @Yakk: Urgs, I should check what the autocomplete does before submitting ;). My point: it doesn't have to be trivially constructible (just default constructible), and, thinking about it, trivially destructible is probably not necessary either. Of course, at some point the compiler will no longer be able to optimize it into a no-op. – MikeMB Jul 25 '17 at 20:48
  • @Yakk I don't think simply casting violates strict aliasing. To violate strict aliasing, the program must "attempt to access the stored value of an object", which the cast does not do. As soon as the pointer is dereferenced (and read), though, that's a violation. Dereferencing and assigning would probably still be fine (see [here](https://stackoverflow.com/questions/18659427/c-c-strict-aliasing-object-lifetime-and-modern-compilers)). Anyway, do you have any workaround for my use case? I am not allowed to write to the memory pointed to by ptr. – Steven Jul 26 '17 at 09:36