9

I recently saw a class like this that was used to construct objects "on-demand" without having to use dynamic memory allocation for various reasons.

#include <cassert>

template<typename T>
class StaticObject
{
public:
    StaticObject() : constructed_(false)
    {
    }

    ~StaticObject()
    {
        if (constructed_)
            ((T*)object_)->~T();
    }

    void construct()
    {
        assert(!constructed_);

        new ((T*)object_) T;
        constructed_ = true;
    }

    T& operator*()
    {
        assert(constructed_);

        return *((T*)object_);
    }

    const T& operator*() const
    {
        assert(constructed_);

        return *((T*)object_);
    }

private:
    bool constructed_;
    alignas(alignof(T)) char object_[sizeof(T)];
};

Is this code, namely the casting of a properly aligned char array to an object pointer, considered undefined behavior by the C++14 standard or is it completely fine?

0x400921FB54442D18
  • 725
  • 1
  • 5
  • 18
  • It is fine if you do it carefully. Besides, there are `std::optional` in C++17 and `boost::optional` doing similar thing. But I would leave it to people who know better to examine this specific case. –  Jul 08 '18 at 12:05
  • 1
    Use placement new. This will avoid any UB. – Richard Critten Jul 08 '18 at 12:05
  • @RichardCritten Placement new on what, dynamically allocated memory? That's what the code here was trying to avoid. Besides, it already uses placement new, except on an aligned char array and I'm asking if this is considered UB by the language standard. – 0x400921FB54442D18 Jul 08 '18 at 12:08
  • placement new using `object_` as the storage. – Richard Critten Jul 08 '18 at 12:13
  • 5
    @RichardCritten isn't `new ((T*)object_) T;` exactly a placement new ? – Christophe Jul 08 '18 at 12:17
  • 1
    Side note: cast is not necessary as placement new accepts a void pointer anyway... – Aconcagua Jul 08 '18 at 12:22
  • 1
    [There's a standard class intended to be used this way](https://en.cppreference.com/w/cpp/types/aligned_storage)... doesn't directly answer the question w.r.t `char`, does prove that the standard expects at least something along these lines to be possible. – Alex Celeste Jul 08 '18 at 12:30
  • Would be a strict aliasing violation in plain C . That's why I like to use alloca and pretend (since alloca isn't standardized) it returns untyped memory like malloc . – Petr Skocik Jul 08 '18 at 12:38
  • @PSkocik In C we do not have the same problem, as C does not know constructors being called on object definition; i. e. we can just allocate the data type directly and afterwards call some initialiser function at the time `construct` is called in given example... – Aconcagua Jul 08 '18 at 12:44
  • If this class will actually be used, it ought to either implement or delete copy and assignment operations, following the Rule Of Three or Rule Of Five. As is it's far too easy to accidentally create undefined behavior. – aschepler Jul 08 '18 at 17:22
  • @PSkocik: The way the C Standard is written, the vast majority of code that uses structures or unions invokes UB, since N1570 6.5p7 treats *all* situations where the stored value of an aggregate is accessed using an lvalue of a non-character member type as UB--*even if the lvalue is of the form `aggregate.member` or `aggretatePtr->member`*. The ability to handle various situations where a pointer or lvalue of one type is used to derive one of another type that is in turn used to access the original object is left as a Quality of Implementation issue. Any implementation that... – supercat Jul 09 '18 at 19:06
  • ...can't handle situations where code uses a pointer or lvalue of one type to derive one of another, and then uses the resulting type before doing anything else related to the original objec, or entering a function or loop that does so, should be considered to be of low quality unless it is intended for use exclusively in application fields that would never need such ability. – supercat Jul 09 '18 at 19:14

4 Answers4

10

This program technically has undefined behavior, although it's likely to work on most implementations. The issue is that a cast from char* to T* is not guaranteed to result in a valid pointer to the T object created by placement new, even though the char* pointer represents the address of the first byte used for storage for the T object.

[basic.compound]/3:

Pointers to layout-compatible types shall have the same value representation and alignment requirements ([basic.align]).

In general, T will not be layout-compatible with char or with alignas(T) char[sizeof(T)], so there's no requirement that a pointer T* has the same value representation as a pointer char* or void*.

[basic.compound]/4:

Two objects a and b are pointer-interconvertible if:

  • they are the same object, or

  • one is a union object and the other is a non-static data member of that object ([class.union]), or

  • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or

  • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]

[Aside: DR 2287 changed "standard-layout union" to "union" in the second bullet after the publication of C++17. But that doesn't affect this program.]

The T object created by the placement new is not pointer-interconvertible with object_ or with object_[0]. And the note hints that this might be a problem for casts...

For the C-style cast ((T*)object_), we need to see [expr.cast]/4:

The conversions performed by

  • a const_cast,

  • a static_cast,

  • a static_cast followed by a const_cast,

  • a reinterpret_cast, or

  • a reinterpret_cast followed by a const_cast

can be performed using the cast notation of explicit type conversion....

If a conversion can be interpreted in more than one of the ways listed above, the interpretation that appears first in the list is used, even if a cast resulting from that interpretation is ill-formed.

Unless T is char or cv-qualified char, this will effectively be a reinterpret_cast, so next we look at [expr.reinterpret.cast]/7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type "pointer to cv T", the result is static_­cast<cv T*>(static_­cast<cv void*>(v)).

So first we have a static_cast from char* to void*, which does the standard conversion described in [conv.ptr]/2:

A prvalue of type "pointer to cv T", where T is an object type, can be converted to a prvalue of type "pointer to cv void". The pointer value ([basic.compound]) is unchanged by this conversion.

This is followed by a static_cast from void* to T*, described in [expr.static.cast]/13:

A prvalue of type "pointer to cv1 void" can be converted to a prvalue of type "pointer to cv2 T", where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

As already noted, the object of type T is not pointer-interconvertible with object_[0], so that sentence does not apply, and there's no guarantee that the result T* points at the T object! We're left with the sentence saying "the pointer value is unchanged", but this might not be the result we want if the value representations for char* and T* pointers are too different.

A Standard-compliant version of this class could be implemented using a union:

template<typename T>
class StaticObject
{
public:
    StaticObject() : constructed_(false), dummy_(0) {}
    ~StaticObject()
    {
        if (constructed_)
            object_.~T();
    }
    StaticObject(const StaticObject&) = delete; // or implement
    StaticObject& operator=(const StaticObject&) = delete; // or implement

    void construct()
    {
        assert(!constructed_);

        new(&object_) T;
        constructed_ = true;
    }

    T& operator*()
    {
        assert(constructed_);

        return object_;
    }

    const T& operator*() const
    {
        assert(constructed_);

        return object_;
    }

private:
    bool constructed_;
    union {
        unsigned char dummy_;
        T object_;
    }
};

Or even better, since this class is essentially attempting to implement an optional, just use std::optional if you have it or boost::optional if you don't.

Community
  • 1
  • 1
aschepler
  • 70,891
  • 9
  • 107
  • 161
  • So summary is: You cannot cast a `unsigned char[]`-storage to a `T*` because they are (generally) not pointer-interconvertible, correct? How does that cover the use cases of [aligned_storage](https://en.cppreference.com/w/cpp/types/aligned_storage)? Prior to C++17 (requirement to use `std::launder`) this seemed to be a very valid use case. I think the rules are there to cover the alignment issue as indicated by `does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified.` See also the answer of @eerorika – Flamefire Aug 14 '19 at 08:57
  • @Flamefire The alignment requirement is separate from the pointer-interconvertible requirement. Yes, before C++17 this was fine given relaxed pointer safety, because of [basic.compound]/3 "If an object of type `T` is located at an address `A`, a pointer of type _cv_ `T*` whose value is the address `A` is said to _point to_ that object, regardless of how the value was obtained." And there was no "pointer-interconvertible", so the cast is defined to preserve the address. – aschepler Aug 15 '19 at 22:56
  • @Flamefire But also, the cast produces a pointer which points at the `T` but is not a safely-derived pointer value. So even in C++11/C++14, if the implementation has strict pointer safety, that pointer is invalid. The fix would be `std::declare_reachable`. I'm not sure `std::launder` actually fixes either issue in C++17, since there's no longer a guarantee the result of the cast represents the same address in the first place. – aschepler Aug 15 '19 at 23:07
  • Isn't `std::declare_reachable` for GC? And I do think "the result of the cast represents the same address", see my answer. Please check if I missed anything but I believe it to be correct. I tried to collect all pieces and connect them to why it works. Otherwise commonly used constructs won't be valid and `std::aligned_storage` would be useless. – Flamefire Aug 16 '19 at 09:28
7

Casting a char array to an object pointer - is this UB?

Casting one pointer (the array decays to a pointer) to another pointer that is not in same inheritance hierarchy using a C-style cast performs a reinterpret cast. A reinterpret cast itself never has UB.

However, indirecting a converted pointer can have UB if an object of appropriate type has not been constructed into that address. In this case, an object has been constructed in the character array, so the indirection has well defined behaviour. Edit: The indirection would be UB free, if it weren't for the strict aliasing rules; see ascheplers answer for details. aschepler shows a C++14 conforming solution. In C++17, your code can be corrected with following changes:

void construct()
{
    assert(!constructed_);
    new (object_) T; // removed cast
    constructed_ = true;
}

T& operator*()
{
    assert(constructed_);
    return *(std::launder((T*)object_));
}

To construct an object into an array of another type, three requirements must be met to avoid UB: The other type must be allowed to alias the object type (char, unsigned char and std::byte satisfy this requirement for all object types), the address must be aligned to the memory boundary as required by the object type and none of the memory must overlap with the lifetime of another object (ignoring the underlying objects of the array which are allowed to alias the overlaid object). All of those requirements are satisfied by your program.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • Your second paragraph is wrong, because you're violating struct aliasing (`object_` doesn't have type `T*`) – Rakete1111 Jul 08 '18 at 16:45
  • @Rakete1111 It doesn't need to have type `T*`. Reusing `char` storage is allowed. – eerorika Jul 08 '18 at 17:41
  • Reusing `char` storage is allowed yes, but what I'm saying is that you can cast an object pointer to `char*` (and back) but if you start with a `char*`, then you cannot get an object pointer from it, even with `new`. – Rakete1111 Jul 08 '18 at 18:49
  • @Rakete1111 That's incorrect. If reusing `char` storage is allowed (and it is), then how do you propose to access the reused object if *"you cannot get an object pointer from it"* were true? – eerorika Jul 08 '18 at 19:39
  • @Rakete1111 I did a bit of researching, and it seems you're correct, though I find the rules frustrating. As I understand, OP's code could be fixed by removing the bad cast in `construct`, and by adding `std::launder` before indirecting the converted pointer. But `std::launder` is not in C++14 :( – eerorika Jul 08 '18 at 20:31
  • Yeah, me too. There is a proposal in the works that will fix this, so all hope is not lost :) – Rakete1111 Jul 08 '18 at 20:32
  • @eerorika you can also access the reused object through the pointer returned by placement new – ALEJANDRO PEREZ MORENO Jan 30 '21 at 21:27
  • @ALEJANDROPEREZMORENO Sure, but storing that pointer within the object would be wasteful. – eerorika Jan 30 '21 at 21:34
0

After writing the comment to @aschepler answer I think I found the proper answer:

No it is not UB!

Very strong hint: aligned_storage is exactly for doing that.

  • basic.compound[4] gives us the definition of "pointer-interconvertible". None of the cases apply so T* and unsigned char[...] are not pointer-interconvertible.
  • conv.ptr[2] and expr.static.cast[13] tells us what happens on the reinterprer_cast<T*>(object_). Basically the (intermediate) cast to void* does not change the value of the pointer and the cast from void* to T* does also not change it:

    If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

    We have a properly aligned, not pointer-interconvertible type here. Hence unchanged value.

  • Now before P0137 (found in another answer) basic.compound[3] said:

    If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

    Now it says basic.compound[3]

    Every value of pointer type is one of the following:

    (3.1) a pointer to an object or function (the pointer is said to point to the object or function), [...]

    Which I consider equivalent for this purpose.

  • Finally we need basic.lval[11]

    If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:52 [...]

    (11.3) a char, unsigned char, or std​::​byte type.

    This boils down to the aliasing rules which only allow certain types to alias and our unsigned char is part of it.

So in summary:

  • We meet the alignment and aliasing rules
  • We get a defined pointer value to a T* (which is the same as the unsigned char*)
  • Hence we have a valid object at that place

This is basically what @eerorika also has. But I think from the arguments above that the code is fully valid at least if T does not have any const are reference members in which case std::launder must be used. Even then, if the memory is not reused (but only used for creating 1 T) then it should also be valid.

However older GCC (<7.2) complains about strict-aliasing violation: https://godbolt.org/z/Gjs05C although the docu states:

For example, an unsigned int can alias an int, but not a void* or a double. **A character type may alias any other type. **

This is a bug

Flamefire
  • 5,313
  • 3
  • 35
  • 70
  • The aliasing rules are not symmetric. Accessing any type via `unsigned char` is allowed, but not the opposite. So it's illegal to use a cast from buffer to even types like `int` if no such `int` objects have been created there. There are four options for pointer values in basic.compound[3], so you need to prove the result of the cast is a "pointer to an object" and not an "invalid pointer value". Even if the pointer does "represent the address" of the object, that no longer implies it "points at" the object. Your interpretation seems to imply "pointer-interconvertible" makes no difference. – aschepler Aug 16 '19 at 11:47
  • With the first 2 points it is guaranteed that the cast will yield the same pointer value as the `unsigned char` array. The placement new places an object `T` there. So using that buffer afterwards casted to `T*` is valid, isn't it? As for the aliasing rule see my last point: It allows to access any T via `unsigned char*`. The placement new created an object there so this is valid again. "pointer-interconvertible" **does** make a difference: If the types were "pointer-interconvertible" then the address might change (which is unwanted here) – Flamefire Aug 19 '19 at 18:54
  • I'm now convinced "the pointer value is unchanged" refers to the possible values described in [basic.compound]/3, not to the value representation. So `reinterpret_cast`, via the `static_cast` from `void*`, can give a pointer of type `X*` which points at an object of type `Y`, with the two types unrelated. This is the meaning implied by the comments in the first example at https://en.cppreference.com/w/cpp/language/reinterpret_cast#Notes ... So my answer does need some edits. – aschepler Aug 19 '19 at 22:33
  • So in the OP, `((T*)object_)` points at the `char` object `object_[0]`, and does not point at the object of type `T` created by placement new. Accessing via that pointer is an aliasing violation, since it's accessing a `char` via type `T`, although accessing any object via `char` is allowed. The only technically correct ways to do this, including uses of `std::aligned_storage`, are using `std::launder`, using the union technique to make the objects pointer-interconvertible, or separately storing the pointer value result of the placement-new expression (which is usually wasteful). – aschepler Aug 19 '19 at 22:38
  • [On reddit](https://www.reddit.com/r/cpp/comments/5noem9/a_personal_tale_on_a_special_value/dchdtsv?utm_source=share&utm_medium=web2x) I found "§18.6.2.3/1-2 make it clear that placement-new always returns exactly the address passed to it". So we can be sure that `placementNewResultPtr == ((T*)object_)`. `basic.lval[11]` allows us to "access the stored value of T through a glvalue of type unsigned char`. So why is this an aliasing violation? Especially if we only ever use the casted-T pointer we should also not run into lifetime issues. – Flamefire Aug 20 '19 at 11:27
  • A pointer with the same type as an object during its lifetime and representing the object's address does not always point at that object, or they would have left the old [basic.compound]/3 alone. We are not accessing a `T` object via a glvalue of character type: a pointer cast does not "access" the object it points to, but using the result of the unary `*` operator does, and the glvalue there is of type `T`. Will update my answer to clarify more later. – aschepler Aug 20 '19 at 11:47
  • A thanks. So this is just "you can read the bytes of any T". Then `[basic.compound]/3` is hard: Why can't I read it as "this value of pointer type T* points to an object of type T`? If I never use the char-buffer as a char buffer but only to get a pointer to T, wouldn't this be allowed? Maybe: [basic.life[6.4]](https://timsong-cpp.github.io/cppwp/basic.life#6.4)? My problem is: The buffer is only used as an allocation, never directly. If this was UB, many real-world programs run into that UB due to C++11 `std::aligned_storage`, custom allocators, variants, ... Can't imagine this to be true – Flamefire Aug 20 '19 at 12:27
-1

You do have undefined behavior.

object_ isn't a T*, so casting and dereferencing it is UB. You cannot use object_ to refer to the newly created object. This is also known as strict aliasing.

The fix however is easy: Just create a new member variable T* that you use to access the constructed object. Then you need to assign the result of placement new to that pointer:

ptr = new(object_) T;

[basic.life]p1 says:

The lifetime of an object o of type T ends when:

  • if T is a class type with a non-trivial destructor, the destructor call starts, or

  • the storage which the object occupies is released, or is reused by an object that is not nested within o.

So by doing new (object_) T;, you are ending the lifetime of the original char[] object and are starting the lifetime of the new T object we'll call t.

Now we have to examing whetherr *((T*)object_) is valid.

[basic.life]p8 with the important bits highlighted:

If, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, a new object is created at the storage location which the original object occupied, a pointer that pointed to the original object, a reference that referred to the original object, or the name of the original object will automatically refer to the new object and, once the lifetime of the new object has started, can be used to manipulate the new object, if:

  • the storage for the new object exactly overlays the storage location which the original object occupied, and

  • the new object is of the same type as the original object (ignoring the top-level cv-qualifiers), and

  • the type of the original object is not const-qualified, and, if a class type, does not contain any non-static data member whose type is const-qualified or a reference type, and

The second point is not true (T vs char[]), so you cannot use object_ as a pointer to the newly created object t.

Rakete1111
  • 47,013
  • 16
  • 123
  • 162
  • 1
    Sorry, but I still not get why casting and dereferencing would be UB if pointing to a validly constructed object. See also https://stackoverflow.com/a/31615329/3723423 . Please provide references to the standard to demonstrate your case. – Christophe Jul 08 '18 at 17:25
  • 3
    @Rakete1111 `There is no T object.` this is wrong. There **is** a `T` object.The lifetime of that object begins on this line: `new ((T*)object_) T;`. What do you think your `ptr = new(object_) T;` does that the original line doesn't? That said, I would prefer your code since the cast in original code is redundant. – eerorika Jul 08 '18 at 17:48
  • @user2079303 Yeah, the sentence is confusing. What I mean is that you cannot use `object_` to access the new object. – Rakete1111 Jul 08 '18 at 18:54
  • @Christophe Done :) Your linked answer is wrong unfortunately. – Rakete1111 Jul 08 '18 at 19:18
  • 1
    Rakete1111 is right. You can not just take the pointer and cast it. You need `std::launder` to get defined behaviour again. an example is given here: https://en.cppreference.com/w/cpp/types/aligned_storage – phön Jul 09 '18 at 07:38