
I'm trying to implement an array-like container with some special requirements and a subset of std::vector interface. Here is a code excerpt:

template<typename Type>
class MyArray
{
public:
    explicit MyArray(const uint32_t size) : storage(new char[size * sizeof(Type)]), maxElements(size) {}
    MyArray(const MyArray&) = delete;
    MyArray& operator=(const MyArray&) = delete;
    MyArray(MyArray&& op) { /* some code */ }
    MyArray& operator=(MyArray&& op) { /* some code */ }
    ~MyArray() { if (storage != nullptr) delete[] storage; /* No explicit destructors. Let it go. */  }

    Type* data() { return reinterpret_cast<Type*>(storage); }
    const Type* data() const { return reinterpret_cast<const Type*>(storage); }

    template<typename... Args>
    void emplace_back(Args&&... args)
    {
        assert(current < maxElements);
        new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
        ++current;
    }

private:
    char* storage = nullptr;
    uint32_t maxElements = 0;
    uint32_t current = 0;
};

It works perfectly well on my system, but dereferencing the pointer returned by data() seems to violate the strict aliasing rules. The same concern applies to a naive implementation of the subscript operator, iterators, etc.

So what is a proper way to implement containers backed by arrays of char without breaking strict aliasing rules? As far as I understand, using std::aligned_storage will only provide a proper alignment, but will not save the code from being broken by compiler optimizations which rely on strict aliasing. Also, I don't want to use -fno-strict-aliasing and similar flags due to performance considerations.

For example, consider subscript operator (nonconstant for brevity), which is a classical code snippet from articles about UB in C++:

Type& operator[](const uint32_t idx)
{
    Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr)); // Cast is OK.
    return *ptr; // Dereference is UB.
}

What is a proper way to implement it without any risk of finding my program broken? How is it implemented in standard containers? Is there any cheating with undocumented compiler intrinsics in all compilers?

Sometimes I see code with two static casts through void* instead of one reinterpret cast:

Type* ptr = static_cast<Type*>(static_cast<void*>(storage + idx * sizeof(ptr)));

How is it better than a reinterpret cast? To me, it does not solve any problems and just looks overcomplicated.

Sergey
  • At a quick glance, it appears that you placement-new an object of the correct type into the storage, so the pointer should be allowed to alias it. If I'm mistaken, please elaborate on why you think you're violating aliasing rules. – eerorika Nov 08 '17 at 14:27
  • Side note: In your initializer list constructor, shouldn't `maxElements` be initialized to `init.size()`? – king_nak Nov 08 '17 at 14:27
  • However, the lack of destructor calls is suspicious. What kind of objects do you expect to be trivially destructible, but not trivially constructible? – eerorika Nov 08 '17 at 14:29
  • I don't see how this violates strict aliasing rules? What do you mean by dereferencing the pointer breaks strict aliasing rules? Without the use of aligned_storage your code might be a bit inefficient in that sometimes more memory reads would be issued than required. Otherwise I don't see why strict aliasing rules are broken here. – MS Srikkanth Nov 08 '17 at 14:47
  • `std::copy(init.begin(), init.end(), reinterpret_cast<Type*>(storage))` is wrong. construct should be called in a loop instead. – Jarod42 Nov 08 '17 at 15:04
  • I think it is ok in regard to strict aliasing. To be sure to not break strict aliasing, you may store in additional member `Type* data` the result of your first placement new. (so `data == storage` but with correct type). – Jarod42 Nov 08 '17 at 15:08
  • @user2079303 Both `char* storage` and a pointer returned by `data()` point to the same region of memory. Moreover, the subscript operator will do something like `return *reinterpret_cast<Type*>(storage + offset)`, i.e. dereference a pointer of incompatible type, which is UB. – Sergey Nov 09 '17 at 02:41
  • @Jarod42 Wouldn't dereferencing `data` lead to UB, as it points to the same memory as `char` pointer? – Sergey Nov 09 '17 at 02:56
  • @king_nak Ok, I removed malformed constructor not to divert reader's attention from the problem and added a more relevant example. – Sergey Nov 09 '17 at 03:16

1 Answer


but dereferencing a pointer returned by data seems to violate strict aliasing rules

I disagree.

Both char* storage and a pointer returned by data() point to the same region of memory.

This is irrelevant. Multiple pointers pointing to the same object do not violate the aliasing rules.

Moreover, subscript operator will ... dereference a pointer of incompatible type, which is UB.

But the object isn't of an incompatible type. In emplace_back, you use placement new to construct objects of Type in that memory. Assuming that no code path can bypass this placement new, so that the subscript operator always returns a pointer to one of these objects, dereferencing the Type* is well defined: it points to an object of Type, which is compatible.

This is what is relevant for pointer aliasing: The type of the object in memory, and the type of the pointer that is dereferenced. Any intermediate pointer that the dereferenced pointer was converted from is irrelevant to aliasing.
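As a minimal sketch of that reasoning (the `Point` type and the function name are hypothetical, not from the question), the following constructs an object in `char` storage and then reads it through a pointer obtained by casting the `char*`. The access is through a `Point` lvalue to a `Point` object, which is what the aliasing rules look at. Stricter readings of C++17 would additionally wrap the cast in `std::launder`; that is left out here to stay close to the question's code.

```cpp
#include <cassert>
#include <new>      // placement new

// Hypothetical element type, used only for illustration.
struct Point { int x; int y; };

// Constructs a Point inside raw char storage and reads it back via Point*.
inline int sum_via_typed_pointer()
{
    alignas(Point) char buffer[sizeof(Point)];

    // After this placement new, an object of type Point lives in buffer.
    Point* created = new (buffer) Point{3, 4};

    // The intermediate char* does not matter for aliasing: the pointer that is
    // dereferenced has type Point*, and the object it points at is a Point.
    Point* viaCast = reinterpret_cast<Point*>(buffer);
    assert(viaCast == created);

    int result = viaCast->x + viaCast->y;
    created->~Point();  // trivial here, shown for symmetry with construction
    return result;
}
```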


Note that your destructor does not call the destructors of the objects constructed within storage. If Type isn't trivially destructible, those destructors never run, and the behaviour is undefined if the program depends on their side effects.
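One way to address this, sketched under assumptions: `TinyArray` below is a hypothetical, cut-down version of the question's class (move operations omitted), and `Tracked` is a hypothetical element type that counts live instances. The destructor destroys the constructed elements in reverse order before releasing the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical reduced version of the question's container.
template<typename Type>
class TinyArray
{
public:
    explicit TinyArray(std::uint32_t size)
        : storage(new char[size * sizeof(Type)]), maxElements(size) {}
    TinyArray(const TinyArray&) = delete;
    TinyArray& operator=(const TinyArray&) = delete;

    ~TinyArray()
    {
        // Destroy only the elements that were actually constructed,
        // last to first, then free the underlying bytes.
        Type* first = reinterpret_cast<Type*>(storage);
        while (current > 0) {
            --current;
            first[current].~Type();
        }
        delete[] storage;
    }

    template<typename... Args>
    void emplace_back(Args&&... args)
    {
        assert(current < maxElements);
        new (storage + current * sizeof(Type)) Type(std::forward<Args>(args)...);
        ++current;
    }

private:
    char* storage = nullptr;
    std::uint32_t maxElements = 0;
    std::uint32_t current = 0;
};

// Hypothetical element type that tracks how many instances are alive.
struct Tracked
{
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;
```

With this destructor, every `Tracked` created through `emplace_back` is destroyed when the container goes out of scope, so `Tracked::alive` returns to zero.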


Type* ptr = reinterpret_cast<Type*>(storage + idx * sizeof(ptr));

The sizeof is wrong: sizeof(ptr) is the size of the pointer, not of the element. What you need is sizeof(Type) or sizeof *ptr. Or, more simply:

auto ptr = reinterpret_cast<Type*>(storage) + idx;
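Put together, the corrected element access might look like the sketch below. `element_at` is a hypothetical free function standing in for the question's `operator[]`; it assumes an object of `Type` has already been constructed at index `idx`.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical helper mirroring the corrected subscript computation.
template<typename Type>
Type& element_at(char* storage, std::uint32_t idx)
{
    assert(storage != nullptr);
    // Convert once, then let pointer arithmetic scale by sizeof(Type).
    return *(reinterpret_cast<Type*>(storage) + idx);
}
```

Doing the arithmetic on `Type*` rather than on `char*` removes the manual `sizeof` multiplication, which is where the original bug was.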

Sometimes I see code with two static casts through void* instead of one reinterpret cast: How is it better than reinterpret cast?

I can't think of any situation where the behaviour would be different.
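Since C++11 the standard specifies a `reinterpret_cast` between object pointer types in terms of exactly that pair of `static_cast`s through `void*`, so the two spellings yield the same pointer. A small sketch to check this (the function name `casts_agree` is hypothetical):

```cpp
#include <cassert>

// Returns true when both cast spellings produce the same pointer value.
template<typename Type>
bool casts_agree(char* storage)
{
    assert(storage != nullptr);
    Type* viaReinterpret = reinterpret_cast<Type*>(storage);
    Type* viaStatic = static_cast<Type*>(static_cast<void*>(storage));
    return viaReinterpret == viaStatic;
}
```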

eerorika
  • Thanks for clarification! Also I removed subscript and fixed destructor to use sfinae which performs optimized deallocation only if `Type` is trivially destructible. – Sergey Nov 09 '17 at 11:31