3

I'm trying to implement my own vector class (for study purpose). Currently I am stuck at vector::clear(), how can I implement it? It can't be like this, right?

void clear() {
    for (size_t i = 0; i < _size; ++i) {
        _data = 0;//can't work with custom class
    }
}

Or

void clear() {
    delete[] _data;
    _data = new T[_cap];
    _size = 0;
}//can we make it better?

Or as simple as:

void clear() {
    _size = 0;//fastest, but user can still access the data
}
T& operator[](size_t pos) {
    if (pos >= _size) throw;//added to prevent from accessing "cleared" data
    return _data[pos];
}

Did some digging through libcxx and found out they use alloc_traits::destroy, which destroy each element in _data???
Here is my class attribute

T* _data;
size_t _size;
size_t _cap;
Phineas
  • 159
  • 2
  • 10
  • If your goal is learning how to make something without every bell and whistle, I don't generally recommend the standard library implementations. They have very particular requirements compared to a lot of other code, which can often make things much more overwhelming when you aren't already an expert. You'll get the real deal with all the nitty gritty details, but that isn't as useful early on. – chris Aug 06 '20 at 04:48
  • Note that you can't reimplement `std::vector` portably prior to C++20, the iterator/reference invalidation requirements are incompatible with the requirements on `data` without [implicit object creation](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0593r2.html) – Caleth Aug 06 '20 at 08:17
  • @Caleth afaik "Implicit creation of objects for low-level object manipulation" is applied retroactively to all standards. – bolov Aug 06 '20 at 10:36

2 Answers2

4

If you keep a capacity larger than the size this means you need to account for allocated memory that doesn't hold any object. This means that the new T[_cap] approach simply doesn't work:

  • first and foremost your vector won't work for objects that are not default constructible
  • even for those that are, you will be creating more objects than requested and for some objects construction can be expensive.
  • the other problem is when you push_back when you sill have capacity you will be doing assignment instead of construction for the object (because an object already exists there)

So you need to decouple memory allocation from object creation:

  • allocate memory with operator new. Please note this is different than the new expression you are most familiar with.
  • construct an object in the allocated memory with in-place constructor (also kown as placement new)
  • destroy an object by explicitly calling the destructor
  • deallocate memory with operator delete

C++17 brings some utility functions for this purpose like: std::uninitialized_default_construct, uninitialized_copy, std::destroy etc.; find more in Dynamic memory management

If you want to be more generic like std::vector you can use allocators instead.


With this in mind, now answering your specific question about clear. The behavior of std::vector::clear is: "Erases all elements from the container. After this call, size() returns zero. [...] Leaves the capacity() of the vector unchanged". If you want the same behavior, this means:

void clear() noexcept
{
    for (T* it = _data; it != _data + _size; ++it)
        it->~T();

    // or
    std::destroy(_data, _data + size);

    _size = 0;
}

As you can see implementing something like std::vector is far from trivial and requires some expert knowledge and techniques.

Another source of complications comes from strong exception safety. Let's consider just the case of push_back for exemplification. A naive implementation could do this (pseudocode):

void push_back(const T& obj)
{
    if size == capacity
         // grow capacity
         new_data = allocate memory
         move objects from _data to new_data
         _data = new_data
         update _cap

    new (_data + _size) T{obj}; // in-place construct
    ++_size;
}

Now think what will it happen if the move constructor of one object throws while moving to the new larger memory. You have a memory leak and worst: you are left with some objects in your vector in a moved-from state. This will bring your vector in an invalid internal state. That's why is important that std::vector::push_back guarantees that either:

  1. the operator is successful or
  2. if an exception is thrown, the function has no effect.

In other words, it grantees that it never leaves the object in an "intermediary" or invalid state, like our naive implementation does.

bolov
  • 72,283
  • 15
  • 145
  • 224
  • about the push_back: 1. `new_data = allocate memory` 2. `std::copy` to `new_data` 3. `std::swap` Does this one provide strong exception? – Phineas Aug 06 '20 at 15:44
  • @Phineas almost. You need to make sure you don't leak memory. The stdlib implementations do something similar: if `T` is `nothrow_move_constructible` then they use move, otherwise they fallback to the slower `copy` – bolov Aug 06 '20 at 15:59
0

The only responsibility of clear is to call the destructor on any objects that are in the array and set the array size to 0 (http://www.cplusplus.com/reference/vector/vector/clear/). Many implementations do not actually free the memory, but simply set the end and last pointers in the array to the beginning pointer after running through the vector and calling the destructor on each element. This is done as an optimization so that the memory is still available and the vector is ready to go if new elements are pushed onto the vector.

If you don't need that particular optimization, your version of clear() where you simply delete[] the data and then reallocate is perfectly reasonable. It all depends on what tradeoffs you want to make.

Durstann
  • 122
  • 11
  • `delete[]` is exactly what you can't do, as that would invoke the destructor on all elements, even the ones which hadn't been constructed yet. – Ext3h Aug 06 '20 at 04:48
  • Depends on his implementation. If he wasn't constructing the objects, that's true. but from that version of clear() it appears he is always keeping a constructed array of objects around up to the capacity. Perhaps more code is required from the original poster if we are to proceed further? – Durstann Aug 06 '20 at 04:53
  • currently `_data` always construct with at least 1 element, for easier push_back (if exceed the _cap: `_cap * 2`). – Phineas Aug 06 '20 at 06:08
  • 2
    *"Many implementations do not actually free the memory ... done as an optimization"* They do that because the standard [forces them to](https://en.cppreference.com/w/cpp/container/vector/clear). It probably does that for optimization purposes, yes, but implementations have no choice. – HolyBlackCat Aug 06 '20 at 08:37