Efficient out-of-order vector construction

Question

Suppose I want to construct a vector<T> of size N, where N is known beforehand (but not at compile time). Normally, I would call vector<T>::reserve(N) once and then call vector<T>::push_back(...) N times to fill the vector. However, for some reason I need to insert the elements out of order, although I know at which index each element should be inserted at beforehand.

Is there a good way to ensure that the constructor of T is called exactly N times in the construction of the vector?

If malloc does not initialize allocated memory, then raw arrays allow this kind of technique to work efficiently. However, the returned pointer is not moveable to vector, and at any rate, allowing a vector to steal memory from an malloc call (or maybe something like span with dynamically determined size) is not necessarily wise because it could allow class invariants of T to be violated if we do not actually initialize for every valid index. I guess the initialization guarantee is probably the crux of the issue here.

A simple example of this type of problem is calculating an inverse permutation, when permutations are represented as vector<size_t> perm representing the map i -> perm[i]. The performance might not be a big issue here, but T could also be a type much more expensive to construct than size_t.

This question is closely related, although the answers all suggest a solution that requires additional constructor calls.

EDIT: Let's say that T is a 500 byte POD type, and we need good cache performance in linear iteration over the vector, which is why storing T* and permuting is undesirable.

It seems like STL containers are not very compatible with the goal of touching each constructed object only once because they want to guarantee initialization of each object.

Peter's suggestion of modification-after-initialization is a good one. If T is endowed with a dummy constructor, we can use reserve(N) and then emplace_back with that dummy constructor N times. Afterwards we can properly initialize each element in any order.

Perhaps:

struct T {
    struct DummyConstructorMarker {};
    T(DummyConstructorMarker) {}
    char data[500];
};

vector<T> makeVector(vector<size_t>& initOrder, int N, /* other state */) {
    vector<T> result; result.reserve(N);
    for (int i = 0; i < N; i++) {
        result.emplace_back(T::DummyConstructorMarker());
    }
    for (size_t index: initOrder) {
        /* Assign to data stored at index */
    }
    return result;
}

Related: [uninitialized memory allocation in c++](https://stackoverflow.com/q/38764678/11082165) — Brian61354270, Aug 17 '23 at 00:37
Depends. Using `vector::resize(N)` instead of `reserve(N)` (or simply initialising the vector using `vector vec(N)` or `vector vec(N, initial_element)`) will call a constructor of `T` exactly `N` times (not including construction of `initial_element`). However, there is then a need to change state of vector elements - which can be done in any order, but the behaviour of operations used (e.g. assignment operator, setters provided by `T`, or something else) need to be considered. It may also be necessary to somehow ensure (or check) that all elements are updated. — Peter, Aug 17 '23 at 04:48
You may want to consider using another container (e.g. a list, map, ...) that may be better suited (e.g. for insertion of elements in the middle of a range) for your purposes. All you have stated is a need to minimise number of calls of constructors of your elements, so more specific advice isn't possible. — Peter, Aug 17 '23 at 04:58
The generic problem with this approach is that you would need to remember which elements have been initialized and which haven't. Otherwise, for instance in case of exception, you wouldn't know which elements to destroy. Anyway, did you consider the option to `push_back` elements in order and then permute their order w.r.t. `perm`? Is swapping of objects of your type cheap? — Daniel Langr, Aug 17 '23 at 06:27
Maybe you could use a vector of pointers, so that `T` will be constructed only `N` times. It would worth it only if `T` is slower to construct than pointers. — Fareanor, Aug 17 '23 at 09:14
What are the construction costs? Is `T` default constructible? Is `T` trivially copy/default constructible? — Red.Wave, Aug 17 '23 at 10:52

score 1 · Answer 1 · answered Aug 17 '23 at 09:35

A possible solution would be to use a vector of pointers instead.

In that case, T's constructor will be called only N times:

std::vector<std::unique_ptr<T>> v(4); // N == 4

// out of order insertion
v[2] = std::make_unique<T>(/*params*/);
v[3] = std::make_unique<T>(/*params*/);
v[0] = std::make_unique<T>(/*params*/);
v[1] = std::make_unique<T>(/*params*/);

Of course, the std::unique_ptr constructor will be called 2*N times. It's worth the effort if T is much slower to construct than std::unique_ptr.

You could still go with raw pointers if you want to reduce the overhead of creating std::unique_ptr objects, but at the cost of losing robustness (no more RAII) and with manual memory handling:

std::vector<T*> v(4, nullptr); // N == 4

// out of order insertion
v[2] = new T(/*params*/);
v[3] = new T(/*params*/);
v[0] = new T(/*params*/);
v[1] = new T(/*params*/);

// Release memory manually at the end
for(T* p : v)
    delete p;

Warning: In that case, if an exception occurs during the creation of the third T (for example), you may leak memory for the previous two if the deletion part is skipped as well (i.e. if it is not handled in the catch block for example).

Efficient out-of-order vector construction

1 Answers1