7

I have a use case for creating an std::vector with many elements, each of which is of a simple, but non-primitive, type (POD struct). The vector and type are sufficiently large/complex that in the following,

std::vector<U> v;
v.resize(1000000000);
for(size_t i=0;i<v.size();++i){/* initialize v[i] */}

the resize call is noticeably slow. And it's wasteful because resize is default-initializing all those entries, then I'm going through in a loop and setting them all to their correct/useful values.

What I would like to do is to allocate all the memory for the vector, but not initialize any of the entries, then go through in parallel and initialize all the entries, e.g. with OpenMP

std::vector<U> v;
v.reserve(1000000000);
#pragma omp parallel for
for(size_t i=0;i<v.size();++i){/* initialize v[i] */}

However, reserve doesn't actually change the size of v, so I'd have to keep doing push_back in my loop, which won't maintain the proper ordering of the elements (which matters in my use case); I really want to write something like v[i] = ... in my loop body.

Is there a way to allocate/"initialize" a vector without initializing any of its elements, and to then fill in / initialize all the elements in parallel?

davewy
  • 1,781
  • 1
  • 16
  • 27
  • If you have POD then default ctor does nothing, does it not? – Slava Dec 17 '19 at 19:28
  • @SlavasupportsMonica Typically a vector implementation will call `memset` on POD types which while fast will still have some cost. – NathanOliver Dec 17 '19 at 19:29
  • If you know the size you need, why not use an old fashioned array instead? – Born2Smile Dec 17 '19 at 19:34
  • @Born2Smile An array that large will come with its own problems, notably if it's too large, you can overrun the stack. –  Dec 17 '19 at 19:39
  • 1
    @Chipster, so don't put it on the stack. `auto v = std::make_shared(1000000000)` – Born2Smile Dec 17 '19 at 19:45
  • 2
    @Born2Smile, well, true. But that isn't the only problem. There is also the problem that [dynamically allocated and normal arrays will still default construct](https://stackoverflow.com/questions/59380836/c-allocate-memory-for-an-stdvector-then-initialize-its-elements-in-parallel?noredirect=1#comment104953941_59380989), which the OP expressed an interest in not doing. –  Dec 17 '19 at 19:50
  • You could allocate your data with `new`, initialize that in parallel (to get first-touch NUMA-friendly behavior) then convert it to `std::vector` as described in https://stackoverflow.com/questions/2434196/how-to-initialize-stdvector-from-c-style-array. – Jeff Hammond Dec 31 '19 at 17:33
  • @JeffHammond Not sure if this changed in the meantime, but the answers behind that link only discuss solutions that initialize a vector from a C-style array by copying which would in turn undo the NUMA-friendliness... – paleonix Dec 08 '22 at 00:13

2 Answers2

4

Your options are:

  • Replace std::vector with an alternative (e.g. uvector)
  • Use some sort of library to resize without initialization, such as UninitializedMemoryHacks from Facebook.

After you've performed the resize, you can use OpenMP in the usual ways.

Richard
  • 56,349
  • 34
  • 180
  • 251
3

It depends on the default constructor for your type U. If the default constructor is cheap, it is very unlikely that you will gain anything parallelizing it.

struct U {
   int a, b, c;
   U():a(0), b(1), c(2) {}
};

If your default constructor is expensive, it would make more sense to split it in two parts: One for default initialization and a function for the actual initialization.

struct U {
   vector<int> a;
   U() {}
   void init(int n) { a.resize(n); }
};

In both alternatives, the regular resize or assign call to the vector would be very hard to beat.

If you really are set in doing things this way, you could use a reinterpret_cast to an array. This way, the default constructor won't be called.

U * u_array = reinterpret_cast<U*>(malloc(100*sizeof(U)));

I strongly advise against this last option.

mentatkgs
  • 1,541
  • 1
  • 13
  • 17
  • 1
    Note that the `malloc` solution would ideally have `struct U` containing only POD types. With `U` containing a `std::vector` as listed, you'd have to jump some extra hoops to release the resources held by `vector` within `U` before `free` is called on your `malloc`'d array. Not to mention the hoops to jump to construct `U` in your array. – Larry B Dec 17 '19 at 22:35
  • 1
    You are totally correct. You really need to know what you are doing when you use an reinterpret_cast. – mentatkgs Dec 19 '19 at 01:20