
When elements are default-inserted into an instance of std::vector<T>, they are value-initialized by default. I often work with multi-threaded high-performance codes, where such value-initialization of large arrays might represent an unacceptable sequential bottleneck.

The typical approach based on reserve() and push_back()/emplace_back() is of no use in concurrent codes. I usually end up with one of the following options:

  1. defining an empty default constructor for T,
  2. defining and using a custom allocator with an empty construct() member function.
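For illustration, here is a minimal C++11 sketch of both options applied to arithmetic element types. The names `uninit_double`, `default_init_allocator` and `uninit_vector` are hypothetical. The wrapper shows option 1: a user-provided empty constructor turns value-initialization into a no-op, but it has to wrap `double` rather than apply to `double` itself. The allocator adaptor shows option 2: its no-argument `construct()` uses default-initialization (`::new (p) U`) instead of value-initialization (`::new (p) U()`) and forwards every other call to the underlying allocator.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Option 1 (hypothetical wrapper): the user-provided empty default constructor
// means value-initialization just calls it, so `value` is left uninitialized.
struct uninit_double {
  double value;
  uninit_double() {}  // intentionally does not initialize `value`
};

// Option 2 (hypothetical adaptor): default-initialize on no-argument construction.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
  using traits = std::allocator_traits<A>;

 public:
  template <typename U>
  struct rebind {
    using other =
        default_init_allocator<U, typename traits::template rebind_alloc<U>>;
  };

  using A::A;  // inherit the underlying allocator's constructors

  // Used by resize(n) and vector(n): no zeroing for arithmetic types.
  template <typename U>
  void construct(U* ptr) noexcept(std::is_nothrow_default_constructible<U>::value) {
    ::new (static_cast<void*>(ptr)) U;
  }

  // Copies, resize(n, value), emplace_back(...) etc. are forwarded unchanged.
  template <typename U, typename... Args>
  void construct(U* ptr, Args&&... args) {
    traits::construct(static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
  }
};

template <typename T>
using uninit_vector = std::vector<T, default_init_allocator<T>>;
```

With this, `uninit_vector<double> v; v.resize(n);` should leave the new elements unwritten, while `resize(n, value)` and copies still initialize normally. This relies on the library routing default-insertion through the allocator's `construct()`, i.e., on the DefaultInsertable support discussed below.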

However, neither solution is elegant, and both have drawbacks. The former cannot be used when T is a POD type, such as double. The latter requires the given implementation of the C++ Standard Library to support the relatively new DefaultInsertable concept. Moreover, defining a custom allocator is quite tedious.

Is there any chance that future C++ will provide a straightforward way to "turn off" this default-insertion/value-initialization?

UPDATE

Maybe I should have simply asked whether it will be possible to avoid zero-initialization of default-inserted elements of a vector for arithmetic types.

Daniel Langr
  • Nope. When an instance of a class gets constructed, it gets constructed. There is no alternative to constructing a new instance of a class. This goes to the fundamental integrity of the language. You will have to work with making the default constructor as lightweight as possible. – Sam Varshavchik Mar 13 '16 at 15:10
  • @SamVarshavchik All right, that's reasonable for classes. But, e.g., for a vector of doubles, I don't need the elements to be zeroed. – Daniel Langr Mar 13 '16 at 15:13
  • Do you have to use a vector? You could always just `malloc()` what you need. – Galik Mar 13 '16 at 15:25
  • It is possible, either with `std::vector` + a custom allocator or with the appropriate Boost vector variant. There is a dupe for that somewhere (a better one than [this](https://stackoverflow.com/questions/5958572/how-can-i-avoid-stdvector-to-initialize-all-its-elements) iirc), but I don't have the time to search for that right now. – Baum mit Augen Mar 13 '16 at 15:28
  • @Galik Yes, that's the third option, but it's the least elegant one in my opinion. That's why I'm using C++ to have all these nice features such as RAII, encapsulation, etc. Moreover, in some cases I cannot switch to `malloc` without a great deal of refactoring. – Daniel Langr Mar 13 '16 at 15:30
  • Maybe for legacy code you could overlay a malloced array with some kind of `array_view` pseudo-container like `gsl::span`? That might help with compatibility. – Galik Mar 13 '16 at 15:35
  • @BaummitAugen I definitely don't want to make codes dependent on Boost just to solve this problem. Moreover, it might need a lot of refactoring. I discuss a custom-allocator solution in my question. But maybe that's the way: to define some _universal custom allocator_ that would prevent zero-initialization for arithmetic types and otherwise behave as usual (such an allocator might be useful for future C++ :). – Daniel Langr Mar 13 '16 at 15:37
  • 1
    Such an allocator is ready for use in the dupe I am thinking about, so there would be no need to redo that. If you don't want Boost, that is probably the easiest way. Some refactoring must occur bc. normal `std::vector` must value initialize bc. the standard says so, no way around that. I am not aware of the committee discussing easier alternatives. – Baum mit Augen Mar 13 '16 at 15:42
  • Can you elaborate where values in your code are *default-inserted*? I can only think of `resize` and the `count` ctor, I guess I am overlooking something. Also I would *hope* that default-initialized vectors of arithmetic types simply get a zeroed memory page from the OS. – Zulan Mar 13 '16 at 15:43
  • @BaummitAugen Hopefully, the default-insertion concept will be more and more supported in the future. It is not even mentioned in the ISO C++11 Standard. I think it has been introduced by the [N3346 Defect Report](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3346.pdf). – Daniel Langr Mar 13 '16 at 15:49
  • @DanielLangr For most use cases [`std::is_arithmetic`](http://en.cppreference.com/w/cpp/types/is_arithmetic) or `is_pod` should be good enough, so if that default insertion stuff is the only thing holding you back, consider ditching it. – Baum mit Augen Mar 13 '16 at 15:53
  • @Zulan In `resize()` mostly. I have no idea whether it's solved via zeroed memory pages or not, but I experienced a `resize()` that took tens of seconds because of zero-initialization when executed on modern HPC hardware, such as the Cray XC40. – Daniel Langr Mar 13 '16 at 15:54
  • What do you want its destruction behaviour to be when half constructed of non-POD data? – Yakk - Adam Nevraumont Mar 13 '16 at 15:55
  • @Yakk I've updated the question aiming at zero-initialization and POD (arithmetic) types. I didn't want to change the original question in the middle of discussion. Sorry about that. – Daniel Langr Mar 13 '16 at 15:57
  • @DanielLangr, 10s of seconds for zero-initialization sounds very odd. That is far below any reasonable memory bandwidth of an XC40. Can you reproduce that with a small example and post it as a question? `resize` is of course generally bad due to the possible copy. – Zulan Mar 13 '16 at 16:07
  • @Zulan I guess it is not odd when, e.g., 10s of GB of memory need to be zeroed. Such amounts are not unusual in HPC, where one works with data structures representing, e.g., large sparse matrices or computational meshes. Simply write `std::vector<double> v;` and then `v.resize(n)` with a very large `n`. This is exactly where I measured the runtime of `resize()`. There is no copy there. Yes, the memory bandwidth of a Cray XC40 node is large, but you will never get even close to it in sequential code; that's the point. – Daniel Langr Mar 13 '16 at 16:23
  • @DanielLangr Out of curiosity, how do you (or do you) align the memory allocated by vector for SIMD operations? – RyanP Mar 13 '16 at 16:33
  • @RyanP In some particular cases, vectorization cannot be applied or brings no or negligible benefit. Moreover, compilers are able to generate special assembly code to deal with potentially unaligned array elements at the beginning and end of an array, and process the majority of the remaining elements as aligned. But you are right, alignment provided, e.g., by a custom allocator, is generally the best way. – Daniel Langr Mar 13 '16 at 16:45
  • @DanielLangr To my surprise, the zero-page COW doesn't seem to work consistently. If you actually touch the memory intended to be used by different threads from a single thread, this can have terrible NUMA implications, potentially worse than the initialization cost. I do get the point of the question, but I'm not sure this should be addressed by allowing zero-initialization to be skipped in the standard. IMHO this could and should be a reliable, yet implementation-dependent, optimization. – Zulan Mar 13 '16 at 17:58
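Regarding `malloc()` versus RAII: a possible middle ground, sketched here under the assumption that the array size is fixed after allocation, is `new T[n]`, which default-initializes and therefore does not zero arithmetic elements, owned by a `std::unique_ptr<T[]>`. The helper name `make_uninit_array` is hypothetical. Note that `std::make_unique<T[]>(n)` would value-initialize (zero) the elements; C++20 added `std::make_unique_for_overwrite<T[]>(n)` for this uninitialized case.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical helper: allocates n elements without zeroing them.
// `new T[n]` (no trailing parentheses) default-initializes, so arithmetic
// elements are left indeterminate; unique_ptr<T[]> calls delete[] for us.
template <class T>
std::unique_ptr<T[]> make_uninit_array(std::size_t n) {
  static_assert(std::is_arithmetic<T>::value, "intended for arithmetic types");
  return std::unique_ptr<T[]>(new T[n]);
}
```

Each thread can then write its own part of the array first, which also addresses the NUMA first-touch concern, and legacy code can see the memory through a (pointer, size) view such as `gsl::span`.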

1 Answer


Vector is poorly suited to your needs. It supports resizing and accidental copying, neither of which makes sense in a multi-threaded environment.

Write a simple container:

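    #include <cstddef>
    #include <memory>
    #include <type_traits>

    // Raw, suitably aligned storage for `count` objects of POD type T.
    template<class T, class Storage = std::aligned_storage_t<sizeof(T), alignof(T)>>
    struct buffer {
      static_assert(std::is_pod<T>::value, "pod only");
      std::size_t count;                    // number of elements
      std::unique_ptr<Storage[]> storage;   // uninitialized storage, freed with delete[]
    };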

Populate it with container-esque members: begin/end/front/size/operator[]/empty, etc.

Make it move only.

Use the rule of zero (with =default).

Give it an explicit buffer(std::size_t) constructor that leaves its contents uninitialized.

Mix in some span/array_view types, and this should be suitable for your needs.

Maybe add an emplace(size_t, Args&&...) member that does the placement new for you with {}.
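A minimal sketch of such a container, assuming C++11 and a trivial element type. Instead of `std::aligned_storage` (deprecated in C++23), it holds a plain `std::unique_ptr<T[]>` allocated with `new T[n]`, which for trivial `T` already leaves the elements uninitialized. This substitution and all member names are illustrative choices, not part of the answer.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Move-only buffer: no resize(), no copy, no zeroing on construction.
template <class T>
class buffer {
  static_assert(std::is_trivial<T>::value, "trivial (POD-like) types only");

 public:
  buffer() = default;
  // `new T[n]` default-initializes: elements of trivial T stay uninitialized.
  explicit buffer(std::size_t n) : count_(n), storage_(new T[n]) {}

  // Rule of zero: copying is implicitly deleted because std::unique_ptr is
  // move-only; the implicit move operations move both members.

  std::size_t size() const noexcept { return count_; }
  bool empty() const noexcept { return count_ == 0; }

  T* data() noexcept { return storage_.get(); }
  const T* data() const noexcept { return storage_.get(); }
  T* begin() noexcept { return data(); }
  T* end() noexcept { return data() + count_; }
  const T* begin() const noexcept { return data(); }
  const T* end() const noexcept { return data() + count_; }

  T& front() noexcept { return storage_[0]; }
  T& operator[](std::size_t i) noexcept { return storage_[i]; }
  const T& operator[](std::size_t i) const noexcept { return storage_[i]; }

  // Re-creates element i in place with brace-initialization.
  template <class... Args>
  T& emplace(std::size_t i, Args&&... args) {
    return *::new (static_cast<void*>(data() + i)) T{std::forward<Args>(args)...};
  }

 private:
  std::size_t count_ = 0;
  std::unique_ptr<T[]> storage_;
};
```

One caveat of the pure rule of zero here: a moved-from buffer keeps its old `count_` but no storage, so it should only be destroyed or assigned to. Resetting the count would require user-defined move operations.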

Yakk - Adam Nevraumont
  • That's another option, which needs refactoring of existing codes. I guess the best approach is a custom allocator that guarantees alignment and prevents zero-initialization for POD types. Will try to find the one mentioned by Baum mit Augen in comments. – Daniel Langr Mar 13 '16 at 16:34
  • My whole point is that such an allocator would be a great extension of C++ Standard Library, in my opinion :) – Daniel Langr Mar 13 '16 at 16:48
  • @dan I was trying to solve your problem. If you want to propose something for standardization, an SO question is not a good forum. – Yakk - Adam Nevraumont Mar 13 '16 at 17:57
  • Thanks, it's always good to have more options to choose from. And I will learn more about the whole C++ standardization process. – Daniel Langr Mar 13 '16 at 18:47
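Along the lines of the "universal" allocator discussed in the comments, here is a hedged C++11 sketch of an allocator that skips zero-initialization only for arithmetic types (using `std::is_arithmetic`, as suggested by Baum mit Augen) and value-initializes everything else, as `std::allocator` would. The name `no_zero_allocator` is hypothetical, and it does not provide the extra alignment guarantee asked for above; that would require replacing `allocate()`/`deallocate()`.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

template <class T>
struct no_zero_allocator {
  using value_type = T;

  no_zero_allocator() noexcept = default;
  template <class U>
  no_zero_allocator(const no_zero_allocator<U>&) noexcept {}

  // Memory management is delegated to std::allocator (no extra alignment).
  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>().deallocate(p, n); }

  // No-argument construction, used by resize(n) and vector(n).
  // Both branches compile for any default-constructible U.
  template <class U>
  void construct(U* p) {
    if (std::is_arithmetic<U>::value)
      ::new (static_cast<void*>(p)) U;    // default-init: no zeroing
    else
      ::new (static_cast<void*>(p)) U();  // value-init, as usual
  }

  // Construction with arguments (copies, resize(n, value), emplace).
  template <class U, class... Args>
  void construct(U* p, Args&&... args) {
    ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
  }
};

// Stateless: all instances are interchangeable.
template <class T, class U>
bool operator==(const no_zero_allocator<T>&, const no_zero_allocator<U>&) noexcept {
  return true;
}
template <class T, class U>
bool operator!=(const no_zero_allocator<T>&, const no_zero_allocator<U>&) noexcept {
  return false;
}
```

Usage would be `std::vector<double, no_zero_allocator<double>> v; v.resize(n);`, while a vector of class types with the same allocator keeps the usual value-initialization.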