
There is a closely related question about this topic already here, but the question was highly contested and the related discussion was a bit confusing to me. So is the following thinking correct?

My situation is the following: I have a data structure that uses chunks to store data. I want to preallocate a large number of chunks using something like std::vector<ChunkT> myChunks; myChunks.reserve(1000000); and fetch a new chunk without allocation whenever needed using ChunkT* newChunk = &myChunks.emplace_back();. I want the new chunk to be zero-initialized, but I would prefer to do this initialization with a single memset directly after reserving the memory instead of initializing one chunk at a time as I fetch it. Provided that ChunkT is POD, e.g. struct {size_t keys[512]; size_t values[512];};, I was not sure about the following:

  1. is it safe to 0-initialize the memory using memset after reserve?
  2. is it guaranteed that I still have 0-initialized memory in the example of ChunkT being struct {size_t keys[512]; size_t values[512];}; after fetching my chunk with ChunkT* newChunk = &myChunks.emplace_back()?

Regarding 1.) a user in the linked question argued that it would be unsafe because the standard does not guarantee what the std::vector implementation might be doing with the reserved memory (e.g. using it for internal bookkeeping). wpzdm argued that nothing surprising could be going on with the reserved memory. Having read all the related discussion, I now think that accessing the objects in memory that is only reserved is safe, since their lifetime has already started (because they are POD and allocated by the vector's allocator), so they are perfectly valid objects. However, their content is not guaranteed at any point until the memory becomes part of the "valid" range, e.g. through emplace_back, because the standard does not say that the vector implementation must not modify the reserved range (so 2.) is No?). But the vector implementation also cannot rely on the content of those reserved objects, since we are allowed to access and change them as we see fit. So neither "internal bookkeeping" nor setting debug flags to detect out-of-bounds accesses outside the "valid" but inside the reserved range, nor anything similar, would be strictly standard-conforming, because it could cause disallowed side effects. So only a malicious or non-conforming compiler would modify the reserved range?

If I change ChunkT to struct {size_t keys[512]={0}; size_t values[512]={0};}; then content of the object after emplace_back is guaranteed, but this time because initialization takes place through construction. Also, now it would be undefined behaviour to access the only reserved memory because the lifetimes of the objects have not yet begun.
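The difference between the two variants can be checked with type traits. This is only a sketch: the names PlainChunk, InitChunk and initChunkStartsZeroed are made up for illustration and do not appear in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical name for the first variant: no member initializers.
struct PlainChunk {
    std::size_t keys[512];
    std::size_t values[512];
};

// Hypothetical name for the second variant: default member initializers.
struct InitChunk {
    std::size_t keys[512] = {0};
    std::size_t values[512] = {0};
};

// Without member initializers the default constructor is trivial.
static_assert(std::is_trivially_default_constructible<PlainChunk>::value,
              "PlainChunk should have a trivial default constructor");

// With member initializers the default constructor has work to do.
static_assert(!std::is_trivially_default_constructible<InitChunk>::value,
              "InitChunk should have a non-trivial default constructor");

// Returns true if plain default-initialization already zero-fills InitChunk.
bool initChunkStartsZeroed() {
    InitChunk chunk;  // default-initialization runs the member initializers
    for (std::size_t i = 0; i < 512; ++i) {
        if (chunk.keys[i] != 0 || chunk.values[i] != 0) return false;
    }
    return true;
}
```

Calling initChunkStartsZeroed() should return true, since the member initializers run even for a plain `InitChunk chunk;` declaration.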

  • There is no legal way to access the memory provided after a `reserve` until an object occupies that space. You can try to provide your own Allocator that zeroes the memory, but I think `emplace_back` would still technically create an uninitialized object. Edit: In the linked question `&vec[0]` is UB because 0 is not less than `vec`'s size. This is because `std::vector` explicitly makes it UB. – François Andrieux Dec 03 '21 at 02:19
  • I am not sure what is unclear in the answer to the previous question: you cannot access reserved memory, period. After that, half of your question does not make any sense. – Slava Dec 03 '21 at 02:27
  • @FrançoisAndrieux: The point is that according to how I read the standard, there _are_ objects occupying the reserved space after allocation because they are POD types with no non-vacuous initialization (see also my comment below). I understand that &vec[0] is UB (though I did not find the explicit place in the standard) because you go through the vector implementation, which is allowed to do whatever it wants if you give it nonsense. But if &vec[0] were valid because we inserted one element, then we could legally increment this pointer and access undefined _values_ without summoning demons. – Lukas Brunner Dec 03 '21 at 12:26
  • @LukasBrunner It isn't just about `std::vector` not touching the memory. A compiler would be allowed to rely on the fact that you never access a vector's elements out of bounds and can transform the generated code accordingly. This could lead to unexpected behavior. You can't try to reason around doing something forbidden in C++. There is no way to get access to the underlying memory allocated by `reserve` because the `std::vector` shields it via its interface. – François Andrieux Dec 03 '21 at 14:01

2 Answers

  2. is it guaranteed that I still have 0-initialized memory in the example of ChunkT being struct {size_t keys[512]; size_t values[512];}; after fetching my chunk with ChunkT* newChunk = &myChunks.emplace_back()?

emplace_back() value initialises the object, so the zero-initialisation is guaranteed regardless of what the memory contained before the object was created.
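A minimal sketch of this, assuming the default std::allocator and the ChunkT from the question. The helper names isZeroed and fetchedChunksAreZeroedAndStable are made up for illustration. It reserves once, fetches every chunk with emplace_back(), and checks that each chunk starts zeroed and that the buffer was never reallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ChunkT {
    std::size_t keys[512];
    std::size_t values[512];
};

// Returns true if every key and value in the chunk is zero.
bool isZeroed(const ChunkT& chunk) {
    for (std::size_t i = 0; i < 512; ++i) {
        if (chunk.keys[i] != 0 || chunk.values[i] != 0) return false;
    }
    return true;
}

// Hypothetical helper: reserves `count` chunks, fetches all of them, and
// reports whether each one started zeroed and the buffer never moved.
bool fetchedChunksAreZeroedAndStable(std::size_t count) {
    if (count == 0) return true;  // nothing to fetch

    std::vector<ChunkT> chunks;
    chunks.reserve(count);
    const std::size_t capacityBefore = chunks.capacity();

    const ChunkT* first = nullptr;
    for (std::size_t i = 0; i < count; ++i) {
        chunks.emplace_back();          // value-initializes the new element
        ChunkT& chunk = chunks.back();  // C++17 could use the returned reference
        if (i == 0) first = &chunk;
        if (!isZeroed(chunk)) return false;
        chunk.keys[0] = i + 1;          // dirty it; later chunks must still start zeroed
    }

    // No reallocation: the capacity is unchanged and the first element did not move.
    return chunks.capacity() == capacityBefore && first == &chunks.front();
}
```

For example, fetchedChunksAreZeroedAndStable(64) should return true. The zeroing happens one chunk at a time inside emplace_back(), so no separate memset over the reserved range is involved.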

eerorika

  1. is it safe to 0-initialize the memory using memset after reserve?

Maybe it works, but you'd better not. Accessing a nonexistent element through [] is UB.

  2. is it guaranteed that I still have 0-initialized memory in the example of ChunkT being struct {size_t keys[512]; size_t values[512];}; after fetching my chunk with ChunkT* newChunk = &myChunks.emplace_back()?

Yes. In your situation, what emplace_back() does is construct a ChunkT via placement-new with an empty initializer, so the object is value-initialized, and for a POD class that means it is zero-initialized. ref: POD class initialized with placement new default initialized?

So you don't have to worry about using memset to zero the allocated memory. Please correct me if I am wrong.
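To illustrate the placement-new point in isolation, here is a small sketch. It assumes the ChunkT from the question, and the function name valueInitOverwritesGarbage is made up. The storage is deliberately filled with 0xFF first, and the `ChunkT()` form (with parentheses) is what requests value-initialization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct ChunkT {
    std::size_t keys[512];
    std::size_t values[512];
};

// Returns true if value-initializing a ChunkT in pre-filled storage
// leaves every key and value at zero.
bool valueInitOverwritesGarbage() {
    // Raw, suitably aligned storage that starts out full of 0xFF bytes.
    alignas(ChunkT) static unsigned char storage[sizeof(ChunkT)];
    std::memset(storage, 0xFF, sizeof storage);

    // The empty parentheses request value-initialization, which
    // zero-initializes a class without a user-provided constructor.
    ChunkT* chunk = ::new (static_cast<void*>(storage)) ChunkT();

    for (std::size_t i = 0; i < 512; ++i) {
        if (chunk->keys[i] != 0 || chunk->values[i] != 0) return false;
    }
    // ChunkT is trivially destructible, so no explicit destructor call is needed.
    return true;
}
```

valueInitOverwritesGarbage() should return true. Writing `new (p) ChunkT` without the parentheses would be default-initialization instead, which leaves the members with indeterminate values.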

Nimrod
  • But why is it UB? The standard says in 3.8 1: The lifetime of an object of type T begins when: - storage with the proper alignment and size for type T is obtained, and - if the object has non-vacuous initialization, its initialization is complete. If I understand correctly that POD types do not have non-vacuous initialization, then accessing POD types after allocation before any additional initialization is explicitly defined behaviour. So it is not allowed to explode my hard drive, it is just allowed to result in undefined _values_ at that memory location. Or am I missing something? – Lukas Brunner Dec 03 '21 at 12:13
  • @LukasBrunner What will lead to UB is trying to get a pointer to the storage. Your quoted passage is about interpreting memory representation as a POD type, but the UB in your case would happen before that. – François Andrieux Dec 03 '21 at 14:06
  • @LukasBrunner The UB we are talking about here has nothing to do with POD class construction when you just `reserve` the vector. The UB happens when you try to `memset` (or use) the allocated memory through `&vector[0]`. – Nimrod Dec 03 '21 at 17:34
  • Ah, ok! So very strictly speaking it would not be UB if I inserted at least one element? – Lukas Brunner Dec 04 '21 at 17:06
  • @LukasBrunner Yes, I think so. A better way would be to use `vector::data`, since it's a direct accessor to the memory. – Nimrod Dec 04 '21 at 18:08