52

I read that std::vector should be contiguous. My understanding is that its elements should be stored together, not spread out across memory. I have simply accepted this fact and used it, for example when calling its data() method to get the underlying contiguous piece of memory.

However, I came across a situation, where the vector's memory behaves in a strange way:

std::vector<int> numbers;
std::vector<int*> ptr_numbers;
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
}

I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when I list the contents pointed to by ptr_numbers, there are different and seemingly random numbers, as though I am accessing the wrong parts of memory.

I have tried to check the contents every step:

for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
    for (auto ptr_number : ptr_numbers)
       std::cout << *ptr_number << std::endl;
    std::cout << std::endl;
}

The result looks roughly like this:

1

some random number
2

some random number
some random number
3

So it seems as though when I push_back() to the numbers vector, its older elements change their location.

So what exactly does it mean that std::vector is a contiguous container, and why do its elements move? Does it perhaps store them together, but move them all together when more space is needed?

Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)

egst
  • 13
    The vector has to keep its promise to have elements contiguous, which means if vector has to move the elements to a bigger space, it does. -- *I am aware, that std::vector is a contiguous container only since C++17* -- It has been contiguous since 1998. – PaulMcKenzie Sep 14 '18 at 10:32
  • That is what I read on cppreference: std::vector (for T other than bool) meets the requirements of Container, AllocatorAwareContainer, SequenceContainer, ContiguousContainer (since C++17) and ReversibleContainer. – egst Sep 14 '18 at 10:34
  • 5
    @McSim Those are just the official names that are used now. It was always guaranteed that `std::vector` stored its items in contiguous memory; at first that guarantee was only unofficial, because a defect in the standard left it unstated. Then (if I recall), the defect was corrected in the 03 standard. – PaulMcKenzie Sep 14 '18 at 10:36
  • 3
    If you want to observe the effect of contiguous memory then use `reserve` in advance. – macroland Sep 14 '18 at 10:36
  • @PaulMcKenzie Thanks for the explanation. – egst Sep 14 '18 at 10:41
  • 1
    Slightly on a tangent, if you need random access *and* pointer stability, take a look at `std::deque` with `push_back()`/`emplace_back()`. – Arne Vogel Sep 14 '18 at 10:58
  • @PaulMcKenzie http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.66.6263 was published in 2006 and requires that vectors are not contiguous, so what you could call a 'defect' was regarded as a feature to allow efficient implementation by Bjarne Stroustrup as late as that. I'm not sure offhand which version it was made part of the standard either. – Pete Kirkham Sep 14 '18 at 13:11
  • Add `numbers.reserve(8);` just above your first "for" loop and you will stop seeing random numbers. – VL-80 Sep 14 '18 at 17:06
  • "some random number" is not helpful for understanding your observations. – Russell Borogove Sep 14 '18 at 18:02
  • @RussellBorogove: By "some random number" I meant that the number is irrelevant. It is simply a number, sometimes small, sometimes large, but it is never related to the original numbers and differs in every iteration and every run. But I might update the question with examples of the actual numbers. – egst Sep 14 '18 at 23:16
  • The numbers are memory addresses, and extremely relevant to understanding the behavior of std::vector. – Russell Borogove Sep 15 '18 at 03:16
  • if you want contiguous data on the stack, use [`std::array`](https://en.cppreference.com/w/cpp/container/array). Of course you then lose the ability to freely resize like vector. The vector has to keep its data on the heap so that it can store a huge amount of data (much bigger than the stack) and grow when needed – phuclv Sep 15 '18 at 03:58
  • @RussellBorogove: I'm doing `cout << *ptr_number` so these are not addresses (it's dereferenced by the *). And if I was writing addresses (without the *) it would give me the previous addresses, not the ones the vector moved to. – egst Sep 15 '18 at 09:41

5 Answers

83

It roughly looks like this (excuse my MS Paint masterpiece):

vector memory layout

The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.


So it seems as though when I push_back() to the numbers vector, its older elements change their location.

The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.


Does it maybe store them together, but moves them all together, when more space is needed?

Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
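
A minimal sketch (not from the original answer) that makes this visible: print the buffer address across a few push_back() calls, then do the same after reserving enough capacity up front. The exact addresses and reallocation points are implementation-specific.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> numbers;
    for (int i = 0; i < 8; ++i) {
        numbers.push_back(i);
        // data() points at the heap buffer; the address changes whenever
        // push_back() had to reallocate.
        std::cout << "size " << numbers.size() << ", buffer at "
                  << static_cast<const void*>(numbers.data()) << '\n';
    }

    std::vector<int> reserved;
    reserved.reserve(8);                      // pre-allocate room for 8 ints
    const void* before = reserved.data();
    for (int i = 0; i < 8; ++i)
        reserved.push_back(i);                // no reallocation happens here
    std::cout << "buffer unchanged after reserve(8): " << std::boolalpha
              << (reserved.data() == before) << '\n';
}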


I am aware, that std::vector is a contiguous container only since C++17

The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a named requirement ("concept") that was added in C++17 to formally distinguish contiguous containers from the other container categories.

Vittorio Romeo
  • 45
    Why is the arrow not exactly vertical in your drawing? Made me hesitate before upvoting. – lubgr Sep 14 '18 at 10:41
  • 54
    @lubgr: I intentionally made it skewed so that it is not mistaken for `operator->`. – Vittorio Romeo Sep 14 '18 at 10:43
  • 6
    "*Iterator and address stability of elements is **not** guaranteed*" - it actually is, as long as no reallocation takes place. If you `push_back()` with `size < capacity`, address stability is guaranteed – Fureeish Sep 14 '18 at 10:57
  • Probably ought to mention that the reallocation probably won't be happening if you use reserve() to make it allocate enough space to store all the new elements. That may solve the OP's basic problem. – T.E.D. Sep 14 '18 at 13:20
  • 16
    @VittorioRomeo That's ridiculous, `operator->()` is horizontal. `operator↑()` on the other hand... – Yakk - Adam Nevraumont Sep 14 '18 at 18:36
  • 14
    @lubgr "exactly vertical" arrows require hardware support – Ti Strga Sep 14 '18 at 20:37
  • Actually in 03 the wording was changed to guarantee contiguity – PlasmaHH Sep 14 '18 at 21:21
  • 1
    @TiStrga no, any proper drawing tool will allow you to draw straight lines by holding Shift while dragging the mouse – phuclv Sep 15 '18 at 03:52
  • 2
    @phuclv, Straight yes, but an exactly vertical line also requires a properly oriented screen. – ilkkachu Sep 15 '18 at 08:27
  • @lubgr, it is exactly vertical if one chooses the appropriate e_1 and e_2 unit vectors that define the vector space. ;-) Leads one to wonder why TF the designers chose such an ambiguous term for "array" when we already have a word for that... – SO_fix_the_vote_sorting_bug Apr 25 '23 at 20:24
19

The Answer

It's a single contiguous storage (a 1D array). Each time it runs out of capacity it gets reallocated and the stored objects are moved to the new, larger place; this is why you observe the addresses of the stored objects changing.

It has always been this way, not since C++17.

TL; DR

The storage grows geometrically to meet the amortized O(1) requirement for push_back(). The growth factor is 2 (Cap_{n+1} = Cap_n + Cap_n) in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 (Cap_{n+1} = Cap_n + Cap_n / 2) in the MSVC variant.

growing std::vector
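
If you are curious which policy your own standard library uses, a small sketch like this prints the capacity after every reallocation (the exact sequence is implementation-defined):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    auto last_capacity = v.capacity();

    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {  // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
    // libstdc++/libc++ typically double (1, 2, 4, 8, ...);
    // MSVC grows by roughly 1.5x (1, 2, 3, 4, 6, 9, ...).
}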

If you pre-allocate it with vector::reserve(N) for a sufficiently large N, then the addresses of the stored objects won't change when you add new ones.

In most practical applications it is usually worth pre-allocating at least 32 elements to skip the first few reallocations, which otherwise follow shortly one after another (0→1→2→4→8→16).

It is also sometimes practical to slow the growth down, switch to an arithmetic growth policy (Cap_{n+1} = Cap_n + Const), or stop growing entirely after some reasonably large size, to ensure the application does not waste memory or run out of it.
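
The standard library does not expose a knob for the growth policy, but an arithmetic policy can be approximated from client code by reserving ahead of push_back. A rough sketch; the helper name and chunk size are made up for illustration, and note that reserve() only guarantees at least the requested capacity (common implementations allocate exactly that much):

#include <cstddef>
#include <vector>

// Hypothetical helper: grow v by a fixed chunk instead of geometrically.
template <typename T>
void push_back_arithmetic(std::vector<T>& v, const T& value, std::size_t chunk = 1024) {
    if (v.size() == v.capacity())
        v.reserve(v.capacity() + chunk);  // Cap_{n+1} = Cap_n + Const
    v.push_back(value);                   // no geometric growth is triggered
}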

Lastly, in some practical applications, like column-based object storage, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (the same as what std::deque does, but with much larger chunks). This way the data may be kept reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well).
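
For comparison, this is roughly what a segmented container buys you: std::deque::push_back never moves existing elements, so pointers and references to them stay valid (only iterators are invalidated), and the snippet from the question works as expected with it. A small sketch:

#include <deque>
#include <iostream>
#include <vector>

int main() {
    std::deque<int> numbers;          // segmented storage, no wholesale reallocation
    std::vector<int*> ptr_numbers;

    for (int i = 0; i < 8; ++i) {
        numbers.push_back(i);
        ptr_numbers.push_back(&numbers.back());
    }

    // Unlike with std::vector, these pointers are still valid.
    for (int* p : ptr_numbers)
        std::cout << *p << ' ';       // prints: 0 1 2 3 4 5 6 7
    std::cout << '\n';
}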

bobah
  • 5
    Nitpicking: growth factor depends on implementation. I heard MSVC uses 1.5 – HolyBlackCat Sep 14 '18 at 10:33
  • 3
    Growth factor is implementation-defined. – Vittorio Romeo Sep 14 '18 at 10:34
  • Yes in theory, but not in practice, because to achieve amortized `O(1)` `push_back()` it needs to grow the storage as 2x. No one, of course, in a sane mind would let it do that after a certain capacity, as it may eat all available memory very quickly, but that's done manually in the client code. – bobah Sep 14 '18 at 10:37
  • 3
    @bobah In practice it is not 2x in some implementations. –  Sep 14 '18 at 10:53
  • Does the growth factor actually affect the amortized complexity? – Galik Sep 14 '18 at 10:55
  • @Ivan - which ones? – bobah Sep 14 '18 at 10:55
  • @Galik - yes, in a very straightforward way - the growth factor determines how many times a stored object gets moved by the time the vector size reaches N. – bobah Sep 14 '18 at 10:57
  • 1
    @bobah Yes but over time (usage) that additional cost is reduced tending towards zero (but never reaching it). Doesn't the growth factor simply alter the rate of amortization? – Galik Sep 14 '18 at 10:59
  • 1
    @bobah Well this answer suggests that the growth rate merely needs to be exponential in order to achieve the required amortized complexity https://stackoverflow.com/a/5232342/3807729 – Galik Sep 14 '18 at 11:08
  • @FrançoisAndrieux - I removed the comment with the formula and instead here is the link to the discussion topic specifically dedicated to this: https://cs.stackexchange.com/questions/9380/why-is-push-back-in-c-vectors-constant-amortized – bobah Sep 14 '18 at 11:21
  • Neither the growth factor nor the rounding of allocations are set to any specific values by the standard. The growth factor may or may not be 2x and reserving memory may or may not round up to 32. – François Andrieux Sep 14 '18 at 11:26
  • [GCC - 2x](https://github.com/gcc-mirror/gcc/blob/1580b4793d88132747f587ac95ebc12d96fee5b9/libstdc%2B%2B-v3/include/bits/stl_vector.h#L1707), [Clang - 2x](https://github.com/llvm-mirror/libcxx/blob/master/include/vector#L2587) – bobah Sep 14 '18 at 11:42
  • [STLPort - 2x](https://github.com/vincenthsu/stlport/blob/v5.2.1/stlport/stl/_vector.h#L171) – bobah Sep 14 '18 at 11:48
  • 1
    @bobah MSVC version. Vectors member function `_Calculate_growth` calculates it like this: `const size_type _Geometric = _Oldcapacity + _Oldcapacity / 2;`. This was already mentioned in the first comment. –  Sep 14 '18 at 13:02
  • 5
    In fact, growth factor 2 is pretty bad and 1.5x actually works better, even though this may seem counter-intuitive. And [since this is well-known](https://github.com/facebook/folly/blob/master/folly/docs/FBVector.md), I’m somewhat surprised that stdlibc++ and libc++ still use 2x. – Konrad Rudolph Sep 14 '18 at 13:38
  • 3
    For those curious and unwilling to follow a link, a growth factor of 2 leads to the walking vector problem; no sum of previous buffers allocated for a vector can ever quite fit its new allocation. So on a system where the only meaningful allocation is a growing std::vector, a 2x growth factor means you consume almost 2x the memory you'd expect to; half of your memory is returned vector buffers sitting fallow. With 1.5, after ... 4? cycles the previous buffers are big enough to allocate a new buffer in. – Yakk - Adam Nevraumont Sep 14 '18 at 18:39
  • @Ivan - thanks! I suspected M$ to have done it differently, but couldn't find an online compiler to try. I have updated the answer with this information. – bobah Sep 15 '18 at 08:00
  • 1
    @Yakk-AdamNevraumont - the growth factor of two never caused me any problems. In production code one would typically make sure to pre-allocate the vector to skip first few almost-instant reallocations and slow it down (or stop from growing entirely) after some size. For theoretical discussions it's best to switch to the thread at the CS site (I pasted the link in one of my earlier comments). – bobah Sep 15 '18 at 10:52
  • 1
    @Yakk-AdamNevraumont Is that even a concern nowadays with virtual memory? – orlp Sep 15 '18 at 11:52
  • @orlp Memory is slow. Disk is slower. Having half of your reserved memory space being pages malloc is waiting to recycle is wasteful, and actually swapping them out would be criminal. And resource constrained systems still exist; I mean, I might put swap on an fast SSD with only 50 GB free. Now I can only log ~30 GB of data before my process runs OOM instead of ~60 GB. Don't waste memory, the less and more local it is the faster your code becomes. – Yakk - Adam Nevraumont Sep 16 '18 at 02:51
  • 2
    @Yakk-AdamNevraumont That's not how modern allocators with virtual memory work though. No one is swapping pages that are free, that's ridiculous. – orlp Sep 16 '18 at 06:32
  • @orlp This is not true. If you are thinking about "lazy-free" mechanisms, most modern allocators (like glibc's) are still not able to utilize them properly. –  Sep 16 '18 at 23:43
7

std::vector being a contiguous container means exactly what you think it means.

However, many operations on a vector can relocate that entire piece of memory.

One common case is adding elements: when the vector must grow, it may re-allocate and copy all the elements to another contiguous piece of memory.
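
A compact sketch of that case; whether and when the buffer actually moves depends on the implementation's growth policy:

#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    // Remember where the buffer lives now, as an integer, so no dangling
    // pointer is touched after a reallocation.
    auto before = reinterpret_cast<std::uintptr_t>(v.data());

    for (int i = 0; i < 100; ++i)     // almost certainly forces at least one reallocation
        v.push_back(i);

    auto after = reinterpret_cast<std::uintptr_t>(v.data());
    std::cout << "buffer moved: " << std::boolalpha << (before != after) << '\n';
}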

nos
5

So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?

That's exactly how it works and why appending elements does indeed invalidate all iterators as well as memory locations when a reallocation takes place¹. This is not only valid since C++17, it has been the case ever since.

There are a couple of benefits from this approach:

  • It is very cache-friendly and hence efficient.
  • The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers (see the sketch after this list).
  • The cost of allocating new memory upon push_back, reserve or resize boils down to amortized constant time, as the geometric growth amortizes over time (when the capacity is exhausted it is doubled in libc++ and libstdc++, and grows by a factor of approx. 1.5 in MSVC).
  • It allows for the least restricted iterator category, i.e., random access iterators, because classical pointer arithmetic works out well when the data is contiguously stored.
  • Move construction of a vector instance from another one is very cheap.
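
To illustrate the second point, contiguity is what makes this kind of interop work; the C-style function below is hypothetical, standing in for any API that takes a raw pointer plus a length:

#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical C-style API expecting a raw pointer and an element count.
long sum_ints(const int* data, std::size_t count) {
    long sum = 0;
    for (std::size_t i = 0; i < count; ++i)
        sum += data[i];
    return sum;
}

int main() {
    std::vector<int> v(10);
    std::iota(v.begin(), v.end(), 1);                  // 1, 2, ..., 10

    // data() points at the contiguous buffer, so the raw API can walk it directly.
    std::cout << sum_ints(v.data(), v.size()) << '\n'; // prints 55
}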

These implications can be considered the downside of such a memory layout:

  • All iterators and pointers to elements are invalidated upon modifications of the vector that imply a reallocation. This can lead to subtle bugs, e.g. when erasing elements while iterating over the elements of a vector (see the sketch after this list).
  • Operations like push_front (as std::list or std::deque provide) aren't provided (insert(vec.begin(), element) works, but is possibly expensive¹), as well as efficient merging/splicing of multiple vector instances.
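
A sketch of the erase-while-iterating pitfall and the usual fix: erase() does not reallocate, but it still invalidates iterators at and after the erased position, so the loop must continue from the iterator erase() returns:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};

    // Erase all even numbers. Incrementing an iterator after passing it to
    // erase() uses an invalidated iterator; use the returned one instead.
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it);   // returns the iterator following the erased element
        else
            ++it;
    }

    for (int x : v)
        std::cout << x << ' ';  // prints: 1 3 5
    std::cout << '\n';
}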

¹ Thanks to @FrançoisAndrieux for pointing that out.

lubgr
  • I like the pros and cons mentioned, but I will keep the current accepted answer for now as it is more about answering the question than additional notes. It's a pity I cannot accept two answers though :) – egst Sep 14 '18 at 10:51
4

In terms of the actual structure, an std::vector looks something like this in memory:

struct vector {    // Simple C struct as example (T is the type supplied by the template)
  T *begin;        // vector::begin() probably returns this value
  T *end;          // vector::end() probably returns this value
  T *end_capacity; // First non-valid address
  // Allocator state might be stored here (most allocators are stateless)
};

Relevant code snippet from the libc++ implementation as used by LLVM

Printing the raw memory contents of an std::vector:
(Don't do this if you don't know what you're doing!)

#include <iostream>
#include <vector>

struct vector {
    int *begin;
    int *end;
    int *end_capacity;
};

int main() {
    union vecunion {
        std::vector<int> stdvec;
        vector           myvec;
        ~vecunion() { /* do nothing */ }
    } vec = { std::vector<int>() };
    union veciterator {
        std::vector<int>::iterator stditer;
        int                       *myiter;
        ~veciterator() { /* do nothing */ }
    };

    vec.stdvec.push_back(1); // Add something so we don't have an empty vector

    std::cout
      << "vec.begin          = " << vec.myvec.begin << "\n"
      << "vec.end            = " << vec.myvec.end << "\n"
      << "vec.end_capacity   = " << vec.myvec.end_capacity << "\n"
      << "vec's size         = " << vec.myvec.end - vec.myvec.begin << "\n"
      << "vec's capacity     = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
      << "vector::begin()    = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
      << "vector::end()      = " << (veciterator { vec.stdvec.end()   }).myiter << "\n"
      << "vector::size()     = " << vec.stdvec.size() << "\n"
      << "vector::capacity() = " << vec.stdvec.capacity() << "\n"
      ;
}
yyny