
According to the following test, it seems that a std::vector<int> increases its capacity as follows:

  • growth happens when we push_back() and the capacity is already full (i.e. v.size() == v.capacity()); notably, it does not happen any earlier

  • the capacity increases to 1.5 times the previous capacity

Question: why this 1.5 factor? Is it implementation-dependent? Is it optimal?

Also, is there a way to tell, in this code, when exactly a reallocation happens? (Perhaps the capacity can sometimes be increased without moving the first part of the array.)


#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::vector<int>::size_type previouscapacity = 0;
    for (unsigned int i = 0; i < 1000000; i++)
    {
        v.push_back(i);
        if (v.capacity() != previouscapacity)
        {
            // on the first pass previouscapacity is 0, hence the 1.#INF ratio below
            std::wcout << L"new capacity: " << v.capacity()
                       << L" new size: " << v.size()
                       << L" ratio: " << ((float) v.capacity()) / previouscapacity << '\n';
            previouscapacity = v.capacity();
        }
    }
}

new capacity: 1 new size: 1 ratio: 1.#INF
new capacity: 2 new size: 2 ratio: 2
new capacity: 3 new size: 3 ratio: 1.5
new capacity: 4 new size: 4 ratio: 1.33333
new capacity: 6 new size: 5 ratio: 1.5
new capacity: 9 new size: 7 ratio: 1.5
new capacity: 13 new size: 10 ratio: 1.44444
new capacity: 19 new size: 14 ratio: 1.46154
new capacity: 28 new size: 20 ratio: 1.47368
new capacity: 42 new size: 29 ratio: 1.5
new capacity: 63 new size: 43 ratio: 1.5
new capacity: 94 new size: 64 ratio: 1.49206
new capacity: 141 new size: 95 ratio: 1.5
new capacity: 211 new size: 142 ratio: 1.49645
...
new capacity: 466609 new size: 311074 ratio: 1.5
new capacity: 699913 new size: 466610 ratio: 1.5
new capacity: 1049869 new size: 699914 ratio: 1.5


Note: I'm using VC++ 2013

Basj
  • The growth rate is implementation dependent AFAIK. It's popular to double the size. You can ask for `capacity` to know how many elements will fit before a realloc is needed. – AndyG Jul 30 '17 at 18:11
  • As an added note, if you know how many elements there are going to be, use `reserve()` on the vector. – tambre Jul 30 '17 at 18:12
  • See https://stackoverflow.com/q/1100311/72178. – ks1322 Jul 30 '17 at 18:13
  • Don't look at the ratio you're printing; there is some rounding taking place after the capacity is increased by 1.5 times. For example 3*1.5 = 4.5, but the new capacity becomes 4. Also, the vector's elements are copied on every expansion; that is why the cost of addition is amortized O(1). – Aakash Verma Jul 30 '17 at 18:16
  • @AakashVerma the ratio-rounding is not important. It's ~1.5 here, when size is big enough. – Basj Jul 30 '17 at 18:28
  • @tambre: yes I know the solution to avoid too many reallocs is `reserve()` – Basj Jul 30 '17 at 18:28
  • @AndyG OK, so each time the capacity changes (each time `v.capacity() != previouscapacity`), a move / realloc happens? Are you sure? I thought that, for example, the `capacity` could be 100 and there could be 16 MB of free memory just after the vector; then if the capacity increases to 150, the first 100 elements could stay in place (no realloc) and just 50 contiguous elements would be added. Is this possible? – Basj Jul 30 '17 at 18:31
  • @Basj: You're mixing up `size` (the number of elements in the vector) and `capacity` (the size the of allocated buffer). – AndyG Jul 30 '17 at 19:10
  • I just ran your example code on a FreeBSD/clang platform and it uses a multiplying factor of two. – Luis Colorado Jul 31 '17 at 09:38

4 Answers


As the answer to the linked question What is the ideal growth rate for a dynamically allocated array? shows, always doubling the allocated size has the problem that the freed memory will always be just too small for the next allocation. The vector will "wander" through the heap, leaving lots of fragments behind.

The "optimal" reallocation size, that maximizes reuse, turns out to be the golden ratio which is 1.61803...

However, 1.5 is a lot easier to calculate as capacity() + capacity() / 2 and is close enough in practice. That makes it a popular choice for existing implementations.
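To illustrate (a minimal standalone sketch, not actual library source), the following reproduces the capacity sequence observed in the question using exactly that integer arithmetic:

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t cap = 1;
    while (cap < 100)
    {
        std::cout << cap << ' ';
        std::size_t next = cap + cap / 2;      // 1.5x growth with integer rounding
        cap = (next > cap) ? next : cap + 1;   // ensure progress for tiny capacities
    }
    std::cout << '\n';
    // prints: 1 2 3 4 6 9 13 19 28 42 63 94 -- the sequence from the question's trace
}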

Bo Persson
  • I disagree. The golden ratio is far easier to calculate. You only have to take two successive elements of the Fibonacci sequence and add them to get the next (or subtract to get the previous), and you converge on the golden ratio very quickly. Only sums, no multiplications. I've even used precisely this to implement when to grow the capacity of a vector. Dividing two adjacent Fibonacci elements gives an approximation to `phi`. – Luis Colorado Jul 31 '17 at 09:26

Also, is there a way to analyze, in this code, when exactly a reallocation happens?

It happens when you push_back and there is not enough capacity. There is nothing to analyze.
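That said, if you want to observe the buffer actually moving, you can track the address returned by v.data() (standard since C++11): a reallocation has happened exactly when the address changes. A minimal sketch:

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    const int* previousdata = nullptr;
    for (unsigned int i = 0; i < 100; i++)
    {
        v.push_back(i);
        if (v.data() != previousdata)   // buffer address changed: elements were moved
        {
            std::cout << "reallocated at size " << v.size()
                      << " (capacity " << v.capacity() << ")\n";
            previousdata = v.data();
        }
    }
}

With the standard allocator, every capacity increase moves the elements into a fresh buffer, so this fires precisely when capacity() changes; the grow-in-place speculated about in the question does not happen.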

why this 1.5 factor? Is it implementation-dependent? Is it optimal?

The growth must be geometric (multiply by a constant factor, rather than add a constant amount), though the factor need not be uniform. A factor of 2 was popular, as it leads to "on average, each element is copied once": pushing n elements with doubling copies 1 + 2 + 4 + ... + n/2 ≈ n elements in total.

The problem with 2 was that a vector travelling through memory as it reallocates never leaves enough freed space behind for the next reallocation to fit: 1 + 2 + 4 + 8 + 16 = 31 < 32, no matter how long the series. With 1.5 the buffers are 1, 1.5, 2.25, 3.375, 5.06, 7.59, ..., and 1 + 1.5 + 2.25 + 3.375 ≈ 8.1 > 7.59, so in the long run the previously freed space is sufficient to hold the newly needed buffer.
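Those sums are easy to check with a short standalone sketch (independent of any particular vector implementation):

#include <iostream>

int main()
{
    for (double factor : {2.0, 1.5})
    {
        double buf = 1.0, freed = 0.0;   // current buffer size, total space freed so far
        for (int i = 0; i < 20; i++)
        {
            freed += buf;                // the old buffer is released...
            buf *= factor;               // ...and a larger one is allocated
        }
        std::cout << "factor " << factor << ": freed " << freed
                  << " vs next buffer " << buf
                  << (freed >= buf ? " -> reusable\n" : " -> never fits\n");
    }
}

With factor 2 the freed space stays one element short forever; with 1.5 it overtakes the next request after a handful of reallocations.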

If your growing vector is the main consumer of memory, then under 2x scaling half of the memory will be discarded leftover vector buffers. Under 1.5x the discarded buffers accumulate and eventually become big enough to be recycled for a new allocation (assuming they are contiguous).

Yakk - Adam Nevraumont
  • I do not really understand. I thought such problems would be minimized with `realloc()`. If there is space after the current block, it will just be enlarged without copying. If there is no space, the `realloc` call would of course move the data block. But then the freed blocks (if this happens multiple times) would not be contiguous anyway, so it doesn't matter whether 1.5 or 2 is the enlargement factor... memory is fragmented either way. – Andreas H. Jul 30 '17 at 20:22
  • @andreas Vectors do not realloc in general. I believe they could, with trivially copyable types, the standard allocator, and proof that operator new is not overloaded, under the as-if rule, but I am unaware of any vector that does. A realloc API in C++ is complicated by the fact that most types are not trivially copyable; you actually need "expand if possible, otherwise fail", as copying bytes is not sufficient. – Yakk - Adam Nevraumont Jul 30 '17 at 20:28
  • That does not really make sense. If realloc succeeds, nothing is to be done; just the new element needs to be constructed (but that needs to be done anyway). In the other case there is no difference from the free-and-malloc scheme discussed here. – Andreas H. Jul 30 '17 at 20:40
  • After a bit of googling I found this: https://stackoverflow.com/questions/3105001/why-is-there-no-reallocation-functionality-in-c-allocators. It seems that there is no `realloc` function in C++ allocators. Some call it a design flaw, and I tend to agree. – Andreas H. Jul 30 '17 at 20:41
  • @andreas realloc, the function, either extends the allocation *or* bytewise copies the allocation into new buffer and deallocates the old one. When you said realloc, I presumed you meant realloc, not "realloc-like". – Yakk - Adam Nevraumont Jul 30 '17 at 22:35

The factor 1.5 was chosen by Microsoft's STL implementation team. According to them, a factor between 1.5 and 2 is the best trade-off between speed and memory use.
You can check the explanation here, starting at the 15th minute.

tambre
  • Increasing a memory capacity, a timeout, or other such things by 1.5 is not Microsoft's invention. OK, now Basj can watch a lecture about that... Is there an answer to "is there a way to analyze, in this code, when exactly a reallocation happens?" – VolAnd Jul 30 '17 at 18:43

I think an important aspect of why a factor of 1.5 is preferred over 2.0 is that the standard C++ allocator model does not support a realloc function:

Realloc in standard allocator

If it existed and were used in std::vector, the memory fragmentation problem mentioned above would be minimized in exactly those cases where a 1.5x capacity increase beats e.g. a 2.0x one.

The factor of 1.5 (or, more precisely, the golden ratio) is only optimal when the buffer is relocated on every size increase (as the other answers explain).

With realloc, if there is space after the current memory block it would just be enlarged without copying; if there is no space, the realloc call would of course move the data block. But then the freed blocks (if this happens multiple times) would not be contiguous anyway, so it would not matter whether 1.5 or 2 is the enlargement factor... memory would be fragmented either way.

So a realloc approach in std::vector would help exactly where the 1.5 factor has the advantage over 2.0, and it would also save the data copy.
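For trivially copyable element types, such an operation could look like the following sketch (grow_buffer is a hypothetical helper, not part of std::vector or std::allocator):

#include <cstdio>
#include <cstdlib>
#include <type_traits>

// Hypothetical: grow a raw buffer with realloc, which may extend it in place
// instead of copying. Standard C++ allocators offer no equivalent hook.
template <typename T>
T* grow_buffer(T* buf, std::size_t newcount)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc copies raw bytes, so T must be trivially copyable");
    return static_cast<T*>(std::realloc(buf, newcount * sizeof(T)));
}

int main()
{
    int* buf = grow_buffer<int>(nullptr, 4);   // realloc(nullptr, n) acts like malloc(n)
    int* grown = grow_buffer(buf, 8);
    std::printf("moved: %s\n", grown == buf ? "no (extended in place)" : "yes");
    std::free(grown);
}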

Some call the missing realloc functionality in C++ allocators a design flaw (see the link below), and I tend to agree.

Why is there no reallocation functionality in C++ allocators?

Andreas H.