2

I'm looking at the implementation of vector on https://www.cs.odu.edu/~zeil/cs361/sum18/Public/vectorImpl/index.html.

Under 1.3.1, it shows:

if( theSize == theCapacity ) ➀
    reserve( 2 * theCapacity + 1 ); ➁

I am wondering, why is it reserving 2 * theCapacity + 1 instead of 2 * theCapacity?

In std::vector, it just doubles the capacity when the size of the vector is equal to the capacity, and you're trying to perform an append operation. I don't quite understand the purpose of the +1 here.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
David
  • 619
  • 2
  • 8
  • 15
  • 4
    "In std::vector, it just doubles the capacity" where did you get this from? – 463035818_is_not_an_ai Sep 30 '20 at 20:11
  • @idclev463035818 I thought I had read that before. I just tested it, and it seems the capacity doubles when you're trying to `push_back` or `emplace_back` when `std::vector::size() == std::vector::capacity()` – David Sep 30 '20 at 20:12
  • 2
    @David In case when initially the capacity is equal to 0 the expression 2 * theCapacity also will be equal to 0. – Vlad from Moscow Sep 30 '20 at 20:13
  • 1
    @David I don't see anywhere mentioned in the standard that vector must double its capacity when pushing back. It is therefore implementation defined behavior and you should not rely on this. – Guillaume Racicot Sep 30 '20 at 20:13
  • i am not saying that you are wrong, but `std::vector` is not required to double its capacity. Can be that implementation do that, i never bothered to look at the code or inspect it by trial, did you? – 463035818_is_not_an_ai Sep 30 '20 at 20:14
  • @VladfromMoscow Oh yes you're right. I forgot about that edge case, but in other cases, it seems the capacity is doubling based on my testing. – David Sep 30 '20 at 20:14
  • 2
    Doubling the capacity is a [poor strategy](https://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array). – Evg Sep 30 '20 at 20:14
  • The c++ standar does not mandate any explicit reallocation policies, it just states, that there needs to be enough place to insert the new element. And that: If a reallocation happens, the reallocation is itself up to linear in the entire size at the moment of the reallocation. – g_bor Sep 30 '20 at 20:17
  • @David Looking at the [reference](https://en.cppreference.com/w/cpp/container/vector) there's nothing said about the actual behavior. I'd guess that's left to the implementation to choose a strategy. – πάντα ῥεῖ Sep 30 '20 at 20:17
  • @πάνταῥεῖ I just looked at that too and you guys are right. I could've sworn I read on SO that the doubling is indeed for what occurs. What do you mean by "left to the implementation?" We as users use `std::vector`, so where does the implementation come in? – David Sep 30 '20 at 20:18
  • The only definitive answer would be one from the author. Everyone else's answer would be duplicating his or a guess. – jxh Sep 30 '20 at 20:19
  • 2
    @David _"so where does the implementation come in?"_ Different c++ compilers and standard library implementaitons may choose different implementations for `std::vector`, as long all requirements made by the standard are met. – πάντα ῥεῖ Sep 30 '20 at 20:21
  • @πάνταῥεῖ I didn't know. I had thought all compilers used the same implementation of STL. – David Sep 30 '20 at 20:22
  • @David Compilers exist for many machines and many operating systems. The STL may have started out as a single source implementation that would work on any platform, but it has turned into the standard library, and encompasses system specific features, such as threads. So, it will necessarily be a little different across different compiler implementations, even coming from the same compiler vendor. – jxh Sep 30 '20 at 20:27
  • @πάνταῥεῖ I think I recall now where I saw the "doubling." I think in a thread where they proved that `push_back` is amortized constant. But many growth rates could lead to amortized constancy. – David Sep 30 '20 at 20:43

3 Answers3

5

The C++ standard does not specify how a std::vector needs to grow in size. Doubling is a common implementation1, possibly born out of laziness than anything else. The +1 is there too in order to obviate the issue of a zero capacity, which can occur if vector::reserve is called with a zero argument, or if your implementation constructs a zero capacity std::vector by default.


1 Personally I believe that growing the capacity using a Fibonacci sequence is perhaps more natural, although I've never come across a std::vector implementation which does that.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • 1
    Yeah, Fibonacci! – jxh Sep 30 '20 at 20:22
  • I guess it's because memory is assumed to be plenty these days, so doubling instead cuts down on potential memory allocations and moving the elements. You should always `reserve()` yourself though anyway if you use large vectors. – Nikos C. Sep 30 '20 at 20:28
  • @NikosC.: Doubling is still an odd choice for me though. Multiplying by the golden ratio would be more sensible, surely? – Bathsheba Sep 30 '20 at 20:29
  • It does sound more sensible indeed. But it always seemed to me that if there's one thing library implementors are scared about the most, it's allocation. So I suppose they always tend to pick the strategy that minimizes them, even if it's by a small margin. – Nikos C. Sep 30 '20 at 20:38
  • Or a table lookup. It only takes 94 entries or so before a `uint64_t` is maxed out. – jxh Sep 30 '20 at 20:39
  • @NikosC. `So I suppose they always tend to pick the strategy that minimizes [allocations]` They could reduce number of allocations significantly by growing the capacity by a factor of 1024. But they don't do that because they cannot make that choice based solely on minimising number of allocations. – eerorika Sep 30 '20 at 20:58
3

Consider that std::vector::push_back is required to have amortized constant complexity (see eg here).

Reallocating all elements has linear cost.

Now imagine you would increase capacity to have space for only one additional element. This would mean that push_back has linear complexity. The only way to get amortized constant is to reduce frequency of reallocations the more elements are in the vector.

Doubling the capacity each time the vector is full is one way to achieve that. However, implementations are free to choose different allocation strategies as long as they are in accordance with the specification and nowhere the standard says that capacity has to double.

Having said this, capacity*2+1 is probably to avoid branching for capacity== 0 each time capacity changes.

PS: Actually the article already has a teaser below that code: "We’ll see later what this does to the big-O complexity for push_back()." and later they explore in detail how their allocation strategy ensures amortized constant complexity for push_back.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
2

I am pretty sure this is due to the initial value of theCapacity. While this is not shown on the page you linked, we can assume that theCapacity is initialized to zero. This makes sense, as a vector that remains untouched should't allocate memory. Hence, on the first push_back, the + 1 makes sure memory for at least one element is allocated.

lubgr
  • 37,368
  • 3
  • 66
  • 117
  • 1
    Also, I don't think the standard says anything about `vector` being required to double the capacity. Implementations are free to increase it in whatever way they think is best. – Nikos C. Sep 30 '20 at 20:15
  • So it seems after the first push operation, the capacity becomes 1. After a second push operation the capacity becomes 3? In `std::vector` the second push would make the capacity 2. – David Sep 30 '20 at 20:16
  • 2
    ... which again depends on the vector implementation you are looking at. – lubgr Sep 30 '20 at 20:17
  • 2
    @David There's many different implementations of `std::vector`. Microsoft has their own, for example, GCC have their own (libstdc++), and so does clang (libc++). Some implementations might double it, some others might be using a different strategy. – Nikos C. Sep 30 '20 at 20:17
  • @NikosC. Ahhhh I didn't know that. I was actually confused what you guys were referring to in terms of implementation. I had thought that the implementation for each compiler is the same. In my case, I'm using clang. – David Sep 30 '20 at 20:19
  • @David Yeah, the C++ standard does not actually provide an implementation for the standard library at all. It just documents the behavior the library must have. This is one of the things that is not documented (meaning, not standardized), so different C++ library implementations are free to implement this however they want. (As a side note, clang on Linux actually uses the GCC implementation of the library. To use the clang implementation, you need to compile with `clang++ -stdlib=libc++` and make sure libc++ is installed on the system. However, I think libc++ also doubles the capacity.) – Nikos C. Sep 30 '20 at 20:23