30

I just found out that std::vector<T>::resize "doubles" its capacity even when resizing to one element above the current size:

std::vector<int> v(50);
v.resize(51);
std::cout << v.capacity() << std::endl;

This program outputs 100 with GCC and Clang, and 75 with Visual C++. However, when I switch from resize to reserve:

std::vector<int> v(50);
v.reserve(51);
std::cout << v.capacity() << std::endl;

The output is 51 with all three compilers.

I wonder why implementations use a different expansion strategy for resize and reserve. It seems inconsistent, and I would expect the same behavior here.


I am just adding a link to a motivation for my question, where the impact on performance is reported: Why are C++ STL vectors 1000x slower when doing many reserves?


Adding a quote from C++11 Standard to clarify requirements for reserve; §23.3.6.3(2):

After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens...


Some additional thoughts: From C++11 Standard:

Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector.

Which, effectively, implies constant (amortized) complexity for inserting a single element at the end. However, this applies only for vector modifiers, such as push_back or insert (§23.3.6.5).

resize is not listed among modifiers. It's listed in §23.3.6.3 vector capacity section. And, there are no complexity requirements for resize.

However, in the vector overview section (§23.3.6.1), there is written:

it (vector) supports (amortized) constant time insert and erase operations at the end

The question is whether resize(size()+1) is considered to be "insertion at the end".

Daniel Langr
  • 22,196
  • 3
  • 50
  • 93
  • Ah. Looks like you were reading the same question I was an hour or so back. I'd never run across a case where this mattered either, so I hadn't noticed it wasn't the same behaviour and assumed that it was. They can be bad things, assumptions. – user4581301 Jan 31 '18 at 08:50
  • @liliscent It's not a duplicate. Answer to that question is "because `resize` doubles the capacity, while `reserve` does not". This question asks why. (I don't think that Google would find you the other question if you asked this one.) – Daniel Langr Jan 31 '18 at 10:54
  • @DanielLangr You have reason, comment deleted. Maybe it's better to add a link to that question, that question is nice and very related to this one. – llllllllll Jan 31 '18 at 10:59
  • That other Stack Overflow question is only about 2 hours older than this question. How come? – Peter Mortensen Jan 31 '18 at 12:50
  • I think the (obvious) answer is that in practice this is what you want. – user541686 Jan 31 '18 at 19:17
  • Related question: [std::vector::reserve performance penalty](https://stackoverflow.com/q/1742859/3919155). – jotik Apr 18 '18 at 15:02

5 Answers5

19

As far as I can tell, neither resize nor reserve is required to have the demonstrated behaviour. Both are however allowed such behaviour although both could either allocate the exact amount, and both could multiply the previous allocation as far as the standard is concerned.

Each allocation strategies have their advantages. The advantage of allocating exact amount is that it has no memory overhead when the maximum allocation is known beforehand. The advantage of multiplying is that it maintains the constant amortized property when mixed with end-insertion operations.

The approach chosen by the tested implementations has the advantage that it allows both strategies when resizing. To use one strategy, one can reserve and then resize. To use the other, just resize. Of course, one has to be aware of the unspecified behaviour to take advantage of this. This advantage may or might not be the reasoning behind the choice of these implementations.

One might consider it a failure of the vector API, as specified in the standard, that expressing the intended reallocation behaviour is not possible (in a way that is guaranteed by the standard).

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • 2
    This seems to be the best possible explanation for me, thanks. This effectively means that 1) `resize(n)` primarily expects that additional elements will be `push`ed`_back` beyond `n` elements, but 2) `reserve(n)` does not expect the same. That's why I talked about inconsistency here. – Daniel Langr Jan 31 '18 at 12:20
  • I would just note that even if you _reserve and then resize_, there is no guarantee that resizing won't change the capacity. – Daniel Langr Jan 31 '18 at 15:47
  • @DanielLangr I think that *is* guaranteed. Changing capacity requires reallocation. Resize is allowed to reallocate only if capacity is less than new size. Reserving guarantees at least as much capacity as was requested. Therefore the resize after reserve cannot change capacity. There's just no guarantee that the reservation before the resize had any different allocation strategy than the resize alone. – eerorika Jan 31 '18 at 15:58
  • `resize` is required to perform exponential growth. See my answer. – ecatmur Jan 31 '18 at 16:49
  • @ecatmur intersesting. What's the rule that mandates amortized constant time for incremental resizing? [[sequence.reqmts](https://timsong-cpp.github.io/cppwp/n4659/sequence.reqmts#tab:containers.sequence.optional)] lists only push, pop, emplace and access operations. – eerorika Jan 31 '18 at 17:25
  • It's under [[vector.modifiers]/2](https://timsong-cpp.github.io/cppwp/n4659/vector.modifiers#2): *linear in the number of elements inserted plus the distance to the end of the vector*. When 1 element is inserted at the end, that is (amortized) constant. – ecatmur Jan 31 '18 at 18:15
  • 2
    @ecatmur _"linear in the number of elements inserted plus the distance to the end of the vector"_: but this complexity is required only for modifiers. `resize` is not listed among modifiers. – Daniel Langr Feb 01 '18 at 09:22
  • @user2079303 _"Resize is allowed to reallocate only if capacity is less than new size."_ Can you quote from the Standard to support this? I don't see any such guarantee there. – Daniel Langr Feb 01 '18 at 09:42
  • @DanielLangr [[vector.capacity](http://eel.is/c++draft/vector.capacity)] `No reallocation shall take place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity().` So, the elements appended by resize are guaranteed to not reallocate when the new size is less than or equal reserved capacity. Resizing itself has no rules about possible reallocation. – eerorika Feb 01 '18 at 10:18
  • Thanks, I can see it now. It's in a remark for `reserve`. – Daniel Langr Feb 01 '18 at 10:22
9

When you resize more than there is capacity you already "demonstrate" that you don't want to reserve just the right capacity. On the other hand, if you use reserve you explicitly ask for the right capacity. If reserve would use the same strategy as resize there would be no way to reserve just the right amount.

In this sense resize without reserve is for the lazy ones or in case you don't know the exact amount to reserve. You call reserve if you know what capacity you need. That's two different scenarios.

PS: As StoryTeller pointed out, also reserve is not required to reserve the exact amount that is asked for as per the standard. Nevertheless I think my main argument still holds: resize (without reserve) and reserve are meant for different scenarios, where you either give a hint of how much you want to reserve or don't care about the actual capacity and just want to have the container sized to what you ask for.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • 11
    [There is no way to reserve an exact amount already](https://timsong-cpp.github.io/cppwp/n4659/vector.capacity#3), formally. It says greater than or equal. – StoryTeller - Unslander Monica Jan 31 '18 at 08:45
  • @StoryTeller hm, thanks, I wasnt aware of that. Nevertheless I think the standard just leaves this freedom in case an implementation cannot reserve the exact amount, but I would expect any implementation to reserve just what I ask for unless there are really good reasons to reserve more – 463035818_is_not_an_ai Jan 31 '18 at 08:48
  • 2
    Minor nit-pick: In many cases you don't know exactly the right capacity, but reserve a realistic upper bound of the size you need. E.g. if you want to process a vector generating a new one, and the processing may rarely discard or merge elements you often lazily reserve the number of elements in the original vector. – Hans Olsson Jan 31 '18 at 09:10
  • 1
    _"When you resize more than there is capacity you already "demonstrate" that you dont want to reserve just the right capacity."_ This sounds weird to me. I can resize simply to construct elements, not to demonstrate anything about capacity. – Daniel Langr Jan 31 '18 at 09:27
  • 1
    I think the bit that you're really missing in this answer, is that it's an implementation decision and that the behavior can not be depended on for all compilers. Your reasoning behind why it's their implementation decision is just opinion (which may or may not be right). – UKMonkey Jan 31 '18 at 09:32
  • @UKMonkey Yes, I know that it's "just opinion". However, all three mainstream compilers (their implementations of C++ Standard Library to be more precise) behave the very same. Therefore, there is likely some substantial reason why. – Daniel Langr Jan 31 '18 at 09:36
  • @DanielLangr Why do they have to have the same reason? Maybe one thought that it would optimize the next push_back, one thought that it was more trivial to implement, and one copied another. It's an assumption that they all have the same behavior and they all have the same reason. – UKMonkey Jan 31 '18 at 09:44
  • @DanielLangr what I meant is that if you call `resize` without calling `reserve` then you are reliying on the container managing its capacity on its own. If you wanted to have control over the capacity then you would call `reserve` before `resize`. Anyhow, I didnt expect this answer to catch that much attention. Imho the one that has been deleted gave better reasoning while mine is rather handwavy and guessing/opinion based – 463035818_is_not_an_ai Jan 31 '18 at 10:05
6

Why would you expect them to behave the same? reserve is used to preallocate space you will use later, with the expectation that the user has a decent handle on the expected final size of the container. resize is simply a normal allocation, and so it follows the normal, speed efficient, approach of geometrically increasing the container's allocated space.

Containers increase in size by multiplicative steps to reduce the number of allocations needed and thus maintain speed and reduce memory fragmentation. Doubling is the most common, but some implementations use steps of 1.5 (e.g. MSVC) which trade increased allocations for lower wasted space within each container.

But, if the user has already told the library how big they think the container will get - by causing reserve - there's no need to allocate excess space, they can instead trust the user to have called it with the correct number. It is reserve that has the unusual behaviour, not resize.

Jack Aidley
  • 19,439
  • 7
  • 43
  • 70
  • Doubling of capacity is common first of all in `push_back`, where it is perfectly reasonable. Why should `reserve` _"tell the library how big the container will get"_ and `resize` not? – Daniel Langr Jan 31 '18 at 12:07
  • 1
    Because that's literally the point of `reserve`, whereas `resize` is simply about allocating some things to use. – Jack Aidley Jan 31 '18 at 13:06
  • 1
    Also, remember that any addition to an exactly sized `resize`d would result in allocation whereas `reserve` is (usually) already bigger than the size. – Jack Aidley Jan 31 '18 at 13:07
  • That's what I wonder about. Why library implementers expect in advance that when someone `resize` a vector, he/she will then additionally append more elements? I would expect that when someone `resize`, he/she knows how big that vector will be. – Daniel Langr Jan 31 '18 at 14:27
  • 1
    They know how big the vector will be _now_; it's not about how it will be _later_. That's what `reserve` is for. – Jack Aidley Jan 31 '18 at 14:50
  • Think of it this way: `reserve` does _not_ change what is in the container, it is merely there to advise the library to get ready for it being a different, bigger, size in the future so that it can avoid needless allocations. `resize` on the other hand changes the size of the container, creating lots of default constructed elements. It is not there to provide a hint to the library about future allocation. – Jack Aidley Jan 31 '18 at 14:54
  • Also, you've got your thinking backwards. It's not `resize` that is behaving unusually it, it is `reserve`. Geometric increases in size are used _as the normal_ because they are way more efficient (see your linked question); they're disregarded for `reserve` because it is a special case which is really only about informing the library about future allocations. – Jack Aidley Jan 31 '18 at 14:57
  • I see your points, but agree only partially. When resizing, you don't know what the user knows. He may know the size _for now_ as well as _for ever_. Library implementers are obviously pessimistic here. – Daniel Langr Jan 31 '18 at 15:16
  • And, geometric increase is more efficient primarily when calling `push_back` a lot of times. Generally, using memory in an efficient way is important as well. – Daniel Langr Jan 31 '18 at 15:19
  • The question of how to resize has been extensively discussed in the computer science literature; the geometric approach is widely accepted. Remember that efficient memory usage is not simply down to how much you allocate but also the impact of that on fragmentation of memory. Frequent, incremental increases, are likely to waste memory through fragmentation. – Jack Aidley Jan 31 '18 at 15:24
  • I would rather close this discussion, since both approaches have pros and cons. Thanks for your point of view, I find it interesting :). – Daniel Langr Jan 31 '18 at 15:24
  • And, again, the way for the user to communicate that is to use the special function `reserve`. – Jack Aidley Jan 31 '18 at 15:24
  • There are pros and cons, yes, but the geometric approach is almost always _better_ for a generic library. This is not a matter of opinion. – Jack Aidley Jan 31 '18 at 15:25
  • Agree with respect to `push_back`, not `resize`. That's my opinion. – Daniel Langr Jan 31 '18 at 15:28
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/164278/discussion-between-daniel-langr-and-jack-aidley). – Daniel Langr Jan 31 '18 at 15:37
6

resize is required to follow an exponential reallocation strategy to fulfil its complexity guarantee (linear in the number of elements inserted). This can be seen by considering that resize(size() + 1) is required to have amortized constant complexity, so must follow exponential growth for the same reason that push_back (amortized constant complexity) must grow exponentially.

An implementation of reserve is permitted to follow whatever allocation strategy it likes, since its only complexity requirement is that it be linear in the number of elements present. However, if an implementation were e.g. to round up to the next power of two, this would be space-inefficient (and surprising) in the case where the user knows exactly how much memory is required, and could complicate porting if the user comes to rely on this behavior. The latitude in the Standard is better exercised in cases where there is no space inefficiency e.g. by rounding up allocations to the word size, if the allocator operates at that granularity.

ecatmur
  • 152,476
  • 27
  • 293
  • 366
  • 3
    This would benefit by linking to and/or quoting the source of the complexity guarantees. – Mr.Mindor Jan 31 '18 at 20:50
  • 3
    From C++11 Standard: _Complexity: The complexity is linear in the number of elements inserted plus the distance to the end of the vector._ I agree that this implies constant (amortized) complexity for inserting a single element at the end. However, this requirement applies for **vector modifiers** (§23.3.6.5). `resize` belongs to §23.3.6.3 (vector capacity) and there are no complexity requirements for `resize`. – Daniel Langr Feb 01 '18 at 09:27
  • 1
    But, from `vector` overview section (§23.3.6.1): "_`vector` supports (amortized) constant time insert and erase operations at the end_". The question is whether `resize(size() + 1)` is considered to be _"inserting at the end"_. I am not a language lawyer, so I don't know. – Daniel Langr Feb 01 '18 at 09:30
0

reserve changes the capacity, while resize changes the size.

capacity is the number of elements that the container has currently allocated space for.

size is the number of elements in the container.

When you allocate an empty vector you get a default capacity (AKA space). The size is still 0, and when you add elements into the vector, its size increments. When the size is equal to capacity and you add more item the capacity must grow (usually double itself).

The problem with vector is that it ensures sequential memory, meaning each new allocation growth will need also a copy of the previous allocation to the new one, in case there was no space for the new allocation size in the old allocated memory area.

Here the reserve can help, if you know the max elements in the vector. When you use reserve, there will be only one allocation and no memory copy, unless you pass the reserved items.

When you specify the exact reserved count, you get the exact memory you asked. When you just add elements (even with resize), you don't say you wouldn't add more items.

SHR
  • 7,940
  • 9
  • 38
  • 57