Why Vector's size() and capacity() is different after push_back()

Question

I just starting to learning vectors and little confused about size() and capacity() I know little about both of them. But why in this program both are different? even array(10) is making room for 10 elements and initializing with 0.

Before adding array.push_back(5)

So array.size(); is 10 that is ok.

So array.capacity(); is 10 that is ok.

After adding array.push_back(5)

So array.size(); is 11 that is ok (already 10 time 0 is added and then push_back add one more element 5 ).

So array.capacity(); is 15 Why? ( is it reserving 5 blocks for one int? ).

#include <iostream>
#include <vector>
int main(){
    std::vector<int> array(10); // make room for 10 elements and initialize with 0
    array.reserve(10);    // make room for 10 elements
    array.push_back(5);
    std::cout << array.size() << std::endl;
    std::cout << array.capacity() << std::endl;
    return 0;
}

Just a note, you shouldn't use `array` as a variable name; especially when the type isn't an array. I realize this is just example code, but still probably just a good idea. :) — erip, Jan 15 '16 at 12:55
Don't use `std::endl` unless you need the extra stuff that it does. `'\n'` starts a new line. — Pete Becker, Jan 15 '16 at 15:38
@PeteBecker true, but in this case, where the output is diagnostic, we tend to want the implicit `flush` that `endl` has, otherwise messages won't necessarily appear until later and/or in the wrong order. (at least this has been my experience) — underscore_d, Jan 15 '16 at 16:39
@underscore_d - maybe, but that doesn't apply to the code here. — Pete Becker, Jan 15 '16 at 17:05
@PeteBecker I really like extra comments like yours one. Thanks it clear my confusion more about `std::endl` and `\n` :) and force me to learn more about buffer and why it's important can you help me or link me related to buffer in c++? — Asif Mushtaq, Jan 16 '16 at 08:13
@erip thanks to point out. Just to confirm because array is a built-in STL? — Asif Mushtaq, Jan 16 '16 at 08:14
Yes, and it's misrepresentatitive because it's *not an array*. — erip, Jan 16 '16 at 13:42

TemplateRex · Accepted Answer · 2016-01-15T13:52:31.453

17

The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means that the expansion has to be geometrically, say doubling the amount of storage each time it has been filled.

Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2, before storing the 5th, you copy 1-4, etc. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.

Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).

Because of this expansion strategy, you will have size() < capacity() most of the time. Lookup shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.

Note: with geometrical growth rate, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because of less wasted memory (because at some point the reallocated memory can overwrite the old memory).

edited Jan 15 '16 at 13:52

answered Jan 15 '16 at 12:33

TemplateRex

69,038
19
164
304

1

It is probably worth mentioning that expanding the space by a factor of 1.5 (which is what the OPs library appears to be doing) still gives amortized constant time capacity, but (can) do more copies - but have less wasted memory. – Martin Bonner supports Monica Jan 15 '16 at 13:40
@MartinBonner thanks, updated. Though that claim has never really been substantiated AFAIK. – TemplateRex Jan 15 '16 at 13:44
Which claim? Still amortized constant? or reduced memory usage? (They both seem obvious to me.) – Martin Bonner supports Monica Jan 15 '16 at 13:47
@MartinBonner reduced memory OK, but better *performance* because of that, has not been substantiated. – TemplateRex Jan 15 '16 at 13:52
Hmm. I thought it was an approach to prevent out-of-memory errors, rather than improve performance per-se. – Martin Bonner supports Monica Jan 15 '16 at 15:06
@MartinBonner see this [long Reddit thread](https://www.reddit.com/r/programming/comments/2ezy59/facebooks_stdvector_optimization/) for further info (it has popped recently as well, but can't find it) – TemplateRex Jan 15 '16 at 15:42

score 10 · Answer 2 · answered Jan 15 '16 at 12:30

10

It is for efficiency so that it does not have to expand the underlying data structure each time you add an element. i.e. not having to call delete/new each time.

answered Jan 15 '16 at 12:30

Ed Heal

59,252
17
87
127

Also for avoiding copying all elements on expansion many times, I guess. – MikeCAT Jan 15 '16 at 12:30
So we can't tell the exact capacity() of any vector? – Asif Mushtaq Jan 15 '16 at 12:33
1

@UnKnown yes, we can: `capacity()` returns the number of elements that can be contained before a new reallocation. – TemplateRex Jan 15 '16 at 12:51

Aracthor · Answer 3 · 2016-01-15T12:40:45.117

7

The std::vector::capacity is not its actual size (which is returned by size()), but the size of the actual internal allocated size.

In other terms, it is the size that it can reach before another re-allocation is needed.

It doesn't increase by 1 each time you do a push_back in order to not call a new reallocation (which is a heavy call) on each inserted element. It reserves more, because it doesn't know if you won't do some other push_back just after, and in this case, it won't have to change allocated memory size for the 4 next elements.

Here, 4 next elements is a compromise between 1, that would optimize memory allocation at maximum but would risk another reallocation soon, and a huge number, that would allow you to make many push_back quickly but maybe reserve a lot of memory for nothing.

Note: if you want to specify a capacity yourself (if you know your vector maximum size for instance), you can do it with reserve member function.

edited Jan 15 '16 at 12:40

answered Jan 15 '16 at 12:30

Aracthor

5,757
6
31
59

I already used reserve(); but I'm confused why capacity jumped to 15 even I just added/push_back just one integer? – Asif Mushtaq Jan 15 '16 at 12:35
@UnKnown: You didn't reserve space for 11 elements, though. – Kerrek SB Jan 15 '16 at 12:35
@UnKnown Because it allocate more than one element in order to not reallocate for the few next ones and limit realloc number. The more it resize, the more it resize large in order to restrain `reserve` intern calls. – Aracthor Jan 15 '16 at 12:37
So in this case we can't tell the exact capacity() of vector? – Asif Mushtaq Jan 15 '16 at 12:39
1

@UnKnown If you know how the memory management of your vector works (that can depend of the system, compiler etc), you can predict it. But it is almost always a compromise between allocated memory size and reallocations number. (Which represent respectively used RAM and used CPU) – Aracthor Jan 15 '16 at 12:42
@Unknown: You can't reliable *predict* what capacity will be if you just do `push_back` - but you can tell afterwards, and if you really care, you can set it with `reserve`. – Martin Bonner supports Monica Jan 15 '16 at 13:41
If capacity was increased by 1 only, then every single push_back () would have to increase the capacity, which is an operation that is somewhere between "expensive" and "very expensive". By changing the capacity to 15, the next four calls to push_back () are much cheaper. – gnasher729 Jan 15 '16 at 15:44

score 3 · Answer 4 · answered Jan 15 '16 at 12:35

Using

std::vector<int> array(10); // make room for 10 elements and initialize with 0

You actually filled all the ten spaces with zeros. Adding ad additional element will cause the capacity to be expanded thanks for efficiency. In your case it is useless to call the function reserve because you have instantiated the same number of elements.

check this and this link

score 2 · Answer 5 · edited May 23 '17 at 12:10

I think the following question can give you more detail about the capacity of a vector.

About Vectors growth

I will reference the answer in above question.

The growth strategy of capacity is required to meet the amortized constant time requirement for the push_back operation. Then, the strategy is designed to have a exponential growth generally when the space is lack. In short, the size of vector indicate the number of elements now, while the captacity show its ability used to push_back in future.

Shafi · Answer 6 · 2016-01-15T12:40:29.373

1

Size() returns how many values you have in the vector.

And capacity() returns size of allocated storage capacity means how many values it can hold now.

edited Jan 15 '16 at 12:40

answered Jan 15 '16 at 12:31

Shafi

1,850
3
22
44

1

why capacity changed after calling push_back()? – Asif Mushtaq Jan 15 '16 at 12:32
To accomodate the upcoming elements in that vector. – Shafi Jan 15 '16 at 12:34
So we can't tell the exact capacity()? – Asif Mushtaq Jan 15 '16 at 12:35
@UnKnown `capacity()` *is* the exact capacity. It grows when necessary, but in larger "steps" than the size. You can't tell what the capacity will be in the future, but you can always tell what it is right now. – molbdnilo Jan 15 '16 at 12:41
Because the data structure allocates memory in runtime. – Shafi Jan 15 '16 at 12:42

Why Vector's size() and capacity() is different after push_back()

6 Answers6