19

I'm new to C++ and I'm using the vector class on my project. I found it quite useful because I can have an array that automatically reallocates whenever it is necessary (ie, if I want to push_back an item and the vector has reached it's maximum capacity, it reallocates itself asking more memory space to the OS), so access to an element of the vector is very quick (it's not like a list, that to reach the "n-th" element I must go through the "n" first elements).

I found this question very useful, because their answers explained perfectly how the "memory allocator" works when I want to store my vector on the heap/stack:

[1] vector<Type> vect;
[2] vector<Type> *vect = new vector<Type>;
[3] vector<Type*> vect;

However, a doubt is bugging me for a while, and I can't find its answer: Whenever I construct a vector and begin pushing a lot of items in, it would reach a moment when the vector would be full, so to continue growing it would need to reallocate, copy itself to a new location and then continue pushing_back items (obviously, this reallocation it's hidden on the implementation of the class, so it is completely transparent to me)

Fine, if I have created the vector on the heap [2], I have no troubles imagining what may be happening: class vector calls malloc, acquires new space and then copy itself into the new memory and finally deletes the old memory calling free.

However, a veil hides what is happening when I construct a vector on the stack [1]: What does it happens when the vector must reallocate? AFAIK, whenever on C/C++ you enter a new function, the computer would look at the declaration of variables and then expand the stack to get the necessary space to put these variables, but you can't allocate more space on the stack when the function is already running. How does the class vector solve this problem?

Community
  • 1
  • 1
Karl
  • 374
  • 1
  • 5
  • 15
  • 2
    Apparently, the answers in that question didn't explain it perfectly at all, because you got a completly wrong idea. – R. Martinho Fernandes Jun 25 '13 at 14:28
  • The vector allocates its data somewhere that *can* grow at runtime. The size of a vector object itself remains fixed, because it keeps a fixed size handle to that dynamically allocated data. – juanchopanza Jun 25 '13 at 14:28
  • 1
    Ok, with the answers below, I understand the data strcture of a vector object: like @juanchopanza said below, it is a set of pointers that defines the size of an array located on heap with objects stored on it. Since this array is on heap, it may be reallocated if more space is needed. BTW, sorry for english grammar! I hope to improve it with practice! – Karl Jun 25 '13 at 17:13
  • It's a common misconception to assume that just because you say `std::vector<...> myvect;` vs. `std::vector<...> *myvect = new std::vector<...>;` you'd end up - for the contents (!) - with _stack allocation_ on the former but _heap allocation_ on the latter. Not so; while for the `new ...` case, heap is pretty much guaranteed, the _internal implementation_ of the container type decides what happens when you instantiate it locally. Only certain containers (i.e. `std::array`) will _embed_ their contents. `std::vector` does not. – FrankH. Jun 30 '13 at 08:35

7 Answers7

61

You wrote

[...] copy itself to a new location [...]

which is not the way a vector works. The vector data is copied to a new location, not the vector itself.

My answer should give you an idea of how a vector is designed.

The common std::vector layout*

Note: The std::allocator is actually likely to be an empty class and std::vector will probably not contain an instance of this class. This may not be true for an arbitrary allocator.

std::vector layout

In most implementations it consists of three pointers where

  • begin points to the start of the data memory of the vector on the heap (always on the heap if not nullptr)
  • end points one memory location past the last element of the vector data -> size() == end-begin
  • capacity points on memory location past the last element of the vector memory -> capacity() == capacity-begin

A vector on the stack

We declare a variable of type std::vector<T,A> where T is any type and A is an allocator type for T (i.e. std::allocator<T>).

std::vector<T, A> vect1;

How does this look like in memory?

std::vector on the stack

As we see: Nothing happens on the heap but the variable occupies the memory that is necessary for all of its members on the stack. There it is and it will stay there until vect1 goes out of scope, since vect1 is just an object like any other object of type double, int or whatever. It will sit there on its stack position and wait to get destroyed, regardless of how much memory it handles itself on the heap.

The pointers of vect1 do not point anywhere, since the vector is empty.

A vector on the heap

Now we need a pointer to a vector and use some dynamic heap allocation to create the vector.

std::vector<T, A> * vp = new std::vector<T, A>;

Let's again look at the memory.

std::vector on the heap

We have our vp variable on the stack and our vector is on the heap now. Again the vector itself will not move on the heap since its size is constant. Only the pointers (begin, end, capacity) will move to follow the data position in memory if a reallocation takes place. Let's have a look at that.

Pushing elements to a vector

Now we can start pushing elements to a vector. Let's look at vect1.

T a;
vect1.push_back(a);

std::vector after single push_back

The variable vect1 is still where it has been but memory on the heap was allocated to contain one element of T.

What happens if we add one further element?

vect1.push_back(a);

std::vector after second push

  • The space allocated on the heap for the data elements will not be enough (since it is only one memory positions, yet).
  • A new memory block will be allocated for two elements
  • The first element will be copied/moved to the new storage.
  • The old memory will be deallocated.

We see: The new memory location is different.

To have additional insight let's look at the situation if we destroy the last element.

vect1.pop_back();

The memory allocated won't change but the last element will have its destructor called and the end pointer moves one position down.

std::vector after 2x push and 1x pop

As you can see: capacity() == capacity-begin == 2 while size() == end-begin == 1

Pixelchemist
  • 24,090
  • 7
  • 47
  • 71
  • 1
    Well... I guess my first thoughts we're wrong about the implementation: Since the vector's data have a fixed size, no matter how big is the vector, it can be stored on the stack without any problems: The *allocator* will ask more heap size and the pointers to data will change. Now I understand why it is not so intuitive to use GDB to see what data is stored on a vector. Thanks @Pixelchemist! – Karl Jun 25 '13 at 17:05
  • @Karl: use a `std::array` if you want a stack-allocated "vector". The disadvantage is ... it cannot be resized. Precisely _because_ `std:array` (both the container and its contents) resides on the stack, and "shuffling around stack" isn't possible. – FrankH. Jun 30 '13 at 08:38
  • Are you sure about that the "Allocator object" takes space on the stack? `sizeof(std::vector)` return 12 and 24 for 32-bit and 64-bit architectures respectively. (x86_64, Ubuntu 13.04, gcc 4.7.3) – stanm Apr 22 '14 at 15:35
  • @stanm: The `std::allocator` is likely to be empty but you can have a user defined allocator requiring an instance. I.e. in MSVS a `std::vector` contains an instance of the allocator if it is non-empty. – Pixelchemist Apr 29 '14 at 13:35
  • @Pixelchemist: Oh, I see, std::vector is essentially a different class than std::vector. And of course you would need to save that allocator somewhere. Thanks. – stanm Apr 29 '14 at 14:23
7

The vector object may well be instiantiated on the stack but the data within the vector will be on the heap.

(The trivial class class foo {int* data;}; has this characteristic)

Bathsheba
  • 231,907
  • 34
  • 361
  • 483
  • In fact your trivial class has a pointer on the stack if a foo instance is built on the stack. A characteristic that is common to all pointer-type variable declarations. Even `int * data;` shows it. – Pixelchemist Jun 26 '13 at 02:56
  • Yes you're correct but the point I'm making is that the memory to which data points is (probably) heap-allocated. Loosely speaking, that's how stl vectors can work. – Bathsheba Jun 26 '13 at 05:54
6

The way you construct your vector (stack or heap) doesn't matter for this.

See the documentation for std::vector

Internally, vectors use a dynamically allocated array to store their elements. This array may need to be reallocated in order to grow in size when new elements are inserted, which implies allocating a new array and moving all elements to it.

When a vector "grows", the vector object doesn't grow, only the internal dynamic array changes.

As for its implementation, you can look at GCC's vector implementation.

To keep it simple, it declares vector as a class with one protected member, of type _Vector_impl.

As you can see, it is declared as a structure that contains three pointers :

  • One that points at the beginning of the storage (and the beginning of the data)
  • One that points at the end of the data
  • One for the end of the storage
JBL
  • 12,588
  • 4
  • 53
  • 84
  • Yes, I understand that the **content** of the vector is stored on the heap (on every case I put on my question), but... What **is** the *internal data* of the vector object that, on the first case, is instantiated on the stack? I figure it may be an "array of pointer" (like a classic array on C) that whenever you push_back an item, the vector class allocates new memory on the heap and then adds a pointer to this array of pointers – Karl Jun 25 '13 at 14:40
  • 1
    @Karl it could as simple as a pointer to the data, plus data members for size and capacity. – juanchopanza Jun 25 '13 at 14:42
  • I mean, if you construct the vector on the heap using ->std::vector *vector;<- Everythin is OK, this "array of pointer" will reallocate when the space is over.... but what does it happens when you construct your vector on the stack? this array of vectors would ve created on the stack instead of the heap, right? **Edit:** Sure @juanchopanza, i didn't thought about your example... it has sense – Karl Jun 25 '13 at 14:43
  • 1
    The "contents" (that is, the _actual_ array backing the vector, aka `std::vector::data()`) are always heap-allocated. For stack-allocated vector-like containers, you need to use `std::array` - it's got the same iterators and accessors as `std::vector` _sans_ the ability to resize, and it's templated by the size. – FrankH. Jun 25 '13 at 14:49
  • i lied - it is 12 bytes - I take it back – im so confused Jun 25 '13 at 14:50
  • @Karl If you construct your vector on the stack, i.e. as in `std::vector myVector;`, the internal data is still on the heap. I don't have the implementation details (which may vary anyway) on top of my head, but you can expect the vector to have a member that is `T* data`, which is allocated at construction using the following statement : `data = new T[DEFAULT_SIZE]`. – JBL Jun 25 '13 at 14:51
  • 2
    @Karl: A common implementation of vector has as only members (non-debug version) three pointers `T *begin,*end,*capacity;` (and possibly an allocator if it cannot be optimized away). The vector does not contain pointers to all of the elements, but a single pointer to the first element and two *sentries* to determine where the *valid* elements stop (`end`) and where the capacity ends (`capacity`). Adding more elements will require growing, which in turn will reset the three pointers to different values, but the structure in the stack (i.e. `v` in `std::vector v;`) won't change it size – David Rodríguez - dribeas Jun 25 '13 at 14:51
  • @AK4749: You mean 12 bytes on a 32bit architecture... 3 pointers – David Rodríguez - dribeas Jun 25 '13 at 14:53
  • @DavidRodríguez-dribeas haha yep, realized that just as you posted your comment, thanks! – im so confused Jun 25 '13 at 14:53
  • @juanchopanza haha I lied sorry, I deleted my comment so it wouldn't further confuse people. it is 3 pointers as David stated http://msdn.microsoft.com/en-us/library/vstudio/hh567368.aspx – im so confused Jun 25 '13 at 14:54
  • @juanchopanza: Your comment should not be confusing. The choice of three pointers (plus allocator if needed) or a pointer and two `size_type` members to track `size()` and `capacity()` is an implementation detail. It should not matter, the important thing is that you do need to track three pieces of information. On allocators: the default allocator and many/most C++03 compliant allocators were basically stateless and the space can be optimized away (empty base optimization) but conceptually (and practically with statefull allocators) the container must hold an allocator. – David Rodríguez - dribeas Jun 25 '13 at 15:00
4

You are essentially asking about the implementation details of a vector. The C++ Standard does not define how a vector is to be implemented -- it only defines what a vector should do and what operations are required to be implemented. Nobody can tell you with 100% accuracy what happens when a vector is reallocated, because every compiler is theoretically different.

That being said, it is not difficult to understand how a vector is typically implemented. The vector itself is simply a data structure which has a pointer to the actual data stored "in" the vector. Something along these lines:

template <typename Val> class vector
{
public:
  void push_back (const Val& val);
private:
  Val* mData;
}

The above is obviously psudocode, but you get the idea. When a vector is allocated on the stack (or on the heap):

vector<int> v;
v.push_back (42);

The memory might end up looking like this:

+=======+        
| v     |
+=======+       +=======+
| mData | --->  |  42   |
+=======+       +=======+

When you push_back to a full vector, the data will be reallocated:

+=======+        
| v     |
+=======+       +=======+
| mData | --->  |  42   |
+=======+       +-------+
                |  43   |
                +-------+
                |  44   |
                +-------+
                |  45   |
                +=======+

...and the vector's pointer to the new data will now point there.

Oktalist
  • 14,336
  • 3
  • 43
  • 63
John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • I have some doubt, If we push_back an rvalue in the vector which would invoke move constructor. What would happen if the capacity is exhausted ? I mean, move usually does a static_cast and marks the object as rvalue but in this case, will the vector allocate memory and copy the pointers ? – unbesiegbar Jun 17 '22 at 16:33
1

you also can reserve to reserve anticipated size,

vect.reserve(10000);

this will reserve 10000 object space of the type used

aah134
  • 860
  • 12
  • 25
1

A vector is not an array of elements, or the memory used to store such.

A vector manages an array of elements, the memory for which it allocates, de-allocates, shrinks and grows as required.

Your choice of how you allocate the vector itself has no connection to the vector's own decisions about how and where to allocate the memory it manages for you.

I don't want to discourage your interest in how the vector works internally (it's both interesting and useful), but ... the whole point of writing classes and documenting them is that you only need to understand the interface, not the implementation.

Useless
  • 64,155
  • 6
  • 88
  • 132
  • Thank you! Your answer complemented @juanchopanza's comment. And yes, I agree that the whole point of writing classes and documenting is to focus on the interface. However, I think one must have at least a vague idea about how a class is implemented to being able to write good code and predict possible errors: There are no goblins doing the dirty work of memory allocation! – Karl Jun 25 '13 at 16:54
0

If you make some growing vector, from empty, to some size, will be using doubling strategy, and most of the time reallocation, in this example using integer from 1 to 10000, and clang (std=2a -O3) will get this, this is just for fun, showing the importance of using reserve. vector::begin() point to beginning of actual array, and vector::capacity show actual capacity. In the other hand, iterator invalidated is showed.

std::vector<int> my_vec;
auto it=my_vec.begin();
for (int i=0;i<10000;++i) {
    auto cap=my_vec.capacity();
    my_vec.push_back(i);
    if(it!=my_vec.begin()) {
        std::cout<<"it!=my_vec.begin() :";
        it=my_vec.begin();
    }
    if(cap!=my_vec.capacity())std::cout<<my_vec.capacity()<<'\n';
}

This produce the following result:

it!=my_vec.begin() :1
it!=my_vec.begin() :2
it!=my_vec.begin() :4
it!=my_vec.begin() :8
it!=my_vec.begin() :16
it!=my_vec.begin() :32
it!=my_vec.begin() :64
it!=my_vec.begin() :128
it!=my_vec.begin() :256
it!=my_vec.begin() :512
it!=my_vec.begin() :1024
it!=my_vec.begin() :2048
it!=my_vec.begin() :4096
it!=my_vec.begin() :8192
it!=my_vec.begin() :16384
JeremyW
  • 5,157
  • 6
  • 29
  • 30