4

I'm struggling with the correct mental model and understanding of std::vector.

What I thought I knew

When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory that is N * sizeof(T) bytes. For example,

// Initialize a vector of int
std::vector<int> intvec;

// Reserve contigious block of 4 4-byte chunks of memory
intvec.reserve(4);  // [ | | | ]

// Filling in the memory chunks has obvious behavior:
intvec.push_back(1);  // [1| | | ]
intvec.push_back(2);  // [1|2| | ]

Then we can access any element in random access time because, if we ask for the kth element of the vector, we simply start at the memory address of the start of the vector and then "jump" k * sizeof(T) bytes to get to the kth element.

Custom Objects

My mental model breaks down for custom objects of unknown/varying size. For example,

class Foo {

public:
    Foo() = default;
    Foo(std::vector<int> vec): _vec{vec} {}

private:
    std::vector<int> _vec;
};

int main() {

    // Initialize a vector Foo
    std::vector<Foo> foovec;

    // Reserve contigious block of 4 ?-byte chunks of memory
    foovec.reserve(4);  // [ | | | ]

    // How does memory allocation work since object sizes are unkown?
    foovec.emplace_back(std::vector<int> {1,2});        // [{1,2}| | | ]
    foovec.emplace_back(std::vector<int> {1,2,3,4,5});  // [{1,2}|{1,2,3,4,5}| | ]

    return 0;
}

Since we don't know the size of each instance of Foo, how does foovec.reserve() allocate memory? Furthermore, how could you achieve random access time we don't know how far to "jump" to get to the kth element?

Ben
  • 20,038
  • 30
  • 112
  • 189
  • 8
    sorry, but your question is based on a false premise. The size of your object **is** known. Consider that if an object contains a pointer then it is only the pointer not what it points to that contributes to size – 463035818_is_not_an_ai Apr 02 '19 at 15:34
  • 2
    try `sizeof(Foo)`, you will get the size of `Foo` objects and that is the same for any instance (no matter what is the size of the contained vector) – 463035818_is_not_an_ai Apr 02 '19 at 15:35
  • It is doing conceptually `::operator new(4 * sizeof(Foo));`. `sizeof()` is compile-time operator – Severin Pappadeux Apr 02 '19 at 15:37
  • don't be afraid, your mental model does not have to break ^^ – bruno Apr 02 '19 at 15:41
  • "... the compiler basically finds and reserves a contiguous block of memory ..." No, the _runtime_ finds available memory for you. Vectors have dynamic storage. The compiler only allocates storage for static objects, optimizations aside. – alter_igel Apr 02 '19 at 15:41
  • Thank you all! I did not realize that a vector is basically just a pointer to data, and does not contain the data itself. – Ben Apr 02 '19 at 16:07

4 Answers4

11

Your concept of size is flawed. A std::vector<type> has a compile time known size of space it is going to take up. It also has a run time size that it may use (this is allocated at run time and the vector holds a pointer to it). You can picture it laid out like

+--------+
|        |
| Vector |
|        |
|        |
+--------+
     |
     |
     v
+-------------------------------------------------+
|         |         |         |         |         |
| Element | Element | Element | Element | Element |
|         |         |         |         |         |
+-------------------------------------------------+

So when you have a vector of things that have a vector in them, each Element becomes the vector and then those point of to their own storage somewhere else like

+--------+
|        |
| Vector |
|        |
|        |
+----+---+
     |
     |
     v
+----+----+---------+---------+
| Object  | Object  | Object  |
|  with   |  with   |  with   |
| Vector  | Vector  | Vector  |
+----+----+----+----+----+----+
     |         |         |    +---------+---------+---------+---------+---------+
     |         |         |    |         |         |         |         |         |
     |         |         +--->+ Element | Element | Element | Element | Element |
     |         |              |         |         |         |         |         |
     |         |              +-------------------------------------------------+
     |         |    +-------------------------------------------------+
     |         |    |         |         |         |         |         |
     |         +--->+ Element | Element | Element | Element | Element |
     |              |         |         |         |         |         |
     |              +-------------------------------------------------+
     |    +-------------------------------------------------+
     |    |         |         |         |         |         |
     +--->+ Element | Element | Element | Element | Element |
          |         |         |         |         |         |
          +---------+---------+---------+---------+---------+
          

This way all of the vectors are next to each other, but the elements the vectors have can be anywhere else in memory. It is for this reason you don't want to use a std:vector<std::vector<int>> for a matrix. All of the sub vectors get memory to wherever so there is no locality between the rows.


Do note that this applies to all of the allocator aware containers as they do not store the elements inside the container directly. This is not true for std::array as, like a raw array, the elements are part of the container. If you have an std::array<int, 20> then it is at least sizeof(int) * 20 bytes in size.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • As helpful as some of the other more detailed answers are, your diagram makes the flaw in my mental model immediately obvious. I did not realize that a vector is basically just a pointer to a heap allocated buffer as described in [this answer](https://stackoverflow.com/a/52330123/2146894). Thank you for this. – Ben Apr 02 '19 at 16:05
  • 1
    @Ben You're welcome. Sometimes a good picture can be really helpful. Glad you liked it. – NathanOliver Apr 02 '19 at 16:11
  • @Caleth Added a note to the end. – NathanOliver Jul 10 '19 at 13:36
8

the size of

class Foo {

public:
    Foo() = default;
    Foo(std::vector<int> vec): _vec{vec} {}

private:
    std::vector<int> _vec;
};

is known and constant, the internal std::vector does the allocation in the heap, so there is no problem to do foovec.reserve(4);

else how a std::vector can be in the stack ? ;-)

bruno
  • 32,421
  • 7
  • 25
  • 37
3

The size of your class Foo is known at compile time, the std::vector class has a constant size, as the elements that it hold are allocated on the heap.

std::vector<int> empty{};
std::vector<int> full{};
full.resize(1000000);
assert(sizeof(empty) == sizeof(full));

Both instances of std::vector<int>, empty and full will always have the same size despite holding a different number of elements.

If you want an array which you can not resize, and it's size must be known at compile time, use std::array.

Kaldrr
  • 2,780
  • 8
  • 11
2

When you create a vector of type T and then reserve N elements for the vector, the compiler basically finds and reserves a contiguous block of memory

The compiler does no such thing. It generates code to request storage from the vector's allocator at runtime. By default this is std::allocator, which delegates to operator new, which will fetch uninitialized storage from the runtime system.

My mental model breaks down for custom objects of unknown/varying size

The only way a user-defined type can actually have unknown size is if it is incomplete - and you can't declare a vector to an incomplete type.

At any point in your code where the type is complete, its size is also fixed, and you can declare a vector storing that type as usual.


Your Foo is complete, and its size is fixed at compile time. You can check this with sizeof(Foo), and sizeof(foovec[0]) etc.

The vector owns a variable amount of storage, but doesn't contain it in the object. It just stores a pointer and the reserved & used sizes (or something equivalent). For example, an instance of:

class toyvec {
  int *begin_;
  int *end_;
  size_t capacity_;
public:
  // push_back, begin, end, and all other methods
};

always has fixed size sizeof(toyvec) = 2 * sizeof(int*) + sizeof(size_t) + maybe_some_padding. Allocating a huge block of memory, and setting begin to the start of it, has no effect on the size of the pointer itself.


tl;dr C++ does not have dynamically-resizing objects. The size of an object is fixed permanently by the class definition. C++ does have objects which own - and may resize - dynamic storage, but that isn't part of the object itself.

Useless
  • 64,155
  • 6
  • 88
  • 132