45

I have been using std::vector a lot, and recently I asked myself this question: "How is std::vector implemented?"

I had two alternatives:

1) Linked list, and then making the API feel like random access (i.e. overloading operator[]).

2) Using new, e.g. Foo* temp = new Foo[20]: I believe they do something like this, but then it raises one more question. Do they always allocate a maximum (uint32_t) storage to give random access? (This is inefficient in terms of memory.)

Or is there something else that I should be aware of?

Padma Kumar
  • 19,893
  • 17
  • 73
  • 130
Arun
  • 489
  • 1
  • 4
  • 4

9 Answers9

50

It's implemented by using an underlying array.

It's not possible to implement a std::vector<T> with a linked list because the standard guarantees the elements in the list will be held in contiguous memory.

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
  • 1
    Also, lookup time is important. – Ed S. Jan 19 '10 at 20:53
  • 2
    Contiguous memory is only required by the upcoming C++ standard. But all available implementation today use an array anyway. This is an indirect requirement by the O(1) access time. – Axel Gneiting Jan 19 '10 at 21:16
  • 26
    From the current (2003) standard: "The elements of a vector are stored contiguously" (23.2.4/1). – James McNellis Jan 19 '10 at 22:19
  • 5
    @Axel: No, it is already required by C++03. It was not required by C++98, but it was always intended to be contiguous, so C++03 made it explicit. – jalf Jan 19 '10 at 23:24
  • 3
    Okay seems that I mixed that one up. I'm sorry. – Axel Gneiting Jan 20 '10 at 00:29
  • @AxelGneiting this is not necessarily indirect requirement of O(1). Hash table could also accomplish this in O(1), without contiguous data... – mip Apr 26 '12 at 12:55
  • @doc Hash table only has O(1) average case, worst case is O(n), which makes it unsuitable for `std::vector` – Erbureth Jun 30 '14 at 10:04
  • @Erbureth hash table is unsuitable for `vector` implementation for obvious reasons. It would be just retarded to implement `vector` over hash table. I provided this example to show that O(1) requirement does not imply array. Worst case complexity is not the case, since it depends on collision resolution algorithm. You could still have O(1) access time, if you resolve collisions by reindexing whole array, so that none of the elements share same key. Of course such insertions would be very expensive, but access time remains constant. – mip Jun 30 '14 at 18:20
26

I believe it is the third option. It can't just use new T[n] because then it would actually have to construct as many objects as it allocates. E.g

std::vector<Foo> v;
v.reserve(10);

If your implementation simply ended up doing new Foo[10] then you'd just have constructed 10 instances of Foo.

Instead it uses its allocator to allocate and deallocate raw memory (without constructing objects), and as needed (for example, when you actually push_back objects) places copy-constructed instances into correct memory locations in its reserve using placement new and removes them with explicit calls to the destructor (something you'd only do in combination with placement new). The allocator class provides following methods for that which I presume vector's implementations use

 void construct(pointer p, const_reference val);

  Returns:
    new((void *)p) T(val)

  void destroy(pointer p);

  Returns:
    ((T*)p)->~T()

(The "returns" probably should read "effect" or similar.)

More about placement new

UncleBens
  • 40,819
  • 6
  • 57
  • 90
  • 2
    Yes, it's actually the only one which answers the question of how a vector is implemented, the rest kinda keep repeating trivia of contiguous memory... – pfalcon Jan 16 '13 at 22:24
  • The object also may constructed by 'move' if possible, e.g. push_back(Foo()); How does keep up to the stored objects and remembers to call their destructors?! they may not be contiguous in underlying memory . – Mehran Aug 20 '22 at 13:56
18

They use a dynamically allocated array that is regrown as needed. It is necessary to use something like an array so that the elements are contiguous in memory which is guaranteed by the standard.

Incidentally, one common way of regrowing an array is to double the size as needed. This is so that if you are inserting n items at most only O(log n) regrowths are performed and at most O(n) space is wasted.

You can read one implementation for yourself at SGI (where STL was originally conceived).

jason
  • 236,483
  • 35
  • 423
  • 525
2

There is no one way it is implemented. Different implementations can be different, so long as the preserve the semantics and satisfy the requirements.

At any given time, there has to be a primitive array of T to satisfy the requirements of contiguity. However, how it is allocated, grown, shrunk, and freed is up to the implementor.

You can read the implementation for yourself, it's right there in the header file.

I can tell you that no implementations use linked lists. They aren't consistent with the requirements of the standard.

bmargulies
  • 97,814
  • 39
  • 186
  • 310
  • 7
    I will assert unequivocally there is no implementation that is a linked list. – jason Jan 19 '10 at 19:58
  • 1
    Part of the required semantics are O(1) random access. Linked list isn't a possibility. – jamesdlin Jan 19 '10 at 20:05
  • 1
    -1: the standard requires "The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size()." If there is another way to implement it, it is indistinguishable from an array. – Potatoswatter Jan 19 '10 at 21:18
  • There is a lot of room for difference in how the array gets allocated, freed, and all that other stuff the OP asked about. – bmargulies Jan 19 '10 at 21:18
  • @bmargulies: There is a requirement for amortized constant time on push_back() operations, which requires exponential growth of the underlying array. The only wiggle room is what that growth factor is. I think 2 is most common, but 1.5 has been shown to be optimal IIRC. – Drew Hall Jan 19 '10 at 21:26
  • It gets allocated and freed by the vector's allocator, where the default allocator simply calls `operator new()`, `operator delete()`, and the class' ctor/dtor. Name one potential difference between two implementations besides the resize-step formula. – Potatoswatter Jan 19 '10 at 21:37
  • @Potatoswatter: It's a matter of emphasis. I chose to emphasize the fact that there is abstraction going on here, however minimal, and that the only way to have certain knowledge is to read the implementation. Obviously, my choice was not especially popular. That's why we got votes. – bmargulies Jan 19 '10 at 23:00
2

Section 23.2.4, ¶1 of the standard requires that arithmetic on pointers into a vector work the same as with pointers into an array.

The elements of a vector are stored contiguously, meaning that if v is a vector where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

This guarantees that the storage is in an array. Of course, if you resize the array to be bigger, it might get moved in memory.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
2

A pedagogic (and thus simplified) version of a container called "Vec" is discussed in Chapter 11 of the wonderful (introductory) book "Accelerated C++". What they describe is a stripped-down version of std::vector, but I think it is still worth noting that:

1) they implement their template class in terms of an array,

2) they discuss push_back in terms of the trick (mentioned above) of allocating more storage than is needed, and coming back for more when they run out, and

3) they use allocator<T> for memory management. The new operator is not flexible enough in this context, since it both allocates and initializes memory.

I repeat, though, that this doesn't mean that actual implementations out there are this simple. But since "Accelerated C++" is quite widespread, those interested can find in the relevant chapter one way vector-like objects can be created, copied, assigned, and destroyed.

EDIT: On a related note, I just found the following blog post by Herb Sutter in which he comments on an earlier blog post by Andrew Koenig, regarding whether or not one should be worried about vector elements being contiguous in memory: Cringe not: Vectors are guaranteed to be contiguous.

Alexandros Gezerlis
  • 2,221
  • 16
  • 21
1

I believe the STL uses option #2 (or something similar) because a std::vector<> is guaranteed to store the elements in contiguous memory.

If you're looking for a memory structure that doesn't need to use contiguous memory, look at std::deque.

oz10
  • 153,307
  • 27
  • 93
  • 128
1

There's no actual array at all in any decent implementation (if there is, you can't use any object in it without a default constructor), but just raw memory that gets allocated. It gets allocated in a manner that's usually along the lines of doubling every time you need to expand it.

The vector then uses in place allocation to call the constructors of the class in the proper location once each slot actually gets used actually used.

When there is expansion it will try to reallocate in place (but this is a bit silly and doesn't normally work, think windows 98 heap compaction) but usually will end up making a whole new allocation and copying over.

A standard stl vector is always all together, but not all implementations work like that (I know, having written some of them). Probably none are exactly a linked list, though, either.

0

From what I have read in books, and from the functionality of reserve() and the requirement that elements of vectors be contiguous, this is what I think could be a possible way to implement vector.

  1. Elements of vectors be contiguous, supporting O(1) random access and vectors should be compatible with C arrays. This just implies there are no linked lists.

  2. When you call reserve() it reserves additional memory. But reserve() does not call new T[newSize] to reserve more memory; if it did, it would call default constructors for the new elements. As uncleben explained, whenever reserve() is called, the vector class just allocates more uninitialized memory using its allocator (if required) and copy-constructs new objects into that memory using placement new (if more memory has been allocated).

  3. Initially vector has some default capacity, for which uninitialized memory is allocated when the vector object is constructed.

  4. push_back() copy constructs the object into the first available location. If required more memory has to be allocated, in similar manner as reserve().

underscore_d
  • 6,309
  • 3
  • 38
  • 64
Yogesh Arora
  • 2,216
  • 2
  • 25
  • 35