16

My question is regarding the effect of vector::push_back, I know it adds an element in the end of the vector but what happens underneath the hood?

IIRC memory objects are allocated in a sequential manner, so my question is whether vector::push_back simply allocates more memory immediately after the vector, and if so what happens if there is not enough free memory in that location? Or perhaps a pointer is added in the "end" to cause the vector to "hop" to the location it continues? Or is it simply reallocated through copying it to another location that has enough space and the old copy gets discarded? Or maybe something else?

Björn Pollex
  • 75,346
  • 28
  • 201
  • 283
dtech
  • 47,916
  • 17
  • 112
  • 190

7 Answers7

23

If there is enough space already allocated, the object is copy constructed from the argument in place. When there is not enough memory, the vector will grow it's internal databuffer following some kind of geometric progression (each time the new size will be k*old_size with k > 1[1]) and all objects present in the original buffer will then be moved to the new buffer. After the operation completes the old buffer will be released to the system.

In the previous sentence move is not used in the technical move-constructor/ move-assignment sense, they could be moved or copied or any equivalent operation.

[1] Growing by a factor k > 1 ensures that the amortized cost of push_back is constant. The actual constant varies from one implementation to another (Dinkumware uses 1.5, gcc uses 2). The amortized cost means that even if every so often one push_back will be highly expensive (O(N) on the size of the vector at the time), those cases happen rarely enough that the cost of all operations over the whole set of insertions is linear in the number of insertions, and thus each insertion averages a constant cost)

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • Well, in that case, wouldn't it be more efficient to sort of "fragment" the vector by using pointers to its fragments so it can still be iterated, this is the case of huge vectors, whose reallocation would induce a noticeable bottleneck? Is there an option like this or I will have to implement my own version of a vector? Could the loss of a few ints worth of pointers and a few hops outweigh the penalty of reallocation? – dtech Oct 26 '11 at 07:54
  • 1
    @ddriver: That's a rope. after some back and forth, the STL implementors have agreed that std::vector guarantees contiguous memory . A rope also doesn't have a O(1) access by index, depending on number of segments. – peterchen Oct 26 '11 at 07:56
  • That is exactly why vectors are used: They have contiguous memory. For other things, you could use a linked list or something. – arne Oct 26 '11 at 08:02
  • 5
    @ddriver: This is [Deque](http://www.sgi.com/tech/stl/Deque.html) is usually implemented (with the pointer between chunks of data). – Martin York Oct 26 '11 at 08:02
  • @David: follow up question. I'm curious what happens if I take a reference say `int & x = myintvector[42]` and then proceed to add a lot of elements to the vector, enough to trigger a move. Am I allowed to have taken that reference in the first place? – Vlad Oct 26 '11 at 08:04
  • 4
    @Vlad: That would invalidate the reference. You are allowed to take the reference, but you have to take care of the invalidation-rules of the container. – Björn Pollex Oct 26 '11 at 08:13
  • Thanks Loki - but from what I read Deque is not size aware, so I will still need to implement extra fail safe functionality. Thanks! – dtech Oct 26 '11 at 08:25
  • 2
    The object isn't constructed in place. It's copied in place and the copy operation can be elided by the compiler. It's VERY different. vector::emplace of C++11 constructs in place. – xanatos Oct 26 '11 at 08:30
  • @xanatos: I did not mean that the argument to `push_back` is constructed in place, but rather that the object in the container is constructed in place. As to whether that copy can be elided by the compiler, I doubt it, the parameter is constructed by the caller at the place of call, and at that time it cannot possibly know the location where the object will be, the caller must allocate and construct the argument, then pass a reference to it to `push_back`, all that before knowing the location of the buffer. (I am assuming no actual inlining, with real inlining anything can happen) – David Rodríguez - dribeas Oct 26 '11 at 08:37
  • @DavidRodríguez-dribeas +1 Still now it's much clearer and there is no ambiguity :-) I was thinking (for example) `myVector.push_back(MyObject())`. Someone that doesn't know how much C++ "loves" the copy constructor could have thought that the object was constructed directly in place. – xanatos Oct 26 '11 at 08:43
4

When vector is out of space, it will use it's allocator to reserve more space.

It is up to the allocator to decide how this is implemented.

However, the vector decides how much space to reserve: the standard guarantees that the vector capacity shall grow by at least a factor of 1.51 geometrically (see comment), thus preventing horrible performance due to repeated 'small' allocations.

On the physical move/copy of elements:

  • c++11 conforming implementations will move elements if they support move assignment and construction
  • most implementations I know of (g++ notably) will just use std::copy for POD types; the algorithm specialisation for POD types ensures that this compiles into (essentially) a memcpy operation. This in turn gets compiled in whatever CPU instruction is fastest on your system (e.g. SSE2 instructions)

1 I tried finding the reference quote for that from the n3242 standard draft document, but I was unable to find it at this time

sehe
  • 374,641
  • 47
  • 450
  • 633
  • 3
    The standard only guarantees (indirectly, via complexity requirements) that the growth be geometric, rather than arithmetic. It doesn't set any minimum requirement for the multiplier; an implementation which grew by a factor of 1.1 would be conformant. – James Kanze Oct 26 '11 at 08:08
  • @JamesKanze: Ah, that explains me being unable to find it. I did sort of wonder when I found the complexity requirements for push_back (reminds me of http://stackoverflow.com/questions/7554039/is-stringc-str-no-longer-null-terminated-in-c11/7554172#7554172) – sehe Oct 26 '11 at 08:26
2

When vector runs out of space, it is reallocated and all the elements are copied over to the new array. The old array is then destroyed.

To avoid an excessive number of allocations and to keep the average push_back() time at O(1), a reallocation requires that the size be increased by at least a constant factor. (1.5 and 2 are common)

Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • 1
    @James Kanze: Thanks for the response. I'll edit accordingly. – Mysticial Oct 26 '11 at 08:14
  • Mmm. I'm looking through the same specs. Regardless, 'it is reallocated' at least glosses over the notion of allocators. The place where I don't feel things can be right is where it says 'The old array is then destroyed' _unconditionally_. That would seem like extremely inefficient thing to do if the allocator, e.g. uses something similar to realloc() to achieve it's goals. Reverting my -1 till someone sheds more light, though – sehe Oct 26 '11 at 08:15
  • @sehe: Good point on the `realloc()`. I'm not sure on the status of this either. – Mysticial Oct 26 '11 at 08:19
  • There is no `realloc`-like operation in the `Allocator` interface used by standard containers. `vector` must allocate a new array, copy the contents, then free the old one. Agreed, this is inefficient, since the Allocator might have been able to extend the existing allocation in-place, but it doesn't get the chance. I'd have to look closely at the standard to judge where an implementation is permitted, in the case of a vector using the default Allocator, to introduce a realloc-style optimization there. – Steve Jessop Oct 26 '11 at 08:43
  • @SteveJessop There's always the "as if" rule. The standard does require that the default allocator use `operator new` to obtain the memory, which means that a user defined `operator new` can see that, but is explicitly says that "it is unspecified when or how often this function [`operator new`] is called." That sounds like sufficient leeway to allow such an optimization. – James Kanze Oct 26 '11 at 09:15
  • @James: yes, I think it's OK from that POV, but what I'd be closely looking for is whether the standard implies that when a vector's capacity changes, it guarantees to copy (or move in C++11) the elements to a new address. If so, then you can observe that in the copy/move constructor. – Steve Jessop Oct 26 '11 at 09:27
  • @SteveJessop I can't find anything. There are a couple of places where "reallocation" is required, but about the only thing the standard says about "reallocation" is that it invalidates iterators, pointers and references into the `vector`. Nothing about how many times (if any) the copy constructor is called. – James Kanze Oct 26 '11 at 09:53
  • @James: Good, so `vector` could be specialized by the implementation. The specialization would apply to vectors that use `std::allocator`. Since users can specialize `std::allocator` for user-defined types, it would of course avoid vectors that use specializations of `std::allocator`, so possibly my concern about copy constructors could be dealt with on that account too. For example, if the implementations just specializes `std::vector >` for built-in types `T` then you're safe. – Steve Jessop Oct 26 '11 at 10:00
  • After all of which, I think it's *reasonably* safe to say without qualification that "the old array is destroyed". It gives the right idea of how you should expect `vector` to behave in general, and if there are particular implementations that manage an optimization in which the new array in fact is in the same location as the old array, that's nice but not to be relied on. – Steve Jessop Oct 26 '11 at 10:03
2

A vector gurantees that all elements are contigious in memory.

Internally you can think of it as defined as three pointers (or what act like pointers):

start:     Points at the beginning of the allocated block.
final:     Points one past the last element in the vector.
           If the vector is empty then start == final 
capacity:  Points one past the end of allocated memory.
           If final == capacity there is no room left.

When you push back.

  1. If final is smaller than capacity:
    • the new element is copied into the location pointed at by final
    • final is incremented to the next location.
  2. If final is the same as capacity then the vector is full
    • new memory must be allocated.
    • The compiler will then allocate X*(capacity - start)*sizeof(t) bytes.
    • where X is usually a value between 1.5 and 2.
    • It then copies all the values from the old memory buffer to the new memory buffer.
    • the new value is added to the buffer.
    • Transfers start/final/capacity pointers.
    • Free's up the old buffer
Martin York
  • 257,169
  • 86
  • 333
  • 562
0

When you call vector::push_back the end pointer is compared to the capacity pointer. If there is enough room for the new object placement new is called to construct the object in the available space and the end pointer is incremented.

If there isn't enough room the vector calls its allocator to allocate enough contiguous space for at least the existing elements plus new element (different implementation may grow the allocated memory by different multipliers). Then all existing elements plus the new one are copied to the newly allocated space.

Praetorian
  • 106,671
  • 19
  • 240
  • 328
0

std::vector overallocates - it will usually allocate more memory than necessary automatically. sizeis not affectd by this, but you can control that through capacity.

std::vector will copy everything if the additional capacity is not sufficient.

The memory allocated by std::vector is raw, no constructors are called on demand, using placement new.

So, push_back does:

  • if capacity is not sufficient for the new element, it will
    • allocate a new block
    • copy all existing elements (usually using the copy constructor)
  • increase size by one
  • copy the new element to the new location
peterchen
  • 40,917
  • 20
  • 104
  • 186
0

If you have some idea of what will be the final size of your array, try to vector::reserve the memory first. Note that reserve is different from vector::resize. With reserve the vector::size() of your array is not changed

linello
  • 8,451
  • 18
  • 63
  • 109