
I've learned that dynamic arrays, such as std::vector, double their capacity when the capacity is exhausted, so that the push_back operation runs in O(1) amortized time.

However, why is this needed in the first place? Isn't allocating space for one element at the end of the vector and copying the new element there already O(1)?

ahskdjfk
  • A note: Doubling the underlying capacity is not commonly used; it's too aggressive, and too likely to end up with excessive memory waste. It's almost always a calculation that, over the long haul, behaves as a multiplier of some kind, but not as high as 2x. 1.3x or 1.5x or the like gets the same amortized `O(1)` without risking as much waste of memory. – ShadowRanger Nov 08 '20 at 04:19
  • Note that computers are lazy, and you'll catch them doing all sorts of things like going, "Yeah, I CAN just extend that block of memory!" and a good `vector` implementation will take advantage of this when it happens. – user4581301 Nov 08 '20 at 04:20

3 Answers


If you want to allocate space at the end of the array, that only works if the memory at that location is available. Something else could already be there, or that memory could be unusable. So the way that resizing an array works (in general):

  1. Create a new, bigger array,

  2. Copy the elements from the original array into the bigger array, and

  3. Destroy the original array.

As you can see, when you increase the size of the array, you pay a cost proportional to the original size of the array.
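As a rough sketch (my own illustration, not actual standard-library code; `grow_by_one` is a hypothetical helper), those three steps might look like this for a raw `int` array:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: grow a heap-allocated int array by one element.
// Real implementations work with uninitialized storage and move semantics,
// but the three steps are the same.
int* grow_by_one(int* old_data, std::size_t old_size, int value) {
    int* new_data = new int[old_size + 1];               // 1. create a bigger array
    std::copy(old_data, old_data + old_size, new_data);  // 2. copy the elements: O(old_size)
    new_data[old_size] = value;                          //    append the new element
    delete[] old_data;                                   // 3. destroy the original array
    return new_data;
}

int main() {
    int* data = new int[1]{42};
    data = grow_by_one(data, 1, 7);  // data now holds {42, 7}
    delete[] data;
}
```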

So if you start with an array with one element and add a second element, you have to copy the first element into another array. If you add a third element, you have to copy the other two elements. If you add a fourth element, you have to copy the first three. This adds up to 1 + 2 + 3 + ⋯ + N, which is equal to N(N+1)/2, which is in O(N²). See Arithmetic Progression (Wikipedia).

If you resize the array with a geometric progression, you still have to copy the elements each time, but you are copying them fewer times.

If you resize by doubling the array, then when you get to some power-of-two size N, N/2 of the elements will have been copied 0 times, N/4 will have been copied once, N/8 will have been copied twice, and so on. The sum 0·N/2 + 1·N/4 + 2·N/8 + 3·N/16 + ⋯ is in O(N). See Geometric Series (Wikipedia).

You do not need to pick doubling; you can pick some other factor, like 1.5x. Choosing a different factor does not change the asymptotic complexity, but it does change the real performance.
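To make the difference concrete, here is a small counting simulation (my own, not from the answer; it tallies element copies rather than using a real `vector`):

```cpp
#include <cstdio>

int main() {
    const long long N = 1'000'000;

    // Grow-by-one: every insertion reallocates and copies all existing
    // elements, so the total is 0 + 1 + 2 + ... + (N-1) = N(N-1)/2.
    long long copies_one = 0;
    for (long long size = 0; size < N; ++size)
        copies_one += size;

    // Geometric growth: elements are copied only when the capacity runs out.
    auto count_geometric = [](double factor, long long n) {
        long long copies = 0, capacity = 1;
        for (long long size = 0; size < n; ++size) {
            if (size == capacity) {  // buffer is full: reallocate
                copies += size;      // copy every existing element once
                capacity = static_cast<long long>(capacity * factor) + 1;
            }
        }
        return copies;
    };

    std::printf("grow by one: %lld copies (quadratic in N)\n", copies_one);
    std::printf("factor 2.0:  %lld copies (linear in N)\n", count_geometric(2.0, N));
    std::printf("factor 1.5:  %lld copies (linear in N)\n", count_geometric(1.5, N));
}
```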

Dietrich Epp
  • Do you want to divide your calculations by the number of additions (N) to get the average cost per addition? (The question's O(1) is the average per addition.) – JaMiT Nov 08 '20 at 04:55
  • I thought about writing that into the answer, but I think that the total cost of resizing the array is an easier / simpler thing to reason about than amortized cost. So I have avoided talking about amortized cost, to hopefully make the discussion of the differences between resizing strategies clearer. – Dietrich Epp Nov 08 '20 at 05:28
  • Trying to strike a balance between writing an answer with more detail and writing an answer with more clarity, because detail and clarity are often at odds. – Dietrich Epp Nov 08 '20 at 05:29
  • That's reasonable. I'll leave the comment, though, to document that this was considered. – JaMiT Nov 09 '20 at 00:15

If you could just allocate space for one more element at the end, and copy the new element there, it would indeed be O(1).

But the standard library doesn't provide a way to do that, mostly because you can't depend on actually being able to do it.

So what normally ends up happening is that you allocate a whole new block of memory, copy (or move) the existing data from the existing block to the new block, then add the new element to the new block. And that extra step of copying/moving all the elements from the existing block to the new block is linear in the number of elements, meaning that adding a new element is linear in the worst case.
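A minimal sketch of that sequence (my own; `push_back_realloc` is a hypothetical helper, and real implementations work with uninitialized storage and `std::move_if_noexcept`):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: allocate a new block, move the existing
// elements across (the linear step), then append the new element.
std::string* push_back_realloc(std::string* old_data, std::size_t size,
                               std::string value) {
    std::string* new_data = new std::string[size + 1];  // whole new block
    std::move(old_data, old_data + size, new_data);     // move existing data: O(size)
    new_data[size] = std::move(value);                  // add the new element
    delete[] old_data;                                  // release the old block
    return new_data;
}

int main() {
    auto* data = new std::string[2]{"a", "b"};
    data = push_back_realloc(data, 2, "c");  // data now holds {"a", "b", "c"}
    delete[] data;
}
```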

Jerry Coffin

The difficulty here is that a std::vector keeps its contents in contiguous memory. It's not enough to allocate memory for a single additional element and construct the new element there. You need to guarantee that you have enough memory for the entire container, all in one place. Each allocation might require copying over all the previous contents to maintain contiguity, so getting enough memory for a single additional element is not necessarily O(1).
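One way to watch this happen is to track when `data()` changes across `push_back` calls; the exact capacities printed are implementation-defined:

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    const int* prev = v.data();
    for (int i = 0; i < 20; ++i) {
        v.push_back(i);
        if (v.data() != prev) {  // the storage moved: a reallocation happened
            std::printf("size %zu: reallocated, capacity now %zu\n",
                        v.size(), v.capacity());
            prev = v.data();
        }
    }
}
```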

Nathan Pierson