Why are C++ STL vectors 1000x slower when doing many reserves?

Question

I've run into a strange situation.

In my program I have a loop that combines a bunch of data together in a giant vector. I was trying to figure out why it was running so slowly, even though it seemed like I was doing everything right to allocate memory in an efficient manner on the go.

In my program it is difficult to determine how big the final vector of combined data should be, but the size of each piece of data is known as it is processed. So instead of reserving and resizing the combined data vector in one go, I was reserving enough space for each data chunk as it is added to the larger vector. That's when I ran into this issue that is repeatable using the simple snippet below:

std::vector<float> arr1;
std::vector<float> arr2;
std::vector<float> arr3;
std::vector<float> arr4;
int numLoops = 10000;
int numSubloops = 50;

{
    // Test 1
    // Naive test where no pre-allocation occurs

    for (int q = 0; q < numLoops; q++)
    {
        for (int g = 0; g < numSubloops; g++)
        {
            arr1.push_back(q * g);
        }
    }
}

{
    // Test 2
    // Ideal situation where total amount of data is reserved beforehand

    arr2.reserve(numLoops * numSubloops);
    for (int q = 0; q < numLoops; q++)
    {
        for (int g = 0; g < numSubloops; g++)
        {
            arr2.push_back(q * g);
        }
    }
}

{
    // Test 3
    // Total data is not known beforehand, so allocations made for each
    // data chunk as they are processed using 'resize' method

    int arrInx = 0;
    for (int q = 0; q < numLoops; q++)
    {
        arr3.resize(arr3.size() + numSubloops);
        for (int g = 0; g < numSubloops; g++)
        {
            arr3[arrInx++] = q * g;
        }
    }
}

{
    // Test 4
    // Total data is not known beforehand, so allocations are made for each
    // data chunk as they are processed using the 'reserve' method

    for (int q = 0; q < numLoops; q++)
    {
        arr4.reserve(arr4.size() + numSubloops);
        for (int g = 0; g < numSubloops; g++)
        {
            arr4.push_back(q * g);
        }
    }
}

The results of this test, after compilation in Visual Studio 2017, are as follows:

Test 1: 7 ms
Test 2: 3 ms
Test 3: 4 ms
Test 4: 4000 ms

Why is there the huge discrepancy in running times?

Why does calling reserve a bunch of times, followed by push_back take 1000x times longer than calling resize a bunch of times, followed by direct index access?

How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?

The vectors default expansion strategy is probably exponential, while your change forces it into being linear. — StoryTeller - Unslander Monica, Jan 31 '18 at 06:29
Also `push_back` does more work than `operator[]`. That could explain the `resize()` vs `reserve()` observation. — juanchopanza, Jan 31 '18 at 06:30
@StoryTeller But in Test 3, it is linearized as well. So what is the difference? — Daniel Langr, Jan 31 '18 at 07:58
@juanchopanza I don't think so. There is no reallocation in `push_back` in Test 4. And, Test 1 shows that `push_back` is pretty fast as well. — Daniel Langr, Jan 31 '18 at 07:59
@DanielLangr - It's not. It's using `resize`, which adds fully constructed items to the vector, changing `size()` in a predictable manner. But the behavior or `resize` with regard to the `capacity` is not specified. It can allocate raw storage intelligently. — StoryTeller - Unslander Monica, Jan 31 '18 at 08:00
@StoryTeller You're right, I just checked with GCC (libstdc++). In Test 3, the capacity is doubled each time. In Test 4, it's not. I didn't know that `resize` behaves this way within implementations. Now, I just wonder why `reserve` does not do the same, since it can. It's kind of inconsistent, I would expect same expansion strategy here. — Daniel Langr, Jan 31 '18 at 08:12
@DanielLangr - It can't behave the same. The result of calling `capacity` after `reserve` is specified by the C++ standard. Its hands are tied up. — StoryTeller - Unslander Monica, Jan 31 '18 at 08:13
@StoryTeller From C++11 Standard §23.3.6.3(2): _"After `reserve()`, `capacity()` is **greater or equal** to the argument of `reserve`..."_ So, the implementation can double the capacity as well as `resize`, can't it? — Daniel Langr, Jan 31 '18 at 08:18
@DanielLangr - I think that's a good subject for posting another question :) — StoryTeller - Unslander Monica, Jan 31 '18 at 08:20
@DanielLangr Do you think maybe `push_back` has to check the capacity to see if the vector needs to reserve more? And do you think `operator[]` has to do this? — juanchopanza, Jan 31 '18 at 08:39
@juanchopanza Sure, I know this. But my point was that Test 1 proved that this couldn't explain such a huge difference in runtime of Test 4. — Daniel Langr, Jan 31 '18 at 08:43

Benjamin Lindley · Accepted Answer · 2018-01-31T06:33:49.993

20

How does it make any sense that it could take 500x longer than the naive approach which includes no pre-allocations at all?

That's where you're mistaken. The 'naive' approach you speak of does do pre-allocations. They're just done behind the scenes, and infrequently, in the call to push_back. It doesn't just allocate room for one more element every time you call push_back. It allocates some amount that is a factor (usually between 1.5x and 2x) of the current capacity. And then it doesn't need to allocate again until that capacity runs out. This is much more efficient than your loop which does an allocation every time 50 elements are added, with no regard for the current capacity.

edited Jan 31 '18 at 06:33

answered Jan 31 '18 at 06:30

Benjamin Lindley

101,917
9
204
274

Thanks for the quick answer! I forgot that vectors will internally re-allocate themselves in an efficient manner once they reach capacity....and since they never reach capacity in test4, they're left with my inefficient manual allocations. Makes perfect sense! – Tyson Jan 31 '18 at 06:33
2

To be fair, this does not answer why `reserve` isn't simply ignored in this case. You'd expect an efficient implementation to simply ignore allocations that are smaller than what it already has. – Shachar Shemesh Jan 31 '18 at 06:38
This also tells us something interesting about `resize`. I would expect similar allocation behavior between `reserve` and `resize`, that you get exactly what you ask for, but this does not seem to be the case. – user4581301 Jan 31 '18 at 06:39
3

@ShacharShemesh: I'm not sure what you're saying. Values smaller than the current capacity passed to `reserve` *are* ignored. And it's not up to the implementation, it is a standard requirement. The values the OP is passing do not fit that description though. – Benjamin Lindley Jan 31 '18 at 06:46
@user4581301 - You shouldn't expect similar behavior. `resize` adds *items*, while `reserve` adds just an exact amount of space. – StoryTeller - Unslander Monica Jan 31 '18 at 06:49
@StoryTeller My thinking goes like this, If I resize the vector to 50 elements, I'd expect storage for 50 elements. If I later resize to 100 elements, I'd expect storage for 100 elements. 150, 200, and so on. I would be saving on some time with the `resize` and subscript operator over the `reserve` and `push_back` approach, but I would have expected the same linear re-allocation of storage. This is suggesting that if I repeatedly `resize`, I may be allocating more storage than I expected. – user4581301 Jan 31 '18 at 07:01
@user4581301 - And you'll get 50 elements, and later 100 more elements. And the vector will reallocate to some capacity that can hold them. That's the freedom here. The end capacity after `resize` is not specified, but it is for `reserve`, since that one operates exclusively on the vectors capacity. – StoryTeller - Unslander Monica Jan 31 '18 at 07:03
@user4581301: And this also means that you don't have to worry about what method you use to add elements to your vector, be it `push_back`, `resize` or `insert`. You will get the same complexity guarantees. The only time you think about memory allocation for your vector is when you use the function that is explicitly designed to deal with it. – Benjamin Lindley Jan 31 '18 at 07:11
@StoryTeller And that's my takeaway from the asker's experiment. I learned something new about `resize`. As soon as you see the numbers, you can make some good guesses about what happened. No damn way `push_back` without a forced re-allocation is thousands of times less efficient than the subscript operator, so that leaves `reserve` doing more re-allocation and copying than `resize`. I've never run into a case where I'd had to profile memory use after after multiple `resize`s, so I hadn't noticed. Worth knowing, though. – user4581301 Jan 31 '18 at 07:15
This is an example of [Shlemiel the painter's algorithm](http://www.joelonsoftware.com/articles/fog0000000319.html). – Peter Mortensen Jan 31 '18 at 12:44

score 4 · Answer 2 · edited Jun 20 '20 at 09:12

@Benjamin Lindley's answer explains the capacity of std::vector. However, for exactly why the 4th test case is that slow, in fact it's an implementation detail of the standard library.

[vector.capacity]

void reserve(size_type n);

...

Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve().

Thus it is not guaranteed by C++ standard that after reserve() for a larger capacity, the actual capacity should be the requested one. Personally I think it's not unreasonable for an implementation to follow some specific policy when such larger capacity request is received. However, I also tested on my machine, it seems the STL just does the simplest thing.

Why are C++ STL vectors 1000x slower when doing many reserves?

2 Answers2

Linked