4

Let's say I have to iterate over a potentially very large vector of numbers and copy the even and odd elements into new, separate vectors. (The source vector may have any proportion of evens to odds; it could be all evens, all odds, or somewhere in-between.)

For simplicity, push_back is often used for this sort of thing:

for (std::size_t Index = 0; Index < Source.size(); Index++)
{
    if (Source[Index] % 2) Odds.push_back(Source[Index]);
    else Evens.push_back(Source[Index]);
}

However, I'm worried that this will be inefficient and harmful if it's used as part of the implementation for something like a sorting algorithm, where performance is paramount. QuickSort, for example, involves separating elements much like this.

You could use reserve() to allocate memory beforehand so only one allocation is needed per vector, but then you have to iterate over the entire source vector twice: once to count how many elements will go into each vector, and once more for the actual copying.
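
For illustration, a minimal sketch of that two-pass approach (same Source, Evens, and Odds as above):

// First pass: count the evens so that each reserve() is exact.
std::size_t EvenCount = 0;
for (std::size_t Index = 0; Index < Source.size(); Index++)
    if (Source[Index] % 2 == 0) EvenCount++;

Evens.reserve(EvenCount);
Odds.reserve(Source.size() - EvenCount);

// Second pass: copy; push_back() never reallocates now.
for (std::size_t Index = 0; Index < Source.size(); Index++)
{
    if (Source[Index] % 2) Odds.push_back(Source[Index]);
    else Evens.push_back(Source[Index]);
}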

You could, of course, allocate the same amount of space as the source vector's size, since neither new vector will need to hold more than that, but that seems somewhat wasteful.

Is there a better method that I'm missing? Is push_back() usually trusted to manage this sort of thing for the programmer, or can it become burdensome for sensitive algorithms?

Maxpm
  • Wow, a downvote literally seconds after I posted the question. What's the reason? – Maxpm Jul 24 '11 at 02:22
  • `Odds.reserve(Source.size() / 2 + 1); Evens.reserve(Source.size() / 2 + 1);` – Cory Nelson Jul 24 '11 at 02:28
  • Why do you think that std::sort will care or be more efficient when the memory is allocated all at once? – Soren Jul 24 '11 at 02:28
  • @Maxpm Some StackOverflow readers hate performant code on general principle. – Crashworks Jul 24 '11 at 02:30
  • @Cory That's not guaranteed to be enough space; the vector could have all odds, all evens, or somewhere in-between. @Soren I don't follow. If you're talking about how `std::sort()` manages its allocations internally (assuming it's not in-place), of course it will be slower if it has to allocate more times than necessary. – Maxpm Jul 24 '11 at 02:37
  • @Maxpm read your question wrong. My bad! – Cory Nelson Jul 24 '11 at 02:39
  • @Maxpm: I don't know of any implementation of `std::sort()` that needs to allocate anything; it *does* sort in place. – Benjamin Lindley Jul 24 '11 at 02:40
  • @Maxpm: `std::sort` typically uses a variant of quicksort (usually introsort) which, in addition to ensuring O(n log n) runtime, is also in-place. (If you want to get really specific, it uses log(n) stack space...). – Darren Engwirda Jul 24 '11 at 02:42
  • @Benjamin Then you can assume it uses some custom-written algorithm, or that it isn't a sorting algorithm at all. The point is that it's an important place to avoid a bottleneck. – Maxpm Jul 24 '11 at 02:44
  • @Maxpm: I can assume *what* uses some custom-written algorithm? *What* is an important place to avoid a bottleneck? – Benjamin Lindley Jul 24 '11 at 02:47
  • @Maxpm what about counting the number of odd / even elements first and then reserve based upon those metrics? – Lea Hayes Jul 24 '11 at 02:50
  • @Maxpm -- I think your fundamental thinking is wrong. What is most efficient depends on how your memory allocation works, in particular how the allocator deals efficiently with memory fragmentation on repeated allocs/frees. The container (vector, deque) does not give you access to contiguous memory even if it was allocated that way. You should assume that the containers are built to have maximum efficiency on the particular platform. – Soren Jul 24 '11 at 02:53
  • @Benjamin Whatever higher-level process requires it. – Maxpm Jul 24 '11 at 02:53
  • @Lea I mentioned that in the question. See the second paragraph after the code sample. – Maxpm Jul 24 '11 at 02:54
  • @Soren Well, that's what I'm asking. If you believe that `push_back()` is always so efficient that whatever performance is lost by using it is negligible, feel free to add that as an answer. – Maxpm Jul 24 '11 at 02:55
  • @Maxpm: Eh?? The underlying array for `std::vector` is contiguous (this is specified in the standard). As @Ben said, I have never heard of a `std::sort` implementation that is not in-place. Given these two points, it doesn't seem reasonable that the efficiency of `std::sort` would be affected by how much "extra" space was allocated in your vectors. Unless you're super concerned about space efficiency, just over allocate initially and shrink afterwards. – Darren Engwirda Jul 24 '11 at 02:58
  • @Darren The question is not specific to using `std::sort`. – Maxpm Jul 24 '11 at 02:59
  • @Soren: The underlying array in `std::vector` **is** contiguous. The standard demands this type of behaviour. `std::deque` is not required to be (and in practice probably isn't) contiguous. – Darren Engwirda Jul 24 '11 at 03:02
  • Another downvote? I really wish people would just explain why. – Maxpm Jul 24 '11 at 03:06
  • @Maxpm: The question about `push_back` efficiency is valid, but I can't understand what you're getting at with sorting efficiency. You specifically ask about `quicksort` - any worthwhile implementation **is** in-place, so there's no allocation worry here. If you're worried about a non-in-place sort (`mergesort` maybe) it will only allocate additional space based on the number of elements in the vector - **not** the amount of space allocated to the vector. What am I missing?? – Darren Engwirda Jul 24 '11 at 03:19
  • @Darren I didn't specifically ask about QuickSort, I used it as a mere example. The algorithm can be anything, as long as it requires a source vector's elements to be separated into two or more new vectors based on value. – Maxpm Jul 24 '11 at 03:28
  • Everyone: I'm sorry for the confusion; I just now realized that the question was a bit unclear. The separating of the source vector's elements is a part of an algorithm that the programmer is writing, and that algorithm needs to be as efficient as possible. I've edited the question to clarify this. – Maxpm Jul 24 '11 at 03:33
  • If a linear search of your array, before performing another linear search through your array, is too wasteful, then you are just out of luck. – Dennis Zickefoose Jul 24 '11 at 03:37
  • ANOTHER downvote?! What the hell?! None of these have gotten any explanation! – Maxpm Jul 24 '11 at 17:07
  • @Maxpm - I've found that asking realtime performance questions on StackOverflow is kind of like asking Congress for money-saving advice. People here are so used to working on server computers with no memory or performance constraints that they've forgotten there are domains where resources are *not* infinite. It's easy to get any car to 60mph if you put a 500hp engine on it, but real engineering is getting it to that speed with a tiny motor! Or maybe your downvoters just like slow code and secretly dream of programming DVRs for Comcast ( http://www.youtube.com/watch?v=iWVBCuzW0EM ). – Crashworks Jul 25 '11 at 01:46
  • @Crashworks Sure, but even if they don't believe the question is worthwhile for their field, they don't have to downvote it and hide it from those who do, especially without an explanation of what the actual problem is. I'm fine with being criticized/downvoted, just as long as I understand how I can improve. – Maxpm Jul 25 '11 at 03:59

5 Answers

9

I'm going to answer the question I think you really meant to ask, which is "should push_back() be avoided in the inner loops of heavy algorithms?" rather than what others seem to have read into your post, which is "does it matter if I call push_back before doing an unrelated sort on a large vector?" Also, I'm going to answer from my experience rather than spend time chasing down citations and peer-reviewed articles.

Your example is basically doing two things that add up to the total CPU cost: it's reading and operating on elements in the input vector, and then it has to insert the elements into the output vector. You're concerned about the cost of inserting elements because:

  1. push_back() is constant time (instantaneous, really) when a vector has enough space pre-reserved for an additional element, but slow when you've run out of reserved space.
  2. Allocating memory is costly (malloc() is just slow, even when pedants pretend that new is something different)
  3. Copying a vector's data from one region to another after reallocation is also slow: when push_back() finds it hasn't got enough space, it has to go and allocate a bigger vector, then copy all the elements. (In theory, for vectors that are many OS pages in size, a magic implementation of the STL could use the VMM to move them around in the virtual address space without copying — in practice I've never seen one that could.)
  4. Over-allocating the output vectors causes problems: it causes fragmentation, making future allocations slower; it burns data cache, making everything slower; if persistent, it ties up scarce free memory, leading to disk paging on a PC and a crash on embedded platforms.
  5. Under-allocating the output vectors causes problems because reallocating a vector is an O(n) operation, so reallocating it m times costs O(m×n). Exponential reallocation (the usual strategy, where the vector's reserve doubles every time it runs out of space) keeps the total copying cost amortized O(n), but you still pay for O(log n) separate allocations along the way, plus the cache and fragmentation costs above. The sketch after this list shows the growth in action.
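
To make points 1, 3, and 5 concrete, here is a small self-contained demo that reports each reallocation as elements are pushed back (the growth factor is implementation-defined; doubling is common, and some libraries grow by 1.5×):

#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    std::vector<int>::size_type LastCapacity = v.capacity();

    for (int i = 0; i < 1000; i++)
    {
        v.push_back(i); // cheap, except when capacity runs out
        if (v.capacity() != LastCapacity)
        {
            std::cout << "reallocated at size " << v.size()
                      << ", new capacity " << v.capacity() << '\n';
            LastCapacity = v.capacity();
        }
    }
}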

Your instinct, therefore, is correct: always pre-reserve space for your vectors where possible, not because push_back is slow, but because it can trigger a reallocation that is slow. Also, if you look at the implementation of shrink_to_fit, you'll see it also does a copy reallocation, temporarily doubling your memory cost and causing further fragmentation.

Your problem here is that you don't always know exactly how much space you'll need for the output vectors; the usual response is to use a heuristic and maybe a custom allocator. Reserve n/2+k of the input size for each of your output vectors by default, where k is some safety margin. That way you'll usually have enough space for the output, so long as your input is reasonably balanced, and push_back can reallocate in the rare cases where it's not. If you find that push_back's exponential behavior is wasting too much memory (causing you to reserve 2n elements when really you just needed n+2), you can give it a custom allocator that expands the vector size in smaller, linear chunks — but of course that will be much slower in cases where the vectors are really unbalanced and you end up doing lots of resizes.
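
A minimal sketch of that heuristic (the value of k is an arbitrary placeholder here; tune it to your data):

const std::size_t k = 16; // safety margin for slightly unbalanced inputs
Evens.reserve(Source.size() / 2 + k);
Odds.reserve(Source.size() / 2 + k);

for (std::size_t Index = 0; Index < Source.size(); Index++)
{
    if (Source[Index] % 2) Odds.push_back(Source[Index]); // reallocates only if the input is badly unbalanced
    else Evens.push_back(Source[Index]);
}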

There's no way to always reserve the exact right amount of space without walking the input elements in advance; but if you know what the balance usually looks like, you can use a heuristic to make a good guess at it for a statistical performance gain over many iterations.

Crashworks
  • +1 for the shrink_to_fit-does-a-copy. Just looked at the Microsoft implementation and, indeed... – Gabriel Aug 24 '12 at 19:44
2

You could, of course, allocate the same amount of space as the source vector's size, since neither new vector will need to hold more than that, but that seems somewhat wasteful.

Then follow it up with a call to shrink_to_fit.
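
A minimal sketch of that approach, using the same Source, Evens, and Odds as the question (note that shrink_to_fit is a non-binding request in C++11, and it typically reallocates and copies):

Evens.reserve(Source.size()); // neither output can ever exceed the input size
Odds.reserve(Source.size());

for (std::size_t Index = 0; Index < Source.size(); Index++)
{
    if (Source[Index] % 2) Odds.push_back(Source[Index]);
    else Evens.push_back(Source[Index]);
}

Evens.shrink_to_fit(); // hand the unused capacity back
Odds.shrink_to_fit();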

However, I'm worried that this will be inefficient and harm things like sorting algorithms. ... Is push_back() usually trusted to manage this sort of thing for the programmer, or can it become burdensome for sensitive algorithms?

Yes, push_back is trusted. Although honestly I don't understand what your concern is. Presumably, if you're using algorithms on the vector, you've already put the elements into the vector. What kind of algorithm are you talking about where it would matter how the vector elements got there, be it push_back or something else?

Benjamin Lindley
  • To be clear, the programmer is the person writing the algorithm. He's not calling `std::sort` or anything - the vector-splitting is a part of the algorithm he is writing, and the algorithm needs to be as efficient as possible. – Maxpm Jul 24 '11 at 03:30
  • @Maxpm: Then, unless you're talking about positively massive datasets, I would just go with the full-sized reserve, followed by the algorithm you showed in your post, followed by shrink_to_fit. -- I'm defining positively massive here loosely as ~10% of your total system memory. – Benjamin Lindley Jul 24 '11 at 03:36
2

How about sorting the original vector with a custom predicate that puts all the evens before all the odds?

bool EvenBeforeOdd(int a, int b)
{
  // Same parity: fall back to the normal ordering.
  if ((a - b) % 2 == 0) return a < b;

  // Different parity: evens sort before odds.
  return a % 2 == 0;
}

std::sort(v.begin(), v.end(), EvenBeforeOdd);

Then you just have to find the boundary between the evens and the odds, e.g. with std::find_if looking for the first odd element, or in O(log n) with a binary search such as std::partition_point, since all the evens now precede all the odds. Once you've found it, you can make very cheap copies of the two ranges.

Update: As @Blastfurnace commented, it's much more efficient to use std::partition rather than sort, since we don't actually need the elements ordered within each partition:

bool isEven(int a) { return 0 == a % 2; }

std::vector<int>::iterator it = std::partition(v.begin(), v.end(), isEven);

std::vector<int> evens, odds;
evens.reserve(std::distance(v.begin(), it));
odds.reserve(std::distance(it, v.end()));

std::copy(v.begin(), it, std::back_inserter(evens));
std::copy(it, v.end(), std::back_inserter(odds));
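
If you'd rather not reorder v at all, std::partition_copy (C++11, suggested by Blastfurnace in the comments below) fills both destinations in a single pass over the unmodified source; a minimal sketch:

std::vector<int> evens, odds;
evens.reserve(v.size()); // upper bound; shrink_to_fit afterwards if the waste matters
odds.reserve(v.size());

std::partition_copy(v.begin(), v.end(),
                    std::back_inserter(evens), // receives elements where isEven is true
                    std::back_inserter(odds),  // receives the rest
                    isEven);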
Kerrek SB
  • If he's just copying the elements to two destination vectors then using `std::partition_copy` is simpler and probably much faster. – Blastfurnace Jul 24 '11 at 06:16
  • @Blastfurnace: Yes, very nice! Is `partition_copy` new in C++0x? But even `partition` is a much better idea than mine. I'll add that! – Kerrek SB Jul 24 '11 at 11:02
1

If your objects are created dynamically then the vectors are literally just storing pointers. This makes the vectors considerably more efficient, especially when it comes to internal reallocation. It would also save memory if the same objects exist in multiple locations.

std::vector<YourObject*> Evens;

Note: Do not push pointers to objects that live on the stack of the current function, since they will dangle once that frame is gone. The objects need to be allocated dynamically.

This might not solve your problem, but perhaps it is of use.
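
A minimal sketch of the idea (YourObject is a placeholder type; with large objects, a reallocation only shuffles pointers rather than copying payloads):

#include <vector>

struct YourObject { int Value; /* ...large payload... */ };

int main()
{
    std::vector<YourObject*> Evens;

    // Heap-allocate so the pointer stays valid after the creating frame is gone.
    YourObject* Item = new YourObject();
    Item->Value = 2;
    Evens.push_back(Item);

    // ... use Evens ...

    // Whoever owns the vector must eventually free the objects.
    for (std::vector<YourObject*>::size_type i = 0; i < Evens.size(); i++)
        delete Evens[i];
}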

Lea Hayes
  • This is rarely ever a good idea. The motivation for doing this is usually the result of a severe underestimation of just how smart library writers are. It certainly is not in any way helpful in this case, since we are talking about integers. – Benjamin Lindley Jul 24 '11 at 03:53
1

If your sub-vectors are exactly half odds and half evens, simply reserve 50% of the original vector's size for each. This avoids both wasted capacity and the need for shrink_to_fit.

Lea Hayes