54

What are the advantages (if any) of initializing the size of a C++ vector as well as other containers? Is there any reason to not just use the default no-arg constructor?

Basically, are there any significant performance differences between

vector<Entry> phone_book;

and

vector<Entry> phone_book(1000);

These examples come from The C++ Programming Language Third Edition by Bjarne Stroustrup. If these containers should always be initialized with a size, is there a good way to determine what a good size to start off would be?

amhokies
  • 625
  • 1
  • 6
  • 8
  • 1
    They don't need to be initialized with a size but if you know it (more or less) you'll have a great gain in performance. It starts with size N, you add N + 1 elements and it has to allocate a 2N buffer and copy old data. This again for 2N + 1 and so on... – Adriano Repetti Aug 03 '14 at 20:41
  • 1
    Do you need to construct 1000 `Entry` objects or not? If yes, then the second version does that. If not, then you don't need it. – juanchopanza Aug 03 '14 at 20:49
  • @juanchopanza definitely not. It's an example that was put in the book, which is why it confused me as to why he needed to initialize it to 1000. – amhokies Aug 03 '14 at 20:51
  • 1
    Yes sure there are reasons, if you known the initial state of the vector would be the default initialization, it's better this ways, example you need initialize a vector of 1000 counters (initialized to 0), you could use this constructor. – NetVipeC Aug 03 '14 at 20:51

4 Answers4

109

There are a few ways of creating a vector with n elements and I will even show some ways of populating a vector when you don't know the number of elements in advance.

But first

what NOT to do

std::vector<Entry> phone_book;
for (std::size_t i = 0; i < n; ++i)
{
    phone_book[i] = entry; // <-- !! Undefined Behaviour !!
}

The default constructed vector, as in the example above creates an empty vector. Accessing elements outside of the range of the vector is Undefined Behavior. And don't expect to get a nice exception. Undefined behavior means anything can happen: the program might crash or might seem to work or might work in a wonky way. Please note that using reserve doesn't change the actual size of the vector, i.e. you can't access elements outside of the size of the vector, even if you reserved for them.

And now some options analyzed

default ctor + push_back (suboptimal)

std::vector<Entry> phone_book;
for (std::size_t i = 0; i < n; ++i)
{
    phone_book.push_back(entry);
}

This has the disadvantage that reallocations will occur as you push back elements. This means memory allocation, elements move (or copy if they are non-movable, or for pre c++11) and memory deallocation (with object destruction). This will most likely happen more than once for an n decently big. It is worth noting that it is guaranteed "amortized constant" for push_back which means that it won't do reallocations after each push_back. Each reallocation will increase the size geometrically. Further read: std::vector and std::string reallocation strategy

Use this when you don't know the size in advance and you don't even have an estimate for the size.

"count default-inserted instances of T" ctor with later assignments (not recommended)

std::vector<Entry> phone_book(n);
for (auto& elem : phone_book)
{
    elem = entry;
}

This does not incur any reallocation, but all n elements will be initially default constructed, and then copied for each push. This is a big disadvantage and the effect on the performance will most likely be measurable. (this is less noticeable for basic types).

Don't use this as there are better alternatives for pretty much every scenario.

"count copies of elements" ctor (recommended)

std::vector<Entry> phone_book(n, entry);

This is the best method to use. As you provide all the information needed in the constructor, it will make the most efficient allocation + assignment. This has the potential to result in branchless code, with vectorized instructions for assignments if Entry has a trivial copy constructor.

default ctor + reserve + push_back (situational recommended)

vector<Entry> phone_book;
phone_book.reserve(m);

while (some_condition)
{
     phone_book.push_back(entry);
}

// optional
phone_book.shrink_to_fit();

No reallocation will occur and the objects will be constructed only once until you exceed the reserved capacity. A better choice for push_back can be emplace_back.

Use this If you have a rough approximation of the size.

There is no magical formula for the reserve value. Test with different values for your particular scenarios to get the best performance for your application. At the end you can use shrink_to_fit.

default ctor + std::fill_n and std::back_inserter (situational recommended)

#include <algorithm>
#include <iterator>

std::vector<Entry> phone_book;

// at a later time
// phone_book could be non-empty at this time
std::fill_n(std::back_inserter(phone_book), n, entry);

Use this if you need to fill or add elements to the vector after its creation.

default ctor + std::generate_n and std::back_inserter (for different entry objects)

Entry entry_generator();

std::vector<Entry> phone_book;
std::generate_n(std::back_inserter(phone_book), n, [] { return entry_generator(); });

You can use this if every entry is different and obtained from a generator

Intializer list (Bonus)

Since this has become such a big answer, beyond of what the question asked, I will be remised if I didn't mention the initializer list constructor:

std::vector<Entry> phone_book{entry0, entry1, entry2, entry3};

In most scenarios this should be the default go-to constructor when you have a small list of initial values for populating the vector.


Some resources:

std::vector::vector (constructor)

std::vector::insert

standard algorithm library (with std::generate std::generate_n std::fill std::fill_n etc.)

std::back_inserter

bolov
  • 72,283
  • 15
  • 145
  • 224
  • 3
    It is worth mentioning that vector phone_book(n); has different behaviors until c++11 and after c++11. Until c++11, one instance of Entry will be constructed and then copied n times. After c++11, n instances of Entry will be created. This is a crucial piece of information as when using pointer classes, which also initialise the objects (maybe with new), the vector phone_book(n); until c++11 would actually copy the address n times, essentially pointing to the same object. – John Mar 07 '19 at 18:51
14

If you know ahead what the size is, then you should initialize it so that memory is only allocated once. If you only have a rough idea of the size, then instead of allocating the storage as above, you can create the vector with the default constructor and then reserve an amount that is approximately correct; e.g.

vector<Entry> phone_book();
phone_book.reserve(1000);

// add entries dynamically at another point

phone_book.push_back(an_entry);

EDIT:

@juanchopanza makes a good point - if you want to avoid default constructing the objects, then do reserve and use push_back if you have a move constructor or emplace_back to construct directly in place.

sfjac
  • 7,119
  • 5
  • 45
  • 69
  • 1
    If you know the size ahead of time *and* you need 1000 default constructed `Entry` objects. Otherwise, you have to think whether you really want to use the size constructor. – juanchopanza Aug 03 '14 at 20:52
  • 1
    @juanchopanza Note that until C++11, the 1000 entries would actually be copies of one constructed entry. After c++11, 1000 entries will actually be constructed. – John Mar 07 '19 at 20:10
1

You initialize the size when you have a good idea of the number of elements that you need to store in the vector. If you are retrieving data from database or other source for instance that you know has 1000 elements in it then it makes sense to go ahead and allocate the vector with an internal array that will hold that much data. If you don't know up front what the needed size will be then it might be ok to just let the vector grow as needed over time.

The right answer comes down to your application and its specific use case. You can test performance and tweak the sizing as needed. Usually its a good idea to just get things working and then go back and test the affects of these sorts of changes later. A lot of time you'll find that the defaults work just fine.

Tim Bish
  • 17,475
  • 4
  • 32
  • 42
1

It is a bad example of Bjarne Stroustrup. Instead of the second definitiona

vector<Entry> phone_book(1000);

it would be much better to write

vector<Entry> phone_book;
phone_book.reserve( 1000 );

There is no general "good ways" to determine what a good size to start off would be. It depends on the information you possess about the task. But in any case you can use some initial allocation if you are sure that new elements will be added to the vector.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • 1
    Why is the first way bad? – Steve M Aug 03 '14 at 20:46
  • 1
    From what I can tell, the first way constructs 1000 entry objects when the vector is being initialized, while the second just reserves the space for it. – amhokies Aug 03 '14 at 21:00
  • @svenoaks The first way is bad because you construct 1000 objects of type Entry though it is not clear 1) whether you will use all 1000 objects and 2) whether you indeed need to call the default constructor for a given Entry or you need to call a constructor with parameters. I am sure that it would be better to call a constructor with parameters when you have the information about added entry such as name, phone number and so on. – Vlad from Moscow Aug 03 '14 at 21:26
  • The first way is good if you know you need exactly 1000 elements. Often in my code that's exactly what I need: a specific number of objects with default values. – Octo Poulos May 04 '22 at 08:47