1

I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient. For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes. I know the number of strings, which is about 60000. Should I call vector.reserve(60000)? How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?

Remy Lebeau
BanjoMan
  • By saying std::string is fixed at compile time, does that mean that if I have a string of 1000 bytes, that every string has 1000 bytes, even for short strings? – BanjoMan Aug 31 '22 at 17:41
  • A `char *` can also represent a string of any size, and just like `std::string`, a `char *` is always the same size. – Drew Dormann Aug 31 '22 at 17:52

3 Answers

6

The compiler doesn't know the size of the strings; it knows the size of the std::string object. The size of the std::string object does not depend on the length of the string it holds. That is because, most of the time [1], std::string allocates its characters on the heap, so the object itself holds little more than a pointer and a length (and typically a capacity).
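
As a quick check of the point above, here is a minimal sketch (the helper name `object_size` is hypothetical, not part of any library) showing that the size of the std::string object stays the same whatever the length of its text:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: size in bytes of the std::string object itself,
// not counting any heap buffer it may own.
std::size_t object_size(const std::string& s) {
    return sizeof(s);  // compile-time constant, equal to sizeof(std::string)
}
```

Calling `object_size` on an empty string and on a 500-character string gives the same number. The exact value depends on the standard library in use.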

This also means that when you reserve the vector, you don't yet reserve memory for the strings' characters. This is, however, not always a problem. The strings come from somewhere: if the strings you receive are the return value of a function (i.e., you have them by value), then the memory is already allocated for the string (in the return value). Thus, e.g., std::move() or std::swap() can speed up populating the vector with the results, because the existing buffer is handed over instead of being copied.
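
A minimal sketch of that idea, assuming the strings arrive by value from some producer. `make_string` and `collect` are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical producer standing in for "a function that returns a string by value".
std::string make_string(std::size_t i) {
    return std::string(500, static_cast<char>('a' + i % 26));
}

// Collects `count` strings into a vector whose storage is reserved up front.
std::vector<std::string> collect(std::size_t count) {
    std::vector<std::string> v;
    v.reserve(count);               // room for `count` std::string objects
    for (std::size_t i = 0; i < count; ++i) {
        std::string s = make_string(i);
        v.push_back(std::move(s));  // moves s; typically its buffer is handed over, not copied
    }
    return v;
}
```

The reserve() call avoids regrowing the vector, while the move avoids a second allocation per string. Passing `make_string(i)` directly to push_back() would have the same effect, since the return value is already an rvalue.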

If, however, you populate the vector by passing references to its elements, then the callee performs the operations that allocate. In that case the elements must already exist, so create them (reserve() alone leaves the vector empty) and then reserve each string:

#include <string>
#include <vector>

std::vector<std::string> v(60000); // expected number of strings, created as empty elements
for (auto& s : v) {
    s.reserve(500); // expected/max. size of strings
}

[1] A specific implementation of std::string may have a small, fixed-size buffer for short strings and thus allocate on the heap only when the string is longer than that.

lorro
  • The short string optimization is not a thing of the past, it is used by most modern implementations. It is copy-on-write that disappeared. – Marc Glisse Aug 31 '22 at 20:28
  • @MarcGlisse Thanks for the explanation, updated the answer. Yes, I remember the upgrades when cow went out of glibc... – lorro Aug 31 '22 at 20:30
0

Roughly speaking, a std::string implementation consists of a pointer to the character buffer that holds the string. This character buffer is dynamically allocated on the heap (not always; see the short string optimization). So however much space you reserve for the vector, none of it is used for the character buffers: the reserved space holds only the std::string objects, and for every string that you add to the vector, its character buffer is allocated separately.

The size of the std::string class is known at compile time, and is equal to sizeof(std::string). In your case, you should just call v.reserve(n) if you expect to insert n strings into the vector v. reserve() takes a number of elements, not a number of bytes; the vector itself works out that it needs at least n * sizeof(std::string) bytes.
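
To make the element-count semantics concrete, a small sketch (`prepare` and `reserved_bytes` are hypothetical helper names):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reserves room for n strings without creating any.
void prepare(std::vector<std::string>& v, std::size_t n) {
    v.reserve(n);  // n counts elements, not bytes
}

// Hypothetical helper: bytes the vector has set aside for std::string objects.
// This excludes whatever the strings later allocate for their characters.
std::size_t reserved_bytes(const std::vector<std::string>& v) {
    return v.capacity() * sizeof(std::string);
}
```

After `prepare(v, 60000)` on an empty vector, `v.size()` is still 0 and `v.capacity()` is at least 60000.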

FakeMod
0

I want to add many strings to a vector, and from what I've found, calling reserve() before this is more efficient.

If you know up front how many strings you want to store in the vector, then yes.

For a vector of ints, this makes sense because int is 4 bytes, so calling reserve(10) clearly reserves 40 bytes.

Yes, as it allocates memory for at least sizeof(int) * 10 bytes (40 bytes on platforms where int is 4 bytes).

I know the number of strings, which is about 60000. Should I call vector.reserve(60000)?

Yes.

How would the compiler know the size of my strings, as it doesn't know if these strings are of length 5 or 500?

The compiler doesn't need to know the length of the strings. Obviously, that is not known until runtime. However, that length doesn't change the compile-time size of the std::string class itself, which has a fixed layout and size. One of its data members is a pointer to the actual character data, which is typically stored elsewhere in dynamic memory and thus is not counted toward the size of the std::string object itself.

However, in the case of Short-String Optimization (SSO), the std::string class includes a small fixed buffer, which does count toward its fixed size at compile time. The string's data lives in that buffer until the character data grows beyond the buffer's size; at that point, std::string allocates dynamic memory to hold the larger character data. The SSO buffer still exists in the object; it is just unused at that point.
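
One way to observe this is to check whether a string's character data lies inside the std::string object itself. This is a sketch only; `stored_inline` is a hypothetical helper, and what it returns for short strings depends on the standard library implementation:

```cpp
#include <functional>
#include <string>

// Hypothetical helper: true if s's character data lives inside the
// std::string object itself (an SSO buffer) rather than in separate memory.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(s);
    std::less<const char*> before;  // total order, valid even for unrelated pointers
    const char* p = s.data();
    return !before(p, obj_begin) && before(p, obj_end);
}
```

A 500-character string cannot fit inside the object on any common implementation, so `stored_inline` returns false for it. For a very short string it returns true on implementations that use SSO.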

reserve() will allocate space only for the std::string objects themselves, not for any dynamic memory used for their character data. When a std::string object points at dynamic memory for its character data, that is irrelevant to the memory that std::vector allocates.

So yes, you would call reserve(60000) if you want to reserve space for 60000 std::string objects. That would allocate memory for sizeof(std::string) * 60000 bytes in the vector.

So, in general, reserve() allocates at least sizeof(vector::value_type) * capacity bytes. Then the vector creates instances of the value_type inside that memory as needed.

Or, in other words, when you want to pre-allocate memory for n elements, you ask reserve() to allocate memory for n elements. Period. The details of what those elements do internally are irrelevant to the vector. That is for the elements to handle on their own.
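
The practical payoff can be sketched as follows: after reserve(n), adding n strings of any length never makes the vector reallocate. `reallocations_while_filling` is a hypothetical helper written for this illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fills a vector with n strings after reserve(n) and
// counts how many times its capacity changed, i.e. how often it reallocated.
std::size_t reallocations_while_filling(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);
    std::size_t changes = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back(5 + i % 496, 'x');  // string lengths vary from 5 to 500
        if (v.capacity() != last_capacity) {
            ++changes;
            last_capacity = v.capacity();
        }
    }
    return changes;
}
```

The count stays at zero regardless of the string lengths, because each string manages its own character storage and the vector only ever holds the fixed-size objects.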

Remy Lebeau