12

What is the reallocation strategy used for std::string and std::vector in GCC's implementations?

I'm interested in the specific strategy employed: When I append items to a vector (or chars to a string) it may exceed the reserved size and then a reallocation will occur, but what will be the new size as a function of the old one? In case of deleting elements, what is the threshold to actually reallocate and free memory (and again what will be the new size)?

Answers for other compilers will be appreciated as well.

manuell
  • 7,528
  • 5
  • 31
  • 58
Guy
  • 1,984
  • 1
  • 16
  • 19
  • To clarify the question (in light of @Paul's solution), I'm not interested in how to avoid reallocation, but to learn the strategy used by the implementation. – Guy Jan 25 '14 at 15:25

4 Answers4

7

Look at the function _M_check_len in bits/stl_vector.h. It contains:

const size_type __len = size() + std::max(size(), __n);

So the basic strategy when you append n elements is either double the size or increase by n, whichever is largest. pop_back never deallocates. libc++ (LLVM) does exactly the same.

Reading the code is the only way you'll find out the strategies for string, deallocation, etc.

Marc Glisse
  • 7,550
  • 2
  • 30
  • 53
  • 2
    FYI, you can browse the libstdc++ source online at e.g. http://gcc.gnu.org/onlinedocs/gcc-4.8.2/libstdc++/api/a01570_source.html – Nate Kohl Jan 25 '14 at 15:59
  • And the libcxx code at http://llvm.org/viewvc/llvm-project/libcxx/trunk/include/vector – Marc Glisse Jan 25 '14 at 21:12
  • "Reading the code is the only way you'll find out the strategies for string, deallocation, etc" No, reading the documentation tells you this - it may be that the code does something stronger than the standard guarantees, and then future versions or other implementations do something different than you had discovered in the code. – Arthur Tacca May 24 '22 at 15:21
  • @ArthurTacca This question was asking precisely what one specific implementation does, not what the standard guarantees. If you can find the answer in the libstdc++ documentation, that would indeed be nicer than looking at the code. But if it isn't documented, looking at the code still looks like a good option to me. Or are you saying that the OP shouldn't have asked this question? – Marc Glisse May 28 '22 at 20:45
7

Both vector and string guarantee that the complexity of push_back is "amortized constant", meaning that the time it takes to call push_back n times divided by n is bounded by a constant as n gets large. The only way to achieve this is for the reallocation to increase the size geometrically, i.e. make the new capacity some fixed multiple of the old size. That way you will only have very "few" reallocations.

Typical growth factors in implementations are 2 or 1.5, though any number strictly greater than 1 will do.

This doesn't interact with reserve, though. If you call reserve(size() + 1) before each pushback, you may get a reallocation each time.

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
4

std::vector has methods size and capacity. It should not be too difficult to write a simple program that determines the way memory allocations are being made. Strategies may change with each implementation, and even from version to version.

One strategy I have seen is to use increasing increments, which is an adaptive strategy: give more food to the hungry to avoid frequent shuffles of the data. But the factor of the increase is open to discussion. A simple duplication may grow too fast.

Later

Curious myself, I wrote that program. Here is the output( g++ 4.3.3):

capacity from 0 to 1 increased by 1 at size 1
capacity from 1 to 2 increased by 1 at size 2
capacity from 2 to 4 increased by 2 at size 3
capacity from 4 to 8 increased by 4 at size 5
capacity from 8 to 16 increased by 8 at size 9
capacity from 16 to 32 increased by 16 at size 17
capacity from 32 to 64 increased by 32 at size 33
capacity from 64 to 128 increased by 64 at size 65
capacity from 128 to 256 increased by 128 at size 129
capacity from 256 to 512 increased by 256 at size 257
capacity from 512 to 1024 increased by 512 at size 513
capacity from 1024 to 2048 increased by 1024 at size 1025
capacity from 2048 to 4096 increased by 2048 at size 2049
capacity from 4096 to 8192 increased by 4096 at size 4097
capacity from 8192 to 16384 increased by 8192 at size 8193
capacity from 16384 to 32768 increased by 16384 at size 16385
capacity from 32768 to 65536 increased by 32768 at size 32769
capacity from 65536 to 131072 increased by 65536 at size 65537
capacity from 131072 to 262144 increased by 131072 at size 131073
capacity from 262144 to 524288 increased by 262144 at size 262145
capacity from 524288 to 1048576 increased by 524288 at size 524289

Using an initial allocation in the constructor results in the same progression, using the initial value rather than 1.

laune
  • 31,114
  • 3
  • 29
  • 42
2

All bets are off on how exactly reserve works. You use it something like this:

std::vector<T> myV;
myV.reserve(<space known to be needed>);

so you know reserve won't have to be called (nor any re-allocation performed) until after that space is exceeded.

Paul Evans
  • 27,315
  • 3
  • 37
  • 54