
The commonly received wisdom about std::vector is that it doubles in size each time it needs to grow beyond its capacity, though other commonly used growth factors include 1.5.

The amortised O(1) time complexity of insertion at the end implies geometric (exponential) capacity growth, since growing by a fixed amount each time would make it O(N). See for example:

How does the capacity of std::vector grow automatically? What is the rate?

This is a simplification, of course. We can also expect rounding up to multiples of the page size at large enough values.
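For anyone who wants to check their own implementation, the factor is easy to observe empirically. A minimal sketch (as mentioned in the comments, libstdc++ and libc++ appear to use 2 and MSVC's library 1.5):

    #include <iostream>
    #include <vector>

    // Print the capacity each time it changes while pushing elements.
    // The ratio between successive capacities is the growth factor.
    int main()
    {
        std::vector<int> v;
        auto lastCapacity = v.capacity();
        for (int i = 0; i < 1000; ++i)
        {
            v.push_back(i);
            if (v.capacity() != lastCapacity)
            {
                lastCapacity = v.capacity();
                std::cout << "size " << v.size()
                          << " -> capacity " << v.capacity() << '\n';
            }
        }
    }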

Increasing the size by a constant factor is fine for the general case, but what happens in more extreme cases?

Say I reserve more than half the available free RAM, or I successfully reserve a really huge buffer (>100 GB?) based on an estimate rather than a known size.

If I then need to add one more element than I expected, what happens?

Is any implementation of vector clever enough not to try to double the memory in these cases? Is there some limit to the size increase after which doubling stops and growth becomes linear, or is some other strategy used?

The exact growth algorithm is of course implementation defined.

So I will state that I'm interested in the version of libstdc++ that ships with g++ (any version) on GNU/Linux.

I have not so far been able to decipher the code myself.

Information about other implementations would be interesting too, as would information about non-standard APIs that allow the behaviour to be modified.

Note: I realize that taking the currently available memory into account would be challenging to implement. Realistically, there is no safe way to determine the largest free contiguous memory block available. See for example https://stackoverflow.com/a/690494/1569204

On the other hand, it would not be unreasonable to allow exponential growth up to, say, 10 GB on a system with 50 GB of total RAM, but up to 100 GB on a system with 500 GB available.
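For instance, a wrapper that wanted to pick such a cap itself could, on GNU/Linux, derive it from the total physical RAM reported by sysconf. A sketch only: the _SC_PHYS_PAGES query is a glibc/POSIX extension, and the 20% figure and 1 GiB fallback are arbitrary choices for illustration:

    #include <unistd.h>   // sysconf (POSIX / glibc)
    #include <cstddef>

    // Derive a cap for geometric growth from total physical RAM.
    std::size_t growthCapBytes()
    {
        const long pages    = sysconf(_SC_PHYS_PAGES);
        const long pageSize = sysconf(_SC_PAGE_SIZE);
        if (pages <= 0 || pageSize <= 0)
            return std::size_t(1) << 30;                    // fallback: 1 GiB
        const std::size_t totalRam =
            std::size_t(pages) * std::size_t(pageSize);
        return totalRam / 5;                                // e.g. 20% of RAM
    }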

It is also noted that relying on implementation-defined behaviour would generally be bad form. It would be safer to wrap std::vector and control its allocation behaviour yourself if that is really necessary. For example:

  class foo
  {
     std::vector<int> bar;
  public:
     void reserveExtra(const size_t atLeastNMoreElements)
     {
        if ((bar.capacity() - bar.size()) < atLeastNMoreElements)
        {
            bar.reserve(bar.size()+atLeastNMoreElements);
        }
     }

     void wrapPush(int x)
     {
        if (bar.size() == bar.capacity()) // slight overhead on every push
        {
           // your re-allocation strategy here, e.g. bar.reserve(...)
        }
        bar.push_back(x); // then use the default behaviour
     }
  };
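One possible way to fill in that placeholder would be to keep the implementation's geometric growth for small buffers and switch to fixed-size steps once the buffer is large. This is only a sketch: the 1 GiB threshold and 64 MiB step are arbitrary illustrative numbers, not something any standard library implementation is known to do:

     void wrapPush(int x)
     {
        constexpr size_t thresholdBytes = size_t(1) << 30;  // 1 GiB, arbitrary
        constexpr size_t stepBytes      = size_t(64) << 20; // 64 MiB, arbitrary

        if (bar.size() == bar.capacity() &&
            bar.capacity() * sizeof(int) >= thresholdBytes)
        {
           // beyond the threshold, grow linearly instead of geometrically
           bar.reserve(bar.capacity() + stepBytes / sizeof(int));
        }
        bar.push_back(x); // below the threshold, push_back grows as usual
     }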

and of course consider using std::deque instead. See for example http://www.gotw.ca/gotw/054.htm

Bruce Adams
    "The commonly received wisdom about std::vector is that it doubles in size each time it needs to grow beyond its capacity" where did you take this from? I don't think this is "common wisdom". The growth factor must be such that complexity is amortized constant, and I know that at least one of the major three uses 3 as factor. – 463035818_is_not_an_ai May 09 '22 at 09:42
  • @463035818_is_not_a_number What implementation uses 3? libstdc++ and libc++ seem to use 2, and MSVC's lib uses 1.5. – HolyBlackCat May 09 '22 at 09:44
  • No, it's probably not smart enough to take available memory into account. – user253751 May 09 '22 at 09:54
  • "Realistically there is no safe way to determine the largest free contiguous memory block available" - well, so you answered your question? – pptaszni May 09 '22 at 09:57
  • @pptaszni: No, that's asking the same question from a different perspective. `std::vector<>` (or more accurately `std::allocator`) is part of the implementation, and therefore can rely on implementation details. – MSalters May 09 '22 at 10:19
  • Note that Linux uses "Memory Overcommit" (https://unix.stackexchange.com/questions/441364/what-is-the-purpose-of-memory-overcommitment-on-linux) by default. So even if std::vector were to reserve 100 GB of memory, the kernel would not actually allocate that much space in physical RAM or in swap file. Only when you tried to write to these locations in the vector would the space actually be allocated in some actual RAM or swap area (and IIUC this would still happen on a per-page basis, rather than requiring all 100 GB available at once). – oliver May 09 '22 at 11:17
  • @Oliver that is a fair point. So in effect the next push after the reserve will (assuming it happens to land on a page boundary) fault in just a single page, and the implementation does not need to do anything special in that case. When memory is finally exhausted, will the resulting page fault (if you can still call it that) become a std::bad_alloc even though there was no actual allocation at that point? – Bruce Adams May 09 '22 at 11:46
  • Reading up on overcommit, it seems much worse. A (non-random) program, not necessarily yours, will be killed by the OOM-killer on the page fault. Ouch! I didn't think overcommit worked that way - that's just evil - so I must read on. – Bruce Adams May 09 '22 at 11:58
  • @BruceAdams Overcommitting is an interesting trade-off. It's as if the kernel were saying "Nah, you won't need all of this anyway." If the wager is successful everybody is better off; if not, programs run for a while under false premises and fail late and catastrophically, instead of early and predictably. – Peter - Reinstate Monica May 09 '22 at 12:17

0 Answers