25

I'm looking at the implementation of std::vector in libc++ and I noticed that it internally keeps three pointers (one to the beginning, one to the end, and one to the end of the allocated memory) instead of what I'd instinctively do, i.e., one pointer to the beginning plus two members for size and capacity.

Here is the code from libc++'s <vector> (ignore the compressed pair, I know what it means).

pointer                                    __begin_;
pointer                                    __end_;
__compressed_pair<pointer, allocator_type> __end_cap_;
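
For contrast, here is a sketch of the layout I had in mind (hypothetical member names, for illustration):

pointer   __begin_;     // start of the allocation
size_type __size_;      // number of constructed elements
size_type __capacity_;  // size of the allocation, in elements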

I noticed that other standard libraries do the same (e.g., Visual C++). I don't see any particular reason why this solution should be faster than the other one, but I might be wrong.

So is there a particular reason the "three pointers" solution is preferred to the "pointer + sizes" one?

gigabytes
  • I assume it's more convenient for implementers. – milleniumbug May 24 '15 at 10:00
  • Maybe it's better to start by justifying why you would prefer your method? – tenfour May 24 '15 at 10:02
  • Yes, in that case I'm asking why (e.g. Is the code for function xxxx simpler?) – gigabytes May 24 '15 at 10:02
  • @tenfour I don't prefer my method. I just noticed that all the implementations I saw use the same, so there must be a reason. – gigabytes May 24 '15 at 10:03
  • It helps resist the temptation to optimize ("nobody will ever need a vector of more than 4 billion elements"). – Marc Glisse May 24 '15 at 10:19
  • More interesting imho is why all implementations store the size/capacity in the vector object and none hide it in a header of the allocated buffer (what libstdc++ used to do for basic_string), which makes a significant difference when you have many empty vectors (happens with trees). – Marc Glisse May 24 '15 at 10:23
  • @marc that makes `size` require an indirection. – Yakk - Adam Nevraumont May 24 '15 at 12:51
  • @Yakk and `end` and many things, but how often is that critical? If you are going to look at the first element of the vector, the cost of this indirection already disappears. – Marc Glisse May 24 '15 at 13:50
  • @marc indirection costs more than just a cache miss. Proving that nobody else has access to something you have a pointer to is harder than proving that nobody else has access to something you have a copy of. If you can prove that nobody else modified `size`, you can cache its value between calls: if you cannot prove that, you have to repeatedly go to memory and get its value. If you are passed a pointer-based `std::vector`, the compiler would have to prove that no defined operation can result in someone else having a pointer to that size/capacity. This isn't definitive, but is a concern. – Yakk - Adam Nevraumont May 24 '15 at 14:48

3 Answers

31

The rationale is that performance should be optimized for iterators, not indices.
(In other words, performance should be optimized for begin()/end(), not size()/operator[].)
Why? Because iterators are generalized pointers: C++ encourages their use, and in return ensures that their performance matches that of raw pointers when the two are equivalent.

To see why it's a performance issue, notice that the typical for loop is as follows:

for (It i = items.begin(); i != items.end(); ++i)
    ...

Except in the most trivial cases, if we kept track of a size instead of an end pointer, the comparison i != items.end() would turn into i != items.begin() + items.size(), which takes more instructions than you might expect. (The optimizer often has a hard time hoisting that computation out of the loop.) This slows things down dramatically in a tight loop, and hence this design is avoided.
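
For illustration, after inlining, the loop under the two layouts looks roughly like this (hypothetical member names, assuming the computation isn't hoisted):

// Three-pointer layout: the termination test is one load and one compare.
for (T* i = v.begin_; i != v.end_; ++i)
    ...

// Pointer + size layout: the termination test recomputes begin_ + size_ on
// every iteration: two loads, a scaled add (a multiply when sizeof(T) isn't
// a power of 2), and then the compare.
for (T* i = v.begin_; i != v.begin_ + v.size_; ++i)
    ...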

(I've verified this is a performance problem when trying to write my own replacement for std::vector.)


Edit: As Yakk pointed out in the comments, using indices instead of pointers can also result in the generation of a multiplication instruction when the element size isn't a power of 2, which is pretty expensive and noticeable in a tight loop. I didn't think of this when writing this answer, but it's a phenomenon that has bitten me before (e.g. see here)... bottom line is, in a tight loop, everything matters.
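
For example (an illustrative sketch; Point is a made-up type): with an element type whose size isn't a power of 2, indexed access has to scale the index with a multiply, while the iterator just advances by a constant:

#include <cstddef>
#include <vector>

struct Point { float x, y, z; };   // sizeof(Point) == 12, not a power of 2

float sum_indexed(const std::vector<Point>& v)
{
    float s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i].x;               // address = begin + i * 12: needs a multiply
    return s;
}

float sum_iterated(const std::vector<Point>& v)
{
    float s = 0;
    for (std::vector<Point>::const_iterator it = v.begin(); it != v.end(); ++it)
        s += it->x;                // ++it is just "add 12": no multiply
    return s;
}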

user541686
  • I've accepted this answer just because you hinted at an experimental evidence that the performance difference is noticeable – gigabytes May 24 '15 at 10:29
  • You've explained two pointers, but the third doesn't correspond to any iterator. Why should `capacity()` perform division instead of returning a member? – Potatoswatter May 24 '15 at 10:31
  • @potato that requires a multiply on `push_back`. `capacity` is a rare thing to call, and otherwise the vector just needs to know if size is going to be more than capacity, not their actual values. – Yakk - Adam Nevraumont May 24 '15 at 12:53
  • Yes… and that should be in the answer. – Potatoswatter May 24 '15 at 13:04
  • @Yakk: Fantastic point about the multiplication, I totally forgot about that even though I'd run across it as well. I edited the answer to include it, thanks! – user541686 May 24 '15 at 17:25
  • Well, many modern CPUs have [fused multiply-add (FMA)](https://en.wikipedia.org/wiki/FMA_instruction_set) instructions. And indeed, often you multiply by a power of two, or something else which is close enough to allow for two adds and a shift. And - you can issue MULTs as fast as you can ADDs, even faster. So... it's not that bad really. – einpoklum Dec 01 '15 at 22:52
4

It's more convenient for implementers.

Storing size makes exactly one operation easier to implement: size()

size_t size() { return size_; }
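
For comparison, a sketch of what size() looks like in the range version (the pointer difference hides a division by sizeof(T), usually compiled to a shift or a multiply by a magic constant):

size_t size() { return end_ - begin_; } // range version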

On the other hand, it makes other operations harder to write and makes code reuse harder:

iterator end() { return iterator(end_); } // range version
iterator end() { return iterator(begin_ + size_); } // pointer + size version

void push_back(const T& v) // range version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(end_)) T(v);
    ++end_;
}

void push_back(const T& v) // pointer + size version
{
    // assume only the case where there is enough capacity
    ::new(static_cast<void*>(begin_ + size_)) T(v);
    // it could use some internal `get_end` function, but the point still stands:
    // we need to get to the end
    ++size_;
}

If we have to find the end anyway, we might as well store it directly; it's more useful than the size anyway.
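
To push the point further, here is a sketch of push_back with the capacity check included (hypothetical member names; grow() stands in for the reallocation path):

void push_back(const T& v) // range version, with capacity check
{
    if (end_ == end_cap_)  // compare two stored pointers
        grow();
    ::new(static_cast<void*>(end_)) T(v);
    ++end_;
}

void push_back(const T& v) // pointer + size version, with capacity check
{
    if (size_ == capacity_)                          // an integer compare...
        grow();
    ::new(static_cast<void*>(begin_ + size_)) T(v);  // ...but we still pay for begin_ + size_
    ++size_;
}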

milleniumbug
  • @gigabytes Ok, removed the UB remark. The answer still stands though - there are more operations that need to get to the end than require knowing the size. – milleniumbug May 24 '15 at 10:48
  • The `std::vector` internals are exempt from UB restrictions anyway, as it's part of the implementation. – MSalters May 24 '15 at 14:30
0

I would imagine it's primarily a speed thing. When iterating over the vector, the loop-termination check compiles down to a compare against the stored end pointer (and maybe a load), rather than a load, an add, and a compare (and maybe another load, too).

When generating the iterators for begin() and end(), the code also becomes just return pointer;, rather than return pointer + offset; in the case of end().

These are very minor optimizations, but the standard template library is intended to be used in production code where every cycle counts.

PS: Regarding the different compilers implementing it the same way: there is a reference implementation that most (all?) of the compiler vendors base their STL implementations on. It is likely that this particular design decision is part of the reference implementation, which is why all the implementations you looked at handle vectors this way.

Kaslai
  • What about `uintptr_t` for the size? – Daniel May 24 '15 at 10:19
  • The STL predates the C++11 standard by many years. While that might make my contrived example invalid, it was not an option when the initial drafts of the STL were being written. EDIT: irrelevant, it was actually added in C99 it seems. My point still stands though, as uintptr_t is not required to be defined by the standard. – Kaslai May 24 '15 at 10:21
  • While I agree with the first part of your answer, the second part seems pretty artificial: even in pre-C++11 code, `vector::size()` didn't return an int and `vector::at()` didn't take `int` as an argument, but always `vector::size_type` (usually std::size_t). As vector can only use a contiguous chunk of memory anyway, that type will - by definition - impose no limitations on the amount of memory that can be managed by a vector which aren't already baked into the memory system itself. – MikeMB May 24 '15 at 10:39
  • I agree, it's a silly and contrived example that only a sleep-deprived ape would conjure up. I'll just go ahead and remove it. – Kaslai May 24 '15 at 10:41