Why is vector::iterator invalidated upon reallocation?

Question

I don't understand why a vector's iterator should be invalidated when a reallocation happens.

Couldn't this have been prevented simply by storing an offset -- instead of a pointer -- in the iterator?

Why was vector not designed this way?

Every time you insert anywhere but the end or sort etc you would have to change all the offsets. — John Dibling, Jun 11 '12 at 02:23
"You can" and "it's worth doing" are different things. It's a matter of requirements, and POV. — curiousguy, Aug 11 '12 at 00:46
@curiousguy: If it weren't for performance, it would be certainly worth doing, IMO. But because of the performance issue it isn't. — user541686, Aug 11 '12 at 01:00
@curiousguy: Yup. Although it was surprising that there was one (very common?) scenario where that rule seemed to be forgotten: http://stackoverflow.com/questions/11149665/c-vector-that-doesnt-initialize-its-members — user541686, Aug 11 '12 at 01:26

Tony Delroy · Answer 1 · 2012-06-11T01:21:07.053

15

Because for iterators to do that, they'd need to store a pointer to the vector object. For each data access, they'd need to follow the pointer to the vector, then follow the pointer therein to the current location of the data array, then add the offset * the element size. That'd be much slower, and need more memory for the size_type member.

Certainly, it's a good compromise sometimes and it would be nice to be able to choose it when wanted, but it's slower and bulkier than (C-style) direct array usage. std::vector was ruthlessly scrutinised for performance when the STL was being introduced, and the normal implementation is optimised for space and speed over this convenience/reliability factor, just as the array-equivalent operator[] is as fast as arrays but less safe than at().

edited Jun 11 '12 at 01:21

answered Jun 11 '12 at 01:09

Tony Delroy

102,968
15
177
252

+1 would've liked to accept this one but Nate's answer had a citation! :\ Thanks though! – user541686 Jun 11 '12 at 04:49
1

How many times have I wished that `[]` be the checked access and `at` the unchecked one :( – Matthieu M. Jun 11 '12 at 06:29
@Matthieu: agreed! It can be nice to brag about how blindingly fast your code is, but more often than not if you use `at()` and save the debugging time for profiling, you'll end up faster anyway... :-) – Tony Delroy Jun 11 '12 at 06:36

score 15 · Accepted Answer · answered Jun 11 '12 at 01:53

Just to add a citation to the performance-related justification: when designing C++, Stroustrup thought it was vital that template classes like std::vector approach the performance characteristics of native arrays:

One reason for the emphasis on run-time efficiency...was that I wanted templates to be efficient enough in time and space to be used for low-level types such as arrays and lists.

...

Higher-level alternatives -- say, a range-checked array with a size() operation, a multidimensional array, a vector type with proper numeric vector operations and copy semantics, etc. -- would be accepted by users only if their run-time, space, and notational convenience approached those of built-in arrays.

In other words, the language mechanism supplying parameterized types should be such that a concerned user should be able to afford to eliminate the use of arrays in favor of a standard library class.

Bjarne Stroustrup, Design and Evolution of C++, p.342.

score 5 · Answer 3 · edited Jun 11 '12 at 18:40

5

You can add safety by wrapping the standard std::vector<T>::iterator, but you can't add speed by wrapping a extension::vector<T>::safe_iterator. That's a general principle, and explains many C++ design choices.

edited Jun 11 '12 at 18:40

user541686

205,094
128
528
886

answered Jun 11 '12 at 08:58

MSalters

173,980
10
155
350

score 3 · Answer 4 · answered Jun 11 '12 at 02:38

3

There are many reasons for these decisions. As others pointed out, the most basic implementation of iterator for a vector is a plain pointer to the element. To be able to handle push_back iterators would have to be modified to handle a pointer into the vector and a position, on access through the operator, the vector pointer would ave to be dereferenced, the pointer to the data obtained and the position added, with an extra dereference.

While that would not be the most efficient implementation, that is not really a limiting factor. The default implementation of iterators in VS/Dinkumware libraries (even in release) are checked iterators, that manage an equivalent amount of information.

The actual problem comes with other mutating operations. Consider inserting/erasing in the middle of the vector. To maintain validity of all iterators, the container would have to track all the instances of iterators and adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal).

answered Jun 11 '12 at 02:38

David Rodríguez - dribeas

204,818
23
294
489

2

"adapt the position field so that they still refer to the same element (that has been displaced by the insertion/removal)." - that's only one notion of "validity" - it's just as likely that the client may want to refer to the Nth element regardless of insertions/removals (accepting responsibility for ensuring there still is an Nth element). Separately, the C++11 Standard 23.3.6.5.1 and 23.4.6.5.3 allows invalidation of iterators/references at or after the point of an insert or erase even with "normal" iterators. – Tony Delroy Jun 11 '12 at 04:00
@TonyDelroy: An iterator refers to an element in a container. If after obtaining an iterator the iterator no longer referred to the same element that would be a breach in the contract. Consider the operators on iterators: `auto it = std::find( v.begin(), v.end(), value ); v.erase( v.begin()+5 );` at this point (assuming your premise) there is no guarantee that `*it == value` for a vector, but if `v` was a list, then it would be guaranteed. Now you have broken the iterator contract into different iterators, with completely different semantics. – David Rodríguez - dribeas Jun 11 '12 at 10:42
@TonyDelroy: Regarding the quotes, yes, no one has said otherwise. *All* iterators into a vector are *invalidated* once the vector is modified, regardless of where the iterator was pointing and what the mutating operation is. Summing it up: you cannot provide consistent iterator semantics (across containers) if you do not *fix* where the iterator points to. You cannot *fix* the iterator without tracking all of them. Without tracking all iterators, *any* mutating operation on a vector has to invalidate all iterators. – David Rodríguez - dribeas Jun 11 '12 at 10:45
@DavidRodríguez-dribeas: good point about consistent semantics (assuming value later in both containers) - it would be confusing, but this second semantic model would almost certainly be adopted if indexed iterators were, and there are "pros". "Without tracking... any mutating operation...invalidate[s] all iterators" - nope, C++11 23.4.1.2.5/...4.3 & thereaboutes - erase invalidates from that point on, clear and reallocation invalidate all iterators, NO other mutators invalidate iterators (e.g. not inserts that don't trigger reallocation, nor earlier-than-erased elements). Similar for C++03. – Tony Delroy Jun 11 '12 at 12:00
@TonyDelroy: True that. I tend to fall on the *safe* side, and make sure that when lying out blank statements I fall on the safe side. I should have been more precise regarding the cases. The standard basically calls iterator invalidation in all cases where a pointer or reference that referred to element A will not longer refer to A after the operation (reallocation, insertion/removal before the point of insertion). – David Rodríguez - dribeas Jun 11 '12 at 12:17
@DavidRodríguez-dribeas: "The standard basically calls..." - a good summation, and highlights the potential efficiency issue or semantic disparity you pointed out.... Hard to get one-size-fits-all, and remarkable how close the Standard containers have come... good 99% of the time. – Tony Delroy Jun 11 '12 at 12:32

score 1 · Answer 5 · answered Jun 11 '12 at 01:08

1

You would need to store both the offset and a pointer to the vector object itself.

As specified, the iterator can just be a pointer, which takes less space.

answered Jun 11 '12 at 01:08

Nemo

70,042
10
116
153

Ben Voigt · Answer 6 · 2016-12-08T17:29:38.810

TL;DR -- because you're trading simple rules for invalidation for far more complicated action-at-a-distance ones.

Please note that "store a pointer to the vector object" would cause new invalidation cases. For example, today swap preserves iterator validity, if a pointer (or reference) to the vector is stored inside iterators, it no longer could. All operations that move the vector metadata itself (vector-of-vectors anyone?) would invalidate iterators.

You trade is "iterator becomes invalid when a pointer/reference to the element is invalidated" for "iterator becomes invalid when a pointer/reference to the vector is invalidated".

The performance arguments don't much matter, because the proposed alternate implementation is not even correct.

Yeah, I didn't really have this question anymore haha. I think I realized this not too long after asking this question. Thanks though! +1 — user541686, Dec 08 '16 at 17:43

score 0 · Answer 7 · answered Jun 11 '12 at 15:01

0

I an iterator wasn't invalidated, should it point to the same element or to the same position after an insertion before it? In other words, even if there were no performance issues, it is non-trivial to decide which alternative definition to use.

answered Jun 11 '12 at 15:01

Meeto

1

The question is about reallocations, not insertions. F.ex. a `push_back` can trigger a reallocation. – curiousguy Aug 11 '12 at 00:47

Why is vector::iterator invalidated upon reallocation?

7 Answers7