Is it legal to add elements to a preallocated vector in a range-based for loop over that vector?

Question

I'm using Visual Studio 2015 Update 1 C++ compiler and this code snippet:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
  vector<int> v{3, 1, 4};

  v.reserve(6);

  for (auto e: v)
    v.push_back(e*e);

  for (auto e: v)
    cout << e << " ";

  return 0;
}

Release version runs fine, but debug version produces vector iterators incompatible error message. Why is that?

Before you flag it as a duplicate question to Add elements to a vector during range-based loop c++11, please read my answer https://stackoverflow.com/a/35467831/219153 with arguments to the contrary.

It's undefined behaviour, it may work and it may not. In the debug build it doesn't work. — Jonathan Potter, Feb 17 '16 at 21:41
Regarding the title, it could apply to such a broad range of questions. You should consider figuring out a title that closely matches the question content. For example, perhaps something like "Is it legal to add elements to a preallocated vector in a range-based for loop over that vector?" I'll leave it up to you to pick what you find most reasonable. — chris, Feb 17 '16 at 21:52
@chris You are right about the title and I edited it. The reason I didn't go with a similar title in the first place is purely political. I found SO editors to be too trigger happy in flagging questions as duplicate and disregard any future arguments to the contrary. I wanted to circumvent it. — Paul Jurczak, Feb 17 '16 at 22:03
@PaulJurczak, I can see where you're coming from. It can sometimes be hard to unclose a question even when it shouldn't have been closed. While these newer changes to allow accounts with a gold tag badge to instadupe something can hurt if one person jumps the gun, they also allow a one-vote reopen for dupe closes. That should be helping with duplicates specifically. — chris, Feb 17 '16 at 22:48

score 19 · Answer 1 · edited Feb 10 '22 at 21:27

19

According to documentation:

If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.

It says even if capacity is enough past-the-end iterator is invalidated, so I believe your code has undefined behavior (unless this documentation is incorrect and standard says otherwise)

edited Feb 10 '22 at 21:27

Mariano Desanze

7,847
7
46
67

answered Feb 17 '16 at 21:40

Slava

43,454
1
47
90

The documentation is correct. One can not range-loop over vector with invalid iterators. – SergeyA Feb 17 '16 at 21:43
I did not mean that standard may say that it is OK to range-loop over invalid iterators, I mean that past-the-end iterator is invalidated is correct statement. – Slava Feb 17 '16 at 21:45
Its not just that. The ranged based for loop according to the standard gets the end iterator before the loop starts. Once you add an element into the vector the end iterator has changed and now you have a iterator mismatch. – NathanOliver Feb 17 '16 at 21:48

score 17 · Accepted Answer · answered Feb 17 '16 at 21:48

17

Your code exhibits undefined behavior, but it is tricky and tends to only be caught in debug builds.

When you do a v.push_back, all iterators are invalidated if size passes capacity. You avoid this with a reserve.

However, even if you don't cause the capacity to grow, the past-the-end iterator is still invalidated. In general, iterator invalidation rules do not distinguish between "the iterator's 'value' would be garbage/refer to a different object" and "the iterator's 'location' is no longer valid". When either occurs, the iterator is simply deemed invalid. As the end iterator is no longer the end iterator (it goes from referring to nothing, to referring to something, in almost every implementation), the standard simply states it is invalidated.

This code:

for (auto e: v)
  v.push_back(e*e);

expands to roughly:

{
  auto && __range = v; 
  for (auto __begin = v.begin(),
    __end = v.end(); 
    __begin != __end;
    ++__begin
  )
  { 
       auto e = *__begin;
       v.push_back(e*e);
  }
}

the v.push_back call invalidates the __end iterator, which is then compared against, and the debug build correctly flags the undefined behavior as a problem. Debug MSVC iterators are pretty careful with invalidation rules.

The release build does undefined behavior, and as the vector iterator is basically a thin wrapper around a pointer, and the pointer to the past-the-end element becomes the pointer to the last element after a push back without a capacity overflow, it "works".

answered Feb 17 '16 at 21:48

Yakk - Adam Nevraumont

262,606
27
330
524

2

Is there an advantage in making this an undefined behavior? It forces me to rewrite this loop using pointers or indexes instead of using a higher level construct. The end result is the same code compiler would generate from my example, but language lawyers would deem illegal. Is this a defect in STL design? – Paul Jurczak Feb 17 '16 at 23:37
@PaulJurczak To do what you want, they would have to add a half-invalidation effect on iterators, and specify exactly what it means and when it happens. That would be an iterator that remains valid, but now refers to a different element. With that concept, you could easily express that the end iterator becomes an iterator to the element after the old last element when elements are appended and capacity is not increased. This would make the standard more complex, at the very least. You might also end up wanting to do the same when someone inserts into the middle of a vector. Good idea? Dunno. – Yakk - Adam Nevraumont Feb 18 '16 at 00:09
2

Trying to peer into the spec writer's minds, an implementer might want to have a past-the-end iterator whose value is a sentinel, rather than just a pointer to the next spot in memory. If they specified that all previous past-the-end iterators pointed to the newly `push_back`ed value, such an implementation would be impossible. Why, and is it worth it? I don't know. Perhaps a debugging feature might find it valuable for iterators to be marked as past-the-end. The spec writers tend to err on the side of Undefined Behavior, and define it later if need be. – Cort Ammon Feb 18 '16 at 01:52
@CortAmmon _past-the-end iterator whose value is a sentinel, rather than just a pointer to the next spot in memory_ Maybe, but this would violate the "zero cost abstraction" rule, which is often advertised with C++. Incrementing a pointer is the most efficient way to visit consecutive vector elements. This necessitates past-the-end iterator to point to one element past the last one. It is what any compiler with speed optimization enabled will reduce the iterator to. – Paul Jurczak Feb 18 '16 at 05:01
@PaulJurczak Its a rule of thumb, perhaps. A particular case where I could see such sentinels being used is in a debug build, similar to the Visual Studio runtime iterator checking. They don't use sentinels, of course, but it could be useful. More importantly, if they forbid it in the spec, nobody could ever do it that way. – Cort Ammon Feb 18 '16 at 05:39

Is it legal to add elements to a preallocated vector in a range-based for loop over that vector?

2 Answers2

Linked