11

Inspired by this question, asking how to append a vector to itself, my first thought was the following (and yes, I realize insert is a better option now):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    std::vector<int> vec {1, 2, 3};
    std::copy (std::begin (vec), std::end (vec), std::back_inserter (vec));

    for (const auto &v : vec)
        std::cout << v << ' ';
}

However, this prints:

1 2 3 1 * 3

The * is a different number every time the program is run. The fact that it's only the 2 being replaced is peculiar, and if there actually is an explanation for that, I'd be interested to hear it. Continuing, if I append to a different vector (a copy of the original), it outputs correctly. It also outputs correctly if I add the following line before the copy one:

vec.reserve (2 * vec.size());

I was under the impression std::back_inserter was a safe way to add elements onto the end of a container, despite not reserving memory beforehand. If my understanding is correct, what's wrong with the copying line?

I assume it's nothing to do with the compiler, but I'm using GCC 4.7.1.

Community
  • 1
  • 1
chris
  • 60,560
  • 13
  • 143
  • 205

1 Answers1

12

std::back_inserter creates an inserting iterator that inserts elements into a container. Each time this iterator is dereferenced, it calls push_back on the container to append a new element to the container.

For a std::vector container, a call to push_back where v.size() == v.capacity() will result in a reallocation: a new array is created to store the contents of the vector, its current contents are copied into the new array, and the old array is destroyed. Any iterators into the vector at this time are invalidated, meaning they can no longer be used.

In your program, this includes the input range defined by begin(vec) and end(vec) from which the copy algorithm is copying. The algorithm continues to use these iterators, even though they are invalidated, thus your program exhibits undefined behavior.


Even if your container had sufficient capacity, its behavior would still be undefined: the specification states that, upon insertion, "if no reallocation happens, all the iterators and references before the insertion point remain valid" (C++11 §23.3.6.5/1).

The call to push_back is equivalent to insertion at the end, so the end iterator (std::end(vec)) that you passed into std::copy is invalidated after a single call to push_back. If the input range is nonempty, the program therefore exhibits undefined behavior.


Note that the behavior of your program would be well-defined if you used a std::deque<int> or a std::list<int>, because neither of those containers invalidates iterators when elements are appended.

James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • It is an interesting issue, BTW. "Iterator" is a more abstract concept than "pointer", so it is expected that end iterator might get invalidated by insertion even without reallocation. But what if I used pointers `&vec.front()` and `&vec.back() + 1` as the input range (assuming non-empty `vec`)? The standard does not guarantee the persistent validity of pre-computed `&vec.back() + 1` pointer, since it is a "reference" to or past the insertion point. But we know that this is a purely conceptual invalidation. In practical sense it should (and will) remain valid. – AnT stands with Russia Jul 16 '12 at 21:00
  • `std::copy_n(std::begin(vec), vec.size(), std::back_inserter(vec));` has got well defined behaviour, right? It doesn't rely on end iterator as range limit. – jrok Jul 16 '12 at 21:09
  • 1
    @jrok, Interesting question; it could be worth asking. "This question, which was inspired by this other question, which was inspired by yet another question..." – chris Jul 16 '12 at 21:20
  • @jrok: Yes, that has well-defined behavior. @AndryT has an interesting point, too: the restriction I quoted only names iterators and references, not pointers. This is curious, and in my opinion is probably a defect in the specification, but if it is correct, then `copy(&vec.front(), &vec.back() + 1, back_inserter(vec));` would also work, assuming sufficient capacity. – James McNellis Jul 16 '12 at 22:03
  • @AndreyT: I agree, it's a "conceptual" invalidation, and I would expect the code to work in practice. But, I could imagine an overly pedantic debug iterator implementation checking for this invalidation (though, the Visual C++ iterator debugging does not complain). I think that the omission of "pointer" from the list of invalidated things is an error of omission: I can't imagine why a reference would be invalidated but not a pointer. – James McNellis Jul 16 '12 at 22:05
  • @James McNellis: I presumed that the term "references" in this case is not restricted to mean C++ references only, but rather used in a more broad sense: all "pointer-like" access paths to vector elements, i.e. includes pointers as well. I could be wrong though. – AnT stands with Russia Jul 16 '12 at 22:23
  • @JamesMcNellis, It just so happened that I got to use jrok's gem of code in another answer and it turns out that the standard says it does `*(first + i)` every iteration, so it will use the invalidated iterator. Just thought I'd let you know in case you remember it and go to use it. – chris Jul 14 '13 at 05:22
  • @jrok, See above comment. – chris Jul 14 '13 at 05:25
  • @chris if with "gem" you mean the code in my previous comment - yes, it assumes capacity to be at least twice the size, otherwise it indeed uses an invalidated iterator. Care to link me to the other answer you mentioned? – jrok Jul 14 '13 at 07:50
  • @jrok, Oh, then I took it the wrong way. Here you go: http://stackoverflow.com/questions/17636690/nice-way-to-append-a-vector-to-itself. Ben's answer is basically that, then. – chris Jul 14 '13 at 07:54