Inserting into vector by reference to element of same vector

Question

I was wondering if someone more experienced might be able to clarify if this is a buggy operation to do on a vector:

std::vector<int> v{1, 2, 3, 4, 5};
v.insert(v.begin() + 1, v[0]);

The reason I ask is because the element to insert is a reference to the 0th element in the vector. If the insertion forces the vector to resize (because its capacity is full), then the reference to v[0] would be invalidated, and the code might insert an incorrect value. Here is some pseudo-code that might demonstrate:

template <typename T>
void vector_insert_method(Iterator pos, const T& value) {
    if capacity full:
        reallocate array and copy over element
        // if value was reference to elem in this vector,
        // that reference might be invalidated


    insert element

    ++size
}

This issue might be more realistic on a concurrent system.

A similar, and related question is what happens if you try inserting an element that comes after the position you are trying to insert. For example, doing something like v.insert(v.begin(), v[2]) because the standard points out that references to elements after the insertion point are invalidated. Is this guaranteed to work?

I suppose you could `static_cast(v[0])` to copy the element and pass in a const reference to the temporary, if it turns out that this is potential UB. — cdhowie, Apr 15 '17 at 19:36
Feels like a special case of http://stackoverflow.com/questions/17636690/nice-way-to-append-a-vector-to-itself. — Oliver Charlesworth, Apr 15 '17 at 19:37
@OliverCharlesworth IMO It's not a special case of that, because single element `insert` is its own function independent of the range insertion options. — M.M, Apr 16 '17 at 00:03
@cdhowie or the good old `+v[0]`. Although it turns out it's not UB. — M.M, Apr 16 '17 at 00:21
After the edit this question asks a **completely** different question. I've changed my answer to reflect that, though the original question was IMO interesting in it's own. Should this question ask about both situations (with and without the `+ 1`) or is a second question the better choice? — Daniel Jour, Apr 16 '17 at 00:43
@DanielJour I edited the question to reflect my original intent but I agree the issue you have pointed out is equally interesting. I've appended the other situation to the question, because I think the two questions are related. Sorry about that and thanks for an excellent answer! — gowrath, Apr 16 '17 at 01:34
References to all elements are invalidated, there is no special case for elements after the insertion point — M.M, Apr 16 '17 at 02:59
@M.M I wouldn't be so sure about that. My libcpp implementation of `` is definitely doing things to circumvent these issues. — gowrath, Apr 16 '17 at 08:04
@gowrath This code has to work whether or not the element is after the insertion point — M.M, Apr 16 '17 at 10:04
@M.M Can please support that with some source / logical reasoning? I've seen lots of very good answers from you, which is why I'm inclined to believe you, but just stating "This code has to work [..]" isn't really helpful here, is it? — Daniel Jour, Apr 18 '17 at 07:21

Daniel Jour · Accepted Answer · 2017-04-18T07:37:31.257

7

The original question asked ...

[..] whether this is a buggy operation to do on a vector:
v.insert(v.begin(), v[0]);
//              ^^^ no + 1 here!

Probably yes, it could be undefined behavior. Quoting N3337 ("almost C++11" AFAIK):

[..] If no reallocation happens, all the iterators and references before the insertion point remain valid. [..]

§23.3.6.5/1

While this is not the clear wording one is used to when reading the standard, I'd interpret this as: iterators and references to and after the insertion point are invalidated.

Thus, the reference to v[0] is invalidated by the call(*) to insert. Even if no reallocation happens, insert has to shift all elements to higher indices (so v[0] is re-positioned to v[1]).

Assuming the element type is not trivially destructible, then no matter how that re-positioning is done (either via moving or copying), after v[1] has been assigned / constructed from v[0], the destructor of v[0] needs to be called (and its lifetime ended) before a new object (the one you want to insert) can be placed in the memory location of v[0]. Thus, here your reference turns into a dangling reference, which leads to undefined behavior when it's used directly afterwards to construct the new v[0]. No, as far as I can see, this issue could be circumvented by having std::vector::insert() not construct the new object but rather assign the new object to the "old" one. I'm not sure whether there's a requirement for std::vector to behave that way, though its element type having to be CopyInsertable gives a hint that this might be the case.

UPDATE: I've played around with the code provided in the other answer, adding some print debugging. This showed not what I was expecting (destructing one element and then accessing it) but still shows that the standard library implementation (used here by ideone) does not conform to the standard: It invalidates references to elements before the point of insertion even if the capacity is large enough. That way it circumvents above issues (but breaks the standard ... so).

(*): One could argue about when that invalidation happens. The best answer to that question is IMO that the reference is valid right before calling the insert function, and is invalid (for sure) after returning from that function. The exact point of invalidation is unspecified.

The edited question asked the same question but with a extremely important modification:

v.insert(v.begin() + 1, v[0]);
//                ^^^^^

Now this is a completely different situation. The reference to v[0] will not necessarily be invalidated by the call to insert because the point of insertion is "behind" v[0]. The only issue that may arise is if the std::vector has to reallocate its internal buffer, but in that case the following sequence of operations should guarantee correct behavior:

Allocate new memory of larger capacity
Construct new element at it's correct index in the newly allocated memory region.
Copy/Move elements from old (too small) memory to the newly allocated memory region.
Free old memory.

edited Apr 18 '17 at 07:37

answered Apr 15 '17 at 19:51

Daniel Jour

15,896
2
36
63

Is this the case for any element of the vector or is this only for the first element since hes inserting at the beginning? Would `v.insert(v.begin(),v[2]);` be safe? Or would we be faced with the same scenario regardless of what element we are accessing? – TheQAGuy Apr 15 '17 at 20:11
1

The real question, as I read it, is whether UB is triggered at all, not whether the reference remains valid *after* the `insert()` invocation. The answer *might be* no, if the `insert()` implementation copies the object before reallocating, then move-constructs the new vector element from the copy. Of course, that requires the type be both copy- and move-constructible. – cdhowie Apr 15 '17 at 20:24
@JackVanier You're safe if the reference is to an object *before* the point of insertion, like `v.insert(std::next(v.begin()), v[0])`. But of course **only** if the insertion does not cause a reallocation. – Daniel Jour Apr 15 '17 at 20:39
I disagree that the operation is buggy, and I vaguely recall questions (and defect reports) that discuss this. The actual function call satisfies all the requirements, so it's up to `std::vector::insert` to make everything work. If that requires copying the element before reallocation then so be it. It's true that the suggested implementation is too simple to work. – Joshua Green Apr 15 '17 at 22:46
I'll expand my answer later a bit. Regarding "it's up to [insert] to make everything work" ... How would it accomplish this without causing undefined behavior? How could it check whether the to-be-inserted object lies within the range of moved/copied objects? Comparing addresses won't work ... that's UB if the object is not from the vector itself. – Daniel Jour Apr 15 '17 at 22:59
@DanielJour I'm sorry I see what you are saying. I have edited the question to avoid this pitfall. My original intent was to ask about invalidation due to reallocation, not because of the position I am inserting. I have changed the question to insert `v[0]` at `vec.begin() + 1`. Apologies for the confusion. – gowrath Apr 15 '17 at 23:09
@cdhowie the container requirements in the standard specify that the lvalue argument of `vector::insert` be CopyAssignable (which is sufficient for your proposal) – M.M Apr 16 '17 at 00:06
@gowrath I see, though you need to understand that this simple edit **completely** changes the question you're asking! – Daniel Jour Apr 16 '17 at 00:40
References are either invalidated or not depending on whether the vector is reallocated or not; "the point of insertion is behind v[0]" is not really relevant. Both versions of the code (with or without `+1`) have to work and both might fail if the compiler didn't take these cases into account in its implementation of `insert`. – M.M Apr 16 '17 at 01:20
@DanielJour I wonder if the standard guarantees that inserting in the first case will work. It seems like an issue that the library implementers should have considered by copy constructing the element before moving any elements. One thing I didn't get is you said "How could it check whether the to-be-inserted object lies within the range of moved/copied objects? Comparing addresses won't work ... that's UB if the object is not from the vector itself". Could you explain why comparing addresses is UB if the object is not from the vector? – gowrath Apr 16 '17 at 01:41
@M.M The point of insertion is important because when no reallocation happens, then the implementation must keep references to objects before the point of insertion valid; thus cannot reallocate and copy its contents to a new buffer. No *if* it tries to *construct* the newly inserted element at the point of insertion, then it first has to destruct the element which is currently there. But if we referenced this element (or one of the following, because above destruction is preceded with the shift of elements to the right) then this reference just became a invalid reference. – Daniel Jour Apr 18 '17 at 07:50
@gowrath You cannot do something like `if (&v[0] <= &new_item <= &(*v.rbegin()))` because that would be undefined behavior if `new_item` is not within that range. See: http://stackoverflow.com/a/9086675/1116364 It could work though with a linear scan, comparing each element's address to the address of `new_item` ... – Daniel Jour Apr 18 '17 at 07:54
@DanielJour you keep claiming things about "references to objects before the point of insertion" however there is nothing in the standard that distinguishes between whether the object is before or after the point of insertion – M.M Apr 18 '17 at 08:28

gowrath · Answer 2 · 2017-04-16T01:51:14.020

Ok so following a lot of looking around, it seems like the standard library is under obligation to make this work. The idea being that since the standard does not explicitly say this shouldn't work, so it must work.

Andrew Koenig wrote a bit about it here and proposes a solution where new memory is allocated, the elements moved over, and only then will the old memory be deallocated.

There are also other discussion here and here (#526).

As for the second case, it seems like the standard library (clang on MacOS) also accounts for the situation where you try to insert a reference to an element in the vector that is positioned after the point where you are trying to insert. To test this I wrote a wrapper class for ints called Integer that behaves exactly like an int except the destructor sets it's internal value to -5. So we should see a -5 inserted instead of the actual value if the std::vector insert doesn't account for the situation.

#include <vector>
#include <iostream>


using std::cout;    using std::endl;
using std::ostream; using std::vector;


class Integer {

public:
    Integer() {
        x = -1;     // default value
    }

    ~Integer() {
        x = -5;     // destructed value. Not 0 so we can clearly see it
    }

    Integer(int r) {
        x = r;
    }

    Integer(const Integer& other) {
        x = other.x;
    }

    Integer operator=(const Integer& other) {
        x = other.x;
        return *this;
    }

    operator int() const{
        return x;
    }

    friend ostream& operator<<(ostream& os, const Integer& thing) {
        os << thing.x;
        return os;
    }

private:
    int x;
};


ostream& operator<<(ostream& os, const vector<Integer> &v) {
    std::copy(v.begin(), v.end(), std::ostream_iterator<Integer>(os, ", "));
    return os;
}

int main() {
    std::vector<Integer> ret {18, 7, 4, 24,11};
    cout << "Before: " << ret << endl;
    ret.insert(ret.begin(), ret[1]);
    cout << "After: " <<  ret << endl;
    return 0;
}

This gives us the correct output:

Before: 18, 7, 4, 24, 11, 
After: 7, 18, 7, 4, 24, 11,

Your code is extremely helpful, but not as it is: Add `cout` statements to constructors/ assignment operator/ destructor. `reserve` the vector to be large enough such that no reallocation needs to happen. (http://ideone.com/YVwJHF) — Daniel Jour, Apr 18 '17 at 07:32
Note: Setting the member in the destructor could potentially be removed by the optimizer, thus your original test won't reliably work as intended. — Daniel Jour, Apr 18 '17 at 07:42

Inserting into vector by reference to element of same vector

2 Answers2