23

When using push_back of std::vector, I can push an element of the vector itself without fear of invalidating the argument due to reallocation:

std::vector<std::string> v = { "a", "b" };
v.push_back(v[0]); // This is ok even if v.capacity() == 2 before this call.

However, when using emplace_back, std::vector forwards the argument to the constructor of std::string so that copy construction happens in place in the vector. This makes me suspect that reallocation of the vector happens before the new string is copy constructed (otherwise it would not be allocated in place), thus invalidating the argument before use.

Does this mean that I cannot add an element of the vector itself with emplace_back, or do we have some kind of guarantee in case of reallocation, similar to push_back?

In code:

std::vector<std::string> v = { "a", "b" };
v.emplace_back(v[0]); // Is this valid, even if v.capacity() == 2 before this call?
rasmus
  • What is "fear of invalidating the argument due to reallocation"? – slyx Jul 23 '14 at 11:10
  • @xylosper If you are at the limit of a vector's capacity, push_back/emplace_back will reallocate the vector to increase its capacity. If reallocation happens, references to elements and iterators for this vector are invalidated. In that case the argument to push_back could be invalidated before it is used. However, it turns out that std::vector jumps through hoops so that this isn't a problem for push_back. – rasmus Jul 23 '14 at 11:13
  • I think in theory it is guaranteed, but in practice you might run across it as a bug, so you should try to avoid it if possible. – user541686 Jul 23 '14 at 11:17
  • @Mehrdad For this question I assume that my C++ implementation is standard compliant. Do I have the guarantee in this case? – rasmus Jul 23 '14 at 11:19
  • @rasmus: I think the same exact reasoning as the [`push_back`](http://stackoverflow.com/a/18794634/541686) case applies: there is nothing restricting the argument, hence it may very well be from the same vector – user541686 Jul 23 '14 at 11:23
  • *"reallocation of the vector happens before the new string is copy constructed (otherwise it would not be allocated in place)"* Reallocation works this way: allocate a new *buffer*, copy/move the elements, destroy the old elements, destroy the old buffer. You need to keep the old buffer until you have copied/moved the last element. You can just destroy it *after* emplacing, or emplace the element first before copying/moving the old elements. – dyp Jul 23 '14 at 11:25
  • This question gets more interesting with `emplace`: http://cplusplus.github.io/LWG/lwg-active.html#2164 – Howard Hinnant Jul 23 '14 at 11:26
  • @dyp: You **must** emplace the element before moving the rest of the elements; otherwise, the other element is moved and no longer there. – user541686 Jul 23 '14 at 11:26
  • @Mehrdad Oh, right. (It's still there, but has an unspecified value.) – dyp Jul 23 '14 at 11:28

3 Answers

21

emplace_back is required to be safe for the same reason push_back is required to be safe: the invalidation of pointers and references only takes effect once the modifying member function returns.

In practice, this means that emplace_back performing a reallocation is required to proceed in the following order (ignoring error handling):

  1. Allocate new capacity
  2. Emplace-construct new element at the end of the new data segment
  3. Move-construct existing elements into new data segment
  4. Destruct and deallocate old data segment
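
The ordering above can be sketched with a toy container. This is only an illustration under simplifying assumptions: `TinyVec` and `self_emplace_demo` are hypothetical names, there is no allocator support, and error handling is ignored just as in the list. It is not how any particular standard library spells it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical, heavily simplified vector. It ignores exceptions and
// over-aligned types, and exists only to show the order of the steps.
template <typename T>
class TinyVec {
public:
    TinyVec() = default;
    TinyVec(const TinyVec&) = delete;
    TinyVec& operator=(const TinyVec&) = delete;
    ~TinyVec() { destroy_and_free(data_, size_); }

    template <typename... Args>
    void emplace_back(Args&&... args) {
        if (size_ == cap_) {
            const std::size_t new_cap = cap_ ? 2 * cap_ : 1;
            // 1. Allocate new capacity (raw, uninitialized storage).
            T* fresh = static_cast<T*>(::operator new(new_cap * sizeof(T)));
            // 2. Construct the new element first. The arguments may refer
            //    to an element of the old buffer, which is still intact.
            ::new (static_cast<void*>(fresh + size_)) T(std::forward<Args>(args)...);
            // 3. Move-construct the existing elements into the new buffer.
            for (std::size_t i = 0; i < size_; ++i)
                ::new (static_cast<void*>(fresh + i)) T(std::move(data_[i]));
            // 4. Destroy the old elements and deallocate the old buffer.
            destroy_and_free(data_, size_);
            data_ = fresh;
            cap_ = new_cap;
        } else {
            // No reallocation: construct directly in the spare slot.
            ::new (static_cast<void*>(data_ + size_)) T(std::forward<Args>(args)...);
        }
        ++size_;
    }

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return cap_; }

private:
    static void destroy_and_free(T* p, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) p[i].~T();
        ::operator delete(p);
    }

    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
};

// Fills the buffer exactly, then appends a copy of the first element.
inline std::string self_emplace_demo() {
    TinyVec<std::string> v;
    v.emplace_back("a");
    v.emplace_back("b");   // now size == capacity == 2
    v.emplace_back(v[0]);  // reallocates; v[0] is read in step 2
    return v[2];
}
```

Swapping steps 2 and 3 would leave the argument referring to a moved-from element, and doing step 4 before step 2 would leave it dangling, which is the failure mode the question is worried about.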

In a reddit thread, STL (Stephan T. Lavavej) acknowledges that VC11's failure to support v.emplace_back(v[0]) is a bug, so you should check whether your library supports this usage rather than take it for granted.

Note that some forms of self-insertion are specifically prohibited by the Standard; for example in [sequence.reqmts] paragraph 4 Table 100 a.insert(p,i,j) has the prerequisite "i and j are not iterators into a".
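
If you do need to insert a range taken from the vector itself, one way to stay within that precondition is to copy the range out first. A minimal sketch, where `append_self` is a hypothetical helper name and the extra copy is the price for well-defined behavior:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends a copy of v's current contents to v.
inline void append_self(std::vector<std::string>& v) {
    // v.insert(v.end(), v.begin(), v.end()) would break the precondition,
    // because the source iterators point into v itself.
    const std::vector<std::string> snapshot(v);  // independent copy
    v.insert(v.end(), snapshot.begin(), snapshot.end());
}
```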

ecatmur
2

Contrary to what a few other people have written here, I found out this week that this is not safe in practice, at least when you need portable code with defined behavior.

Here is some example code that may expose undefined behavior:

std::vector<uint32_t> v;
v.push_back(0);
// ... more push_back calls into v follow here (not shown) ...

v.emplace_back(v.back()); // problem is here!

The above code ran on Linux with g++'s standard library without problems.

When running the same code on Windows (compiled with Visual Studio 2013 Update 5), the vector sometimes contained garbled elements (seemingly random values).

The reason is that the reference returned by v.back() was invalidated due to the container reaching its capacity limit inside v.emplace_back(), before the element was added at the end.

I looked into VC++'s STL implementation of emplace_back() and it seemed to allocate new storage, copy over the existing vector elements into the new storage location, free the old storage and then construct the element at the end of the new storage. At that point, the referenced element's underlying memory may have been freed already or otherwise invalidated. That was producing undefined behavior, causing the vector elements inserted at reallocation thresholds to be garbled.

This seems to be a (still unfixed) bug in Visual Studio. With other STL implementations I tried, the problem did not occur.

In the end, you should avoid passing a reference to a vector element to the same vector's emplace_back() for now, at least if your code gets compiled with Visual Studio and is supposed to work.
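
One way to sidestep the issue regardless of the library is to copy the element before the call, so that the argument no longer refers into the vector. A minimal sketch, where `duplicate_back` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends a copy of the last element of v.
inline void duplicate_back(std::vector<std::uint32_t>& v) {
    assert(!v.empty());
    // Take the copy before any reallocation can happen. The local
    // variable stays valid no matter what emplace_back does internally.
    const std::uint32_t last = v.back();
    v.emplace_back(last);
}
```

For a type that is expensive to copy, the temporary can be moved into the call instead (v.emplace_back(std::move(tmp))), which costs one copy and one move.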

stj
1

I checked my vector implementation and it works here as follows:

  1. Allocate new memory
  2. Emplace object
  3. Move existing elements into the new memory
  4. Destroy the old elements and dealloc old memory

So everything is fine here. A similar implementation is used for push_back, so that one is fine too.

FYI, here is the relevant part of the implementation. I have added comments:

template<typename _Tp, typename _Alloc>
    template<typename... _Args>
      void
      vector<_Tp, _Alloc>::
      _M_emplace_back_aux(_Args&&... __args)
      {
    const size_type __len =
      _M_check_len(size_type(1), "vector::_M_emplace_back_aux");
// HERE WE DO THE ALLOCATION
    pointer __new_start(this->_M_allocate(__len));
    pointer __new_finish(__new_start);
    __try
      {
// HERE WE EMPLACE THE ELEMENT
        _Alloc_traits::construct(this->_M_impl, __new_start + size(),
                     std::forward<_Args>(__args)...);
        __new_finish = 0;

        __new_finish
          = std::__uninitialized_move_if_noexcept_a
          (this->_M_impl._M_start, this->_M_impl._M_finish,
           __new_start, _M_get_Tp_allocator());

        ++__new_finish;
      }
    __catch(...)
      {
        if (!__new_finish)
          _Alloc_traits::destroy(this->_M_impl, __new_start + size());
        else
          std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
        _M_deallocate(__new_start, __len);
        __throw_exception_again;
      }
    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
              _M_get_Tp_allocator());
// HERE WE DESTROY THE OLD MEMORY
    _M_deallocate(this->_M_impl._M_start,
              this->_M_impl._M_end_of_storage
              - this->_M_impl._M_start);
    this->_M_impl._M_start = __new_start;
    this->_M_impl._M_finish = __new_finish;
    this->_M_impl._M_end_of_storage = __new_start + __len;
      }
gexicide
  • Doing push_back on an element of the vector itself is perfectly safe. See for example [this link](http://stackoverflow.com/questions/18788780/is-it-safe-to-push-back-an-element-from-the-same-vector). – rasmus Jul 23 '14 at 11:17