Is inserting in the end equivalent to std::copy()?

Question

Consider following two ways to append elements into a vector

std::vector<int> vi1(10,42), vi2;

vi2.insert(vi2.end(),vi1.begin(),vi1.end());
<OR>
std::copy(vi1.begin(),vi1.end(),std::back_inserter(vi2));

std::copy version looks cleaner and I don't have to type vi2 twice. But since it is a generic algorithm while insert is a member function, could insert perform better than std::copy or does it do the same thing?

I can benchmark myself but I have to do it for every vector for every template type. Has anyone done already?

score 9 · Accepted Answer · answered Jan 17 '13 at 15:23

9

There are some subtle differences. In the first case (std::vector<>::insert) you are giving a range to the container, so it can calculate the distance and perform a single allocator to grow to the final required size. In the second case (std::copy) that information is not directly present in the interface, and it could potentially cause multiple reallocations of the buffer.

Note that even if multiple reallocations are needed, the amortized cost of insertion must still be constant, so this does not imply an asymptotic cost change, but might matter. Also note that a particularly smart implementation of the library has all the required information to make the second version as efficient by specializing the behavior of std::copy to handle back insert iterators specially (although I have not checked whether any implementation actually does this).

answered Jan 17 '13 at 15:23

David Rodríguez - dribeas

204,818
23
294
489

Do you know any special tricks which would allow a generic implementation of your last suggestion? (The only thing I can think of off hand is making `std::back_iterator` derive from a tagging class, and dispatch off this.) – James Kanze Jan 17 '13 at 15:51
@JamesKanze: You could have a traits type that tells `back_insert_iterator` whether or not its container type advertises an `insert` function that's equivalent to repeated calls to `push_back`. Then a function on `back_insert_iterator` to forward to that function, and have another traits type to tell `copy` that its `OutputIterator` type accepts that call. You could auto-detect the presence of the functions if you call them something beginning with `__`: `vector::__insert_at_end` and `back_insert_iterator::__delegate_copy`. Amounts to the same result as what you said, I think. – Steve Jessop Jan 17 '13 at 16:35
Anyway, my point is that `back_insert_iterator` has to somehow know that its container permits `insert` in place of `push_back`. For user-defined types you can't just make the change willy-nilly, you need *explicit* permission from the container type to do something different from the `push_back` calls mandated by the standard. – Steve Jessop Jan 17 '13 at 16:43

Charles Salvia · Answer 2 · 2013-01-17T17:08:48.217

vector::insert will probably perform better in most cases on most mainstream implementations of the C++ standard library. The reason is that the vector object has internal knowledge of the currently allocated memory buffer, and can pre-allocate enough memory to perform the entire insertion since the number of elements can be computed in advance with random-access iterators. However, std::copy along with std::back_inserter will keep calling vector::push_back, which may trigger multiple allocations.

The GNU implementation of std::vector::insert in libstdc++, for example, pre-allocates a buffer in advance if the iterator category is RandomAccessIterator. With input iterators, vector::insert may be equivalent to std::copy, because you can't determine the number of elements in advance.

score 1 · Answer 3 · answered Jan 17 '13 at 15:24

1

You would think that vector::insert might be able to optimize the case where it's inserting multiple items at once, but it's harder than it looks. What if the iterators are output iterators for example - there's no way of knowing ahead of time how many insertions you'll do. It's likely that the code for insert just does multiple push_backs the same as back_inserter.

answered Jan 17 '13 at 15:24

Mark Ransom

299,747
42
398
622

The standard only allows a single reallocation if the iterators to insert are rendom access iterators, so the implementation of insert cannot be just a sequence of `push_back`. – James Kanze Jan 17 '13 at 15:30
@JamesKanze, I didn't know that - if you can cite the standard and put it into an answer you'd get an upvote from me. – Mark Ransom Jan 17 '13 at 15:32
It's implicit in the complexity requirements of `insert`. C++03 requires linear complexity for all but input iterators; C++11 for all iterators! Presumably, the idea in C++03 was to use `std::distance` first, to deterine the size of the allocation. How this is supposed to work with input iterators in C++11, I don't know. – James Kanze Jan 17 '13 at 15:49
1

@JamesKanze, would amortized linear be good enough to meet the spec? I think multiple `push_back`s would qualify, since buffer allocation is supposed to be amortized constant. – Mark Ransom Jan 17 '13 at 16:02
I don't really know. I do know that for the constructor from iterators, the complexity requirements are more detailed, and restrict the number of reallocations depending on the type of iterator. And of course, the most obvious implementation of the iterator constructor is to construct an empty vector, then call `insert`. – James Kanze Jan 17 '13 at 16:09
@MarkRansom: to be specific, `k` calls to `push_back` on a vector of initial size `n` is O(n + k). This is the same complexity as doing a single re-allocation followed by `k` copies/moves. At root, if calls to `push_back` are "amortized constant time", that means `k` calls to `push_back` are "linear time". You don't need to worry about "amortized linear time". Ultimately I think `insert` and the iterator constructors are both expected to dispatch on the iterator tag, but the standard can only mandate that by restricting the number of re-allocations, time complexity doesn't get there. – Steve Jessop Jan 17 '13 at 16:50
Drat, I should probably have used `m` instead of `k`. For clarity, I'm *not* treating `k` as a constant here. – Steve Jessop Jan 17 '13 at 16:57

Nawaz · Answer 4 · 2013-01-17T15:30:09.100

It is not equivalent to std::copy. It is equivalent to push_back (in some sense).

Yes, std::back_inserter does the same thing, which you use with std::copy which can insert at the front also if you use std:front_inserter (though you cannot use it with std::vector). It can insert at specified iterator also if you use std::inserter instead. So you see, std::copy does thing based on what you pass as third argument.

Now coming back to the essence of the question. I think you should use insert, as it can perform better, because it may discover the number of elements it is going to insert, so it may allocate that much memory at once (if needed). In your case, it is likely to perform better, because v1 is std::vector which means it is easy to compute the number of elements in O(1) time.

You can't use `std::front_inserter` with a `std::vector`. – James Kanze Jan 17 '13 at 15:30 — James Kanze, Jan 17 '13 at 15:30

Is inserting in the end equivalent to std::copy()?

4 Answers4