What is the most efficient way to append one std::vector to the end of another?

Question

Let v1 be the target vector, v2 needs to be appended to the back of it.

I'm now doing:

v1.reserve(v1.size() + v2.size()); 
copy(v2.begin(), v2.end(), back_inserter(v1));

Is this the most efficient way? Or can it maybe be done just via copying a chunk of memory? Thanks!

possible duplicate http://stackoverflow.com/questions/201718/how-to-concat-two-stl-vectors — Serge, Feb 05 '10 at 15:39
The answers there are actually correct for this question, but not for that question (!) — MSalters, Feb 05 '10 at 15:51

score 83 · Accepted Answer · edited Apr 19 '12 at 14:25

83

After a lot of arguing (and a reasonable comment from Matthieu M. and villintehaspam), I'll change my suggestion to

v1.insert( v1.end(), v2.begin(), v2.end() );

I'll keep the former suggestion here:

v1.reserve( v1.size() + v2.size() ); 
v1.insert( v1.end(), v2.begin(), v2.end() );

There are some reasons to do it the latter way, although none of them enough strong:

there is no guarantee on to what size will the vector be reallocated -- e.g. if the sum size is 1025, it may get reallocated to 2048 -- dependant on implementation. There is no such guarantee for reserve either, but for a specific implementation it might be true. If hunting for a bottleneck it might be rasonable to check that.
reserve states our intentions clear -- optimization may be more efficient in this case (reserve could prepare the cache in some top-notch implementation).
also, with reserve we have a C++ Standard guarantee that there will be only a single reallocation, while insert might be implemented inefficiently and do several reallocations (also something to test with a particular implementation).

edited Apr 19 '12 at 14:25

mloskot

37,086
11
109
136

answered Feb 05 '10 at 15:39

Kornel Kisielewicz

55,802
15
111
149

7

The reserve is most likely unnecessary, since that will probably be done automatically by the insert function. – villintehaspam Feb 05 '10 at 15:41
There is no guarantee that reserve will allocate exactly the amount of storage you request. The only guarantee is that capacity() will be >= to what you request. – villintehaspam Feb 05 '10 at 15:58
10

@villintehaspam - theres also **no** guarantee in the standard that insert will not do several reallocations instead of one. There **is** a guarantee on reserve however: `It is guaranteed that no reallocation takes place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the size specified in the most recent call to reserve().` Hence, reserve is more safer. – Kornel Kisielewicz Feb 05 '10 at 16:09
1

@Kornel Kisielewicz: Actually I believe there more or less is such a guarantee (for normal iterators) in item 23.2.4.4, where the complexity is specified to be linear in the number of elements in the range [first,last) plus the distance to the end of the vector. Multiple reallocations wouldn't fit the bill, unless I am mistaken? – villintehaspam Feb 05 '10 at 16:18
@villintehaspam: strong point -- and I can put my weapons down in this argument, but not an ultimate one -- you assume that a reallocation takes O(n) time what might not be true in case if a vector would be implemented similarily to a "rope" -- hence fragmented. Not that I know of any such implementation, but it would still adhere to the 03 standard (not the 1x though, where data is guaranteed to be continuous). Also, I can fully imagine a implementation where reserve is "exact" while "insert" reallocation is not. It's always good to state your demands explicitly. – Kornel Kisielewicz Feb 05 '10 at 16:27
A vector is already guaranteed to be contiguous. You must be confusing it with std::string. – UncleBens Feb 05 '10 at 16:32
That reserve doesn't appear to make any difference neither with GCC 4.4 nor VC++ 2005. I think you are also confusing the whole matter with the resizing strategy used by push_back. (VC++ allocates more than requested for *strings* - at least with small values - so as to request blocks whose size is a multiple of two or so, but again reserving before appending makes no difference in the resulting capacity. Appending a string of 100 characters to a string of 100 characters results in a string with the capacity of 207 either way.) – UncleBens Feb 05 '10 at 17:07
also, you should determine which of the vectors is larger than the other and insert the smaller one into the larger one ;) – oz10 Feb 05 '10 at 17:12
secondly, since you are using insert and not copy, the reserve is gratuitous... you don't need it. Under the hood insert is probably going to do the reserve in one chunk and not push_back() like copy() would. – oz10 Feb 05 '10 at 17:14
@ceretullis, now the size comparison point is a strong one, mind if I include that in the answer? – Kornel Kisielewicz Feb 05 '10 at 17:18
What if it matters conceptually what is appended to what? - And another thought about the reserve issue: doesn't this increase the maintenance burden? It basically doesn't add any value, at least with some common implementations. However, I could imagine that eventually you might want to append the contents of a different vector altogether - now you might end up reserving *too little* and *two* memory allocations instead. (Anyway, I'm not religious about this.) – UncleBens Feb 05 '10 at 17:50
2

@ceretullis: comparison is irrelevant here, since the OP requires that `v1` be the target and `v2` be appended to it. Sometimes order matters. Also, if I were to compare, I would rather check the capacity than the size, to see if one is big enough to contain all the data. If you have to extend the capacity of one, you will end up copying the content of both anyway, and the order of the copies does not really matter... – Matthieu M. Feb 05 '10 at 17:58
3

@Kornel: I would probably not use `reserve` and trust `insert`. Also there is a confusion with complexity. `push_back` has amortized constant complexity because of the growing strategy used (usually `2*x+1`) and thus even with multiple reallocation `insert` would have an amortized linear complexity. Note that if you are inserting with anything else than `RandomAccessIterator`, you may be better off not computing the length of the range and just `push_back`... I don't know if they made a special case for `random_access_iterator_tag`. – Matthieu M. Feb 05 '10 at 18:02
@Matthieu, *sigh*, you've convinced me, I edited the answer :> – Kornel Kisielewicz Feb 05 '10 at 18:19
@Matthieu: good point about comparing capacity rather than size. – oz10 Feb 05 '10 at 23:14
@villintehaspam "Multiple reallocations wouldn't fit the bill, unless I am mistaken?" if inserting one thing is always amortized constant then inserting N things one at a time is amortized constant * N = linear, even if there may be multiple re-allocations.... 8 years later – xaxxon Feb 22 '18 at 07:05

Stefan · Answer 2 · 2013-01-15T15:24:33.063

I simply did a quick performance measurement with the following code and

v1.insert( v1.end(), v2.begin(), v2.end() );

seems to be the right choice (as already stated above). Nevertheless, you find the reported performance below.

Test code:

#include <vector>
#include <string>

#include <boost/timer/timer.hpp>

//==============================================================================
// 
//==============================================================================

/// Returns a vector containing the sequence [ 0, ... , n-1 ].
inline std::vector<int> _range(const int n)
{
    std::vector<int> tmp(n);
    for(int i=0; i<n; i++)
        tmp[i] = i;
    return tmp;
}

void test_perf_vector_append()
{
    const vector<int> testdata1 = _range(100000000);
    const vector<int> testdata2 = _range(100000000);

    vector<int> testdata;

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  push_back()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;
        for(size_t i=0; i<testdata2.size(); i++)
        {
            testdata.push_back(testdata2[i]);
        }
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + push_back()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;
        testdata.reserve(testdata.size() + testdata2.size());
        for(size_t i=0; i<testdata2.size(); i++)
        {
            testdata.push_back(testdata2[i]);
        }
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  insert()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.insert( testdata.end(), testdata2.begin(), testdata2.end() );
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + insert()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve( testdata.size() + testdata.size() ); 
        testdata.insert( testdata.end(), testdata2.begin(), testdata2.end() );
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  copy() + back_inserter()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve(testdata.size() + testdata2.size()); 
        copy(testdata2.begin(), testdata2.end(), back_inserter(testdata));
    }

    printf("--------------------------------------------------------------\n");
    printf(" METHOD:  reserve() + copy() + back_inserter()\n");
    printf("--------------------------------------------------------------\n");
    testdata.clear();
    { vector<int>().swap(testdata); }
    testdata = testdata1;
    {
        boost::timer::auto_cpu_timer t;

        testdata.reserve(testdata.size() + testdata2.size()); 
        copy(testdata2.begin(), testdata2.end(), back_inserter(testdata));
    }

}

With Visual Studio 2008 SP1, x64, Release mode, /O2 /LTCG the output is as follows:

--------------------------------------------------------------
 METHOD:  push_back()
--------------------------------------------------------------
 0.933077s wall, 0.577204s user + 0.343202s system = 0.920406s CPU (98.6%)

--------------------------------------------------------------
 METHOD:  reserve() + push_back()
--------------------------------------------------------------
 0.612753s wall, 0.452403s user + 0.171601s system = 0.624004s CPU (101.8%)

--------------------------------------------------------------
 METHOD:  insert()
--------------------------------------------------------------
 0.424065s wall, 0.280802s user + 0.140401s system = 0.421203s CPU (99.3%)

--------------------------------------------------------------
 METHOD:  reserve() + insert()
--------------------------------------------------------------
 0.637081s wall, 0.421203s user + 0.218401s system = 0.639604s CPU (100.4%)

--------------------------------------------------------------
 METHOD:  copy() + back_inserter()
--------------------------------------------------------------
 0.743658s wall, 0.639604s user + 0.109201s system = 0.748805s CPU (100.7%)

--------------------------------------------------------------
 METHOD:  reserve() + copy() + back_inserter()
--------------------------------------------------------------
 0.748560s wall, 0.624004s user + 0.124801s system = 0.748805s CPU (100.0%)

Why is `insert` faster than `push_back`? – palapapa Nov 27 '22 at 07:38 — palapapa, Nov 27 '22 at 07:38

UncleBens · Answer 3 · 2010-02-05T16:28:43.357

25

Probably better and simpler to use a dedicated method: vector.insert

v1.insert(v1.end(), v2.begin(), v2.end());

As Michael mentions, unless the iterators are input iterators, the vector will figure out the required size and copy appended data at one go with linear complexity.

edited Feb 05 '10 at 16:28

answered Feb 05 '10 at 15:40

UncleBens

40,819
6
57
90

5

AS long as the the iterators are forward, bidirectional, or random access,the final size of the vector will be determined up front and reserved. So there's no need to perform a `reserve()` up front. If the iterators happen to be input iterators, this can't be done, and the vector might have to be reallocated several times dependending on how many items end up getting added. – Michael Burr Feb 05 '10 at 15:46
@Michael, see my answer for the reason for reserve. – Kornel Kisielewicz Feb 05 '10 at 15:53
2

@Kornel Kisielewicz: That is incorrect, reserve might also allocate more memory than needed. – villintehaspam Feb 05 '10 at 15:55
@villintehaspam - **might**, that's why I wrote "strong hint" – Kornel Kisielewicz Feb 05 '10 at 15:58
1

I suppose it all comes down to quality of implementation. I prefer to trust the implementation, because I don't see a technical reason why that **reserve** should make any difference. If your implementation decides to overallocate with plain insert, it is just as likely to overallocate with reserve. – UncleBens Feb 05 '10 at 16:27
@Michael: I don't know if I should be relieved that using forward iterators it will first compute the size... I mean it may require lots and lots of read all over the memory, how are we sure that this is better than just pushing back and (perhaps) reallocating ? – Matthieu M. Feb 05 '10 at 18:04
@Matthieu: Appending is just a special case of inserting. And it is really hard to determine what would be more expensive - iterating a linked list to know its size, or risking with multiple reallocations. And the OP mentioned he's appending a **vector** to another vector. – UncleBens Feb 05 '10 at 18:23
@Matthieu: All I can say is that the standard requires that when using forward iterators "complexity is linear in the number of elements in the range [first, last) plus the distance to the end of the vector". which I take to mean that it can't reallocate (and therefore copy the initial portion) of the array several times depending on the number of items in the iterator range. Therefore, as far as I can tell, it needs to figure out how many items are in the range up front. – Michael Burr Feb 05 '10 at 18:31
@Michael: actually the complexity could well be "amortized linear". After all `push_back` has "amortized constant" complexity because of the growing strategy `2*x+1`. Therefore I am not sure that it actually compute the distance upfront, though it would definitely be a premium with `RandomAccessIterator` as it's a constant time operation. – Matthieu M. Feb 06 '10 at 14:48

score 7 · Answer 4 · answered Feb 05 '10 at 16:50

If you happen to use Boost you can download the development version of the RangeEx library from the Boost Vault. This lib. was accepted into Boost a while ago but so far it hasn't been integrated with the main distribution. In it you'll find a new range-based algorithm which does exactly what you want:

boost::push_back(v1, v2);

Internally it works like the answer given by UncleBens, but the code is more concise and readable.

Viktor Sehr · Answer 5 · 2010-05-17T15:22:43.407

3

If you have a vector of pod-types, and you really need the performance, you could use memcpy, which ought to be faster than vector<>.insert(...):

v2.resize(v1.size() + v2.size());
memcpy((void*)&v1.front(), (void*)&v2[v1.size()], sizeof(v1.front())*v1.size());

Update: Although I would only use this if performance is really, really, needed, the code is safe for pod types.

edited May 17 '10 at 15:22

answered Feb 05 '10 at 15:44

Viktor Sehr

12,825
5
58
90

Altough I'm not 100% sure because resize does a lot of unnessesary contructor calls. – Viktor Sehr Feb 05 '10 at 15:45
@dirkgently: thats why I used resize and not reserve. – Viktor Sehr Feb 05 '10 at 15:46
Yes, this is very fast, but it uses implementation details. – Notinlist Feb 05 '10 at 15:53
5

Probably slower. It first sets the memory to zero, and then overwrites it with the correct values. A modern STL implementation detects PODs automatically, and when copying them will generate an __aligned__ memory copy. Thus, it saves both the zeroing and the alignment check in `memcpy()` – MSalters Feb 05 '10 at 15:56
Can I do this for vector>? (ptime is immutable) – Alex Jenter Feb 05 '10 at 15:57
@MSalters: a modern compiler might also detect that the zeros are never read, and skip the initialization. – Viktor Sehr Feb 05 '10 at 16:01
@Alex Jenter: No you cant, only POD types(look up pod on wikipedia) – Viktor Sehr Feb 05 '10 at 16:02
@Alex: I suspect that this would guarantee messing up the reference count. You'll have shared pointers in both v1 and v2, but those will not know they have been copied. – UncleBens Feb 05 '10 at 16:03
@Viktor, @UncleBens: Thanks, you are absolutely right, come to think of it, I could have figured this out myself. Sorry. I feel stupid now :) – Alex Jenter Feb 05 '10 at 16:06
3

If `memcpy` is really safe, the implementation probably calls it directly anyway. – Potatoswatter Feb 05 '10 at 18:26
*Yikes.* Having to write such code should be a strong hint that you shouldn't be. It should be reformulated to use `std::copy()`, which will do the right thing for all types and probably get optimised to the equivalent of `memcpy()` where supported, but saving you from such ugly syntax. Besides, it would be nice to see a scenario/benchmark where it's still cheaper to default-construct and then reassign; I imagine they exist, since the alternative is to iteratively push elements, which involves checking capacity and incrementing size, but it's not always possible and probably not always faster. – underscore_d Nov 19 '18 at 01:02
@underscore_d std::copy() will use memmove that can be not as effective as memcpy. – Iurii Gordiienko Jul 18 '23 at 21:55
@IuriiGordiienko Such situations should be benchmarked to check whether there is any practical difference, and even if there _is_... one should try to avoid writing code like above. – underscore_d Jul 20 '23 at 08:45

What is the most efficient way to append one std::vector to the end of another?

5 Answers5

Linked