248

I'm using multitreading and want to merge the results. For example:

std::vector<int> A;
std::vector<int> B;
std::vector<int> AB;

I want AB to have to contents of A and the contents of B in that order. What's the most efficient way of doing something like this?

royhowie
  • 11,075
  • 14
  • 50
  • 67
jmasterx
  • 52,639
  • 96
  • 311
  • 557
  • 1
    If looking for efficiency when you work with large size containers, it might be more efficient to use list, where you can splice one to the other with several pointer operations. But list has space overhead (consider using single linked list). – Kemin Zhou Mar 10 '19 at 19:17
  • 1
    Does this answer your question? [Concatenating two std::vectors](https://stackoverflow.com/questions/201718/concatenating-two-stdvectors) – Henke Jan 01 '21 at 15:14

10 Answers10

400
AB.reserve( A.size() + B.size() ); // preallocate memory
AB.insert( AB.end(), A.begin(), A.end() );
AB.insert( AB.end(), B.begin(), B.end() );
Kirill V. Lyadvinsky
  • 97,037
  • 24
  • 136
  • 212
  • what is the time complexity for such an operation? Is it constant time or O(n)? – Shehbaz Jaffer Mar 06 '13 at 07:33
  • 13
    it should copy each element, so it's O(n) – Kirill V. Lyadvinsky Mar 11 '13 at 17:40
  • But amortized constant complexity when properly implemented. [http://en.cppreference.com/w/cpp/container/vector/push_back](http://en.cppreference.com/w/cpp/container/vector/push_back) – boycy Apr 11 '15 at 00:14
  • 2
    Not sure whether to ask a new question or not, but can this answer be improved when taking move semantics into account? Is there some way I can expect/instruct the compiler to do a single memory move instead of looping over all elements? – Broes De Cat Apr 26 '15 at 21:33
  • 2
    @boycy No. It is amortized constant time to push_back one element. To push back n elements is O(n) – Konrad Lindenbach Mar 12 '16 at 01:47
  • 1
    @Konrad I didn't imply otherwise, but thanks for the clarification. Note the complexity of an insert operation is never given in terms of the number of elements being inserted - which will always give O(n) - but in terms of the number of elements *already in* the container, since that provides a measure of its scalability. – boycy Mar 14 '16 at 10:12
  • @KirillV.Lyadvinsky Why isn't doing AB.insert() after declaring AB good enough. Why do we need to preallocate memory? – Prakhar Agrawal Jun 28 '16 at 08:42
  • 1
    @Prakhar Agrawal AB.insert() will insert new elements one by one and this can lead to ineffective memory allocation (sequence of memory reallocations). – Kirill V. Lyadvinsky Jul 01 '16 at 06:55
  • 1
    @KirillV.Lyadvinsky: That would imply that the vector implementation doesn't make use of the fact that the iterators are random access (and thus forward) iterators. Any decent vector implementation will reallocate only once (for every insert operation on forward iterators), unless the given iterator is an input iterator. – Pixelchemist Nov 13 '16 at 10:39
  • @Pixelchemist: Right but there are two `insert` operations here. – Lightness Races in Orbit Nov 02 '17 at 22:10
  • @LightnessRacesinOrbit: Yes - as I said via "once (for every insert operation [...])". I don't get your point. I just think that the comment about -*insert() inserting elements one by one leading to a sequence of memory reallocations*- is somewhat misleading (at least in the common as well as the OP case). This is why i commented. I didn't imply that using reserve is the wrong thing to do. It is absolutely justified in order to avoid one reallocation. – Pixelchemist Nov 03 '17 at 01:28
  • @Pixelchemist: Right, and I didn't say that you implied that. I'm just observing that, in this particular example, the `reserve` does have merit. So you're both right to some degree. – Lightness Races in Orbit Nov 03 '17 at 11:47
  • @BroesDeCat You could use `std::move_iterator` (see [here](https://en.cppreference.com/w/cpp/iterator/move_iterator)) which is an iterator adapter that returns an rvalue reference when dereferencing it. That enables (but not guarantees) that the source elements are moved. – Brandlingo Nov 27 '19 at 14:51
94

This is precisely what the member function std::vector::insert is for

std::vector<int> AB = A;
AB.insert(AB.end(), B.begin(), B.end());
Robbie Morrison
  • 159
  • 2
  • 4
Shirik
  • 3,631
  • 1
  • 23
  • 27
  • Except `std::vector::insert` is godawful slow when all you're trying to do is concatenate two vectors. This is where `` is fail and you really wish `std::vector::extend` was a real method. – Nick Bastin Jul 05 '10 at 04:50
  • 4
    @Nick: Slow compared to what? – GManNickG Jul 05 '10 at 05:14
  • 2
    Maybe that it checks for enough space on each insert of element? Using reserve beforehand will speed it up. – RvdK Jul 05 '10 at 09:04
  • @GMan: It's slow compared to knowing how to preallocate the final size. – Nick Bastin Jul 05 '10 at 18:15
  • 13
    @Nick: I wouldn't be surprised if every modern stdlib implementation specialized `insert` on random-access iterators and reserved up-front. – GManNickG Jul 05 '10 at 19:28
  • 1
    @Gman: That's a fair point since we know that the source is also a vector (where iterator `distance` has O(1) complexity). Still, the performance guarantees of `insert` are something to be mindful of when you can often do better by planning ahead. – Nick Bastin Jul 05 '10 at 20:38
  • 2
    @RvdK checking for space is only a few instructions: load capacity, compare to size, conditional jump; all of which is negligible cost for most cases. Since `size < capacity` most of the time, branch prediction will likely cause the non-reallocating branch's instructions to be in the instruction pipeline, minimising branch-induced latency except for low iteration count. This assumes a good vector implementation, plus CPU instruction pipeline & [good] branch prediction, but those are pretty reliable assumptions for a modern toolchain and desktop machine. Don't know about smartphones though.. – boycy Feb 24 '15 at 14:50
  • If done repeatedly then NOT using `reserve(...)` will be faster. – ABaumstumpf Aug 23 '23 at 18:47
28

Depends on whether you really need to physically concatenate the two vectors or you want to give the appearance of concatenation of the sake of iteration. The boost::join function

http://www.boost.org/doc/libs/1_43_0/libs/range/doc/html/range/reference/utilities/join.html

will give you this.

std::vector<int> v0;
v0.push_back(1);
v0.push_back(2);
v0.push_back(3);

std::vector<int> v1;
v1.push_back(4);
v1.push_back(5);
v1.push_back(6);
...

BOOST_FOREACH(const int & i, boost::join(v0, v1)){
    cout << i << endl;
}

should give you

1
2
3
4
5
6

Note boost::join does not copy the two vectors into a new container but generates a pair of iterators (range) that cover the span of both containers. There will be some performance overhead but maybe less that copying all the data to a new container first.

bradgonesurfing
  • 30,949
  • 17
  • 114
  • 217
  • 1
    Nice idea. After thinking for a while I realized this goal may also be accomplished without using boost libraries. I've posted an answer explaining how. – Ronald Souza Sep 15 '19 at 14:56
20

EDIT: C++23 provides the append_range method for vectors:

AB.append_range(A);
AB.append_range(B);

and that's it!

Time complexity is O(|A| + |B|). For those cases where "to look like concatenated" suffices, an O(1) (constant time) alternative is described below.


In the direction of Bradgonesurfing's answer, many times one doesn't really need to concatenate two vectors (O(n)), but instead just work with them as if they were concatenated (O(1)). If this is your case, it can be done without Boost libraries.

The trick is to create a vector proxy: a wrapper class which manipulates references to both vectors, externally seen as a single, contiguous one.

USAGE

std::vector<int> A{ 1, 2, 3, 4, 5};
std::vector<int> B{ 10, 20, 30 };

vproxy<int> AB(A, B);  // ----> O(1). No copies performed.

for (size_t i = 0; i < AB.size(); ++i)
    std::cout << AB[i] << " ";  // 1 2 3 4 5 10 20 30

IMPLEMENTATION

template <class T>
class vproxy {
private:
    std::vector<T>& v1, v2;
public:
    vproxy(std::vector<T>& ref1, std::vector<T>& ref2) : v1(ref1), v2(ref2) {}
    const T& operator[](const size_t& i) const;
    const size_t size() const;
};

template <class T>
const T& vproxy<T>::operator[](const size_t& i) const{
    return (i < v1.size()) ? v1[i] : v2[i - v1.size()];
};

template <class T>
const size_t vproxy<T>::size() const { return v1.size() + v2.size(); };

MAIN BENEFIT

It's O(1) (constant time) to create it, and with minimal extra memory allocation.

SOME STUFF TO CONSIDER

  • You should only go for it if you really know what you're doing when dealing with references. This solution is intended for the specific purpose of the question made, for which it works pretty well. To employ it in any other context may lead to unexpected behavior if you are not sure on how references work.
  • In this example, AB does not provide a non-const access operator ([ ]). Feel free to include it, but keep in mind: since AB contains references, to assign it values will also affect the original elements within A and/or B. Whether or not this is a desirable feature, it's an application-specific question one should carefully consider.
  • Any changes directly made to either A or B (like assigning values, sorting, etc.) will also "modify" AB. This is not necessarily bad (actually, it can be very handy: AB does never need to be explicitly updated to keep itself synchronized to both A and B), but it's certainly a behavior one must be aware of. Important exception: to resize A and/or B to sth bigger may lead these to be reallocated in memory (for the need of contiguous space), and this would in turn invalidate AB.
  • Because every access to an element is preceded by a test (namely, "i < v1.size()"), vproxy access time, although constant, is also a bit slower than that of vectors.
  • This approach can be generalized to n vectors. I haven't tried, but it shouldn't be a big deal.
Ronald Souza
  • 627
  • 1
  • 8
  • 14
17

Based on Kiril V. Lyadvinsky answer, I made a new version. This snippet use template and overloading. With it, you can write vector3 = vector1 + vector2 and vector4 += vector3. Hope it can help.

template <typename T>
std::vector<T> operator+(const std::vector<T> &A, const std::vector<T> &B)
{
    std::vector<T> AB;
    AB.reserve(A.size() + B.size());                // preallocate memory
    AB.insert(AB.end(), A.begin(), A.end());        // add A;
    AB.insert(AB.end(), B.begin(), B.end());        // add B;
    return AB;
}

template <typename T>
std::vector<T> &operator+=(std::vector<T> &A, const std::vector<T> &B)
{
    A.reserve(A.size() + B.size());                // preallocate memory without erase original data
    A.insert(A.end(), B.begin(), B.end());         // add B;
    return A;                                        // here A could be named AB
}
aloisdg
  • 22,270
  • 6
  • 85
  • 105
  • 1
    Do you mean to add the elements of each vector to each other? Or you mean to append? This is clear now but for the next 5 years..? You shouldn't overload operator if meaning is ambiguous. – S.R Aug 09 '17 at 15:31
  • 2
    @S.R I mean to concate. I wrote this answer 3 years ago. I still know what It means. No problem there. If C++ could provide its own overload it will be even better. (and yes `::` is taken ;) – aloisdg Aug 10 '17 at 17:36
  • Definitely not clear in general that `v1 + v2` doesn't represent addition. – Apollys supports Monica Oct 09 '18 at 21:02
  • @Apollys [well](https://tio.run/##K6gsycjPM/7/P1HBViHaUMcolisJxDLWMYnl4iooyswr0UhU0FZI0vz/HwA "Python 3 – Try It Online") – aloisdg Oct 09 '18 at 21:06
  • Alternative would be to use `@` like in F# – aloisdg Oct 09 '18 at 21:08
  • When appending a vector (**`operator+=`**) it is preferable not to call `reserve` before `insert`. See [these notes on `reserve`](https://en.cppreference.com/w/cpp/container/vector/reserve#Notes). It makes sense for the other operator (`operator+`) though. – user5534993 Jun 02 '20 at 20:35
  • @user5534993 "When inserting a range, the range version of insert() is generally preferable as it preserves the correct capacity growth behavior, unlike reserve() followed by a series of push_back()s." So do I need reserve or can I ignore it and let the STL manage the allocation? – aloisdg Jun 03 '20 at 08:58
  • @aloisdgmovingtocodidact.com At first glance, I would say that the [hint](https://en.cppreference.com/w/cpp/container/vector/reserve#Notes) says that [`insert()`](https://en.cppreference.com/w/cpp/container/vector/insert) is preferable to `reserve()`. So that you should do without `reserve()`. However, the note compares this to the specific example with [`push_back()`](https://en.cppreference.com/w/cpp/container/vector/push_back)s, which you do not use. My first intention is to say that `reserve()` can be omitted. However, I would not stake my life on it. Would be worth a separate question. – user5534993 Jun 05 '20 at 08:57
  • @user5534993 Indeed. Feel free to link the new question here :) – aloisdg Jun 05 '20 at 09:18
  • 1
    >> is a good candidate for "append" operator, as that is what bash does for files. – Chris Reid Aug 21 '20 at 22:33
3

One more simple variant which was not yet mentioned:

copy(A.begin(),A.end(),std::back_inserter(AB));
copy(B.begin(),B.end(),std::back_inserter(AB));

And using merge algorithm:


#include <algorithm>
#include <vector>
#include <iterator>
#include <iostream>
#include <sstream>
#include <string>

template<template<typename, typename...> class Container, class T>
std::string toString(const Container<T>& v)
{
    std::stringstream ss;
    std::copy(v.begin(), v.end(), std::ostream_iterator<T>(ss, ""));
    return ss.str();
};


int main()
{
    std::vector<int> A(10);
    std::vector<int> B(5);  //zero filled
    std::vector<int> AB(15);

    std::for_each(A.begin(), A.end(),
            [](int& f)->void
            {
                f = rand() % 100;
            });

    std::cout << "before merge: " << toString(A) << "\n";
    std::cout << "before merge: " << toString(B) << "\n";
    merge(B.begin(),B.end(), begin(A), end(A), AB.begin(), [](int&,int&)->bool {});
    std::cout << "after merge:  " << toString(AB) << "\n";

    return 1;
}

Erik Campobadal
  • 867
  • 9
  • 14
D. Alex
  • 167
  • 1
  • 1
  • 6
0

All the solutions are correct, but I found it easier just write a function to implement this. like this:

template <class T1, class T2>
void ContainerInsert(T1 t1, T2 t2)
{
    t1->insert(t1->end(), t2->begin(), t2->end());
}

That way you can avoid the temporary placement like this:

ContainerInsert(vec, GetSomeVector());
user3360767
  • 868
  • 7
  • 18
0

For this use case, if you know beforehand the number of results each thread produces, you could preallocate AB and pass a std::span to each thread. This way the concatenation need not be done. Example:

std::vector<int> AB(total_number_of_results, 0);
std::size_t chunk_length = …;
std::size_t chunk2_start = chunk_length;
std::size_t chunk3_start = 2 * chunk_length; // If needed
…
// Pass these to the worker threads.
std::span<int> A(AB.data(), chunk_length);
std::span<int> B(AB.data() + chunk2_start, chunk_length);
…
tsnorri
  • 1,966
  • 5
  • 21
  • 29
0

My answer is based on Mr.Ronald Souza's original solution. In addition to his original solution, I've written a vector proxy that supports iterators too!

short description for people who are not aware of the context of the original solution: the joined_vector template class (i.e the vector proxy)takes two references of two vectors as constructor arguments, it then treats them as one contiguous vector. My implementation also supports a forward-iterator.

USAGE:

int main()
{
    std::vector<int> a1;
    std::vector<int> a2;

    joined_vector<std::vector<int>> jv(a1,a2);

    for (int i = 0; i < 5; i++)
        a1.push_back(i);
    for (int i = 5; i <=10; i++)
        a2.push_back(i);

    for (auto e : jv)
        std::cout << e<<"\n";
    for (int i = 0; i < jv.size(); i++)
        std::cout << jv[i] << "\n";
    return 0;
}

IMPLEMENTATION:

template<typename _vec>
class joined_vector
{
    _vec& m_vec1;
    _vec& m_vec2;

public:

    struct Iterator
    {
        typedef typename _vec::iterator::value_type type_value;
        typedef typename _vec::iterator::value_type* pointer;
        typedef typename _vec::iterator::value_type& reference;
        typedef std::forward_iterator_tag iterator_category;
        typedef std::ptrdiff_t difference_type;
        _vec* m_vec1;
        _vec* m_vec2;
        Iterator(pointer ptr) :m_ptr(ptr)
        {

        }
        Iterator operator++()
        {
            if (m_vec1->size() > 0 && m_ptr == &(*m_vec1)[m_vec1->size() - 1] && m_vec2->size() != 0)
                m_ptr = &(*m_vec2)[0];
            else
                ++m_ptr;
            return m_ptr;
        }
        Iterator operator++(int)
        {
            pointer curr = m_ptr;
            if (m_vec1->size() > 0 && m_ptr == &(*m_vec1)[m_vec1->size() - 1] && m_vec2->size() != 0)
                m_ptr = &(*m_vec2)[0];
            else
                ++m_ptr;
            return curr;
        }
        reference operator *()
        {
            return *m_ptr;
        }
        pointer operator ->()
        {
            return m_ptr;
        }

        friend bool operator == (Iterator& itr1, Iterator& itr2)
        {
            return itr1.m_ptr == itr2.m_ptr;
        }
        friend bool operator != (Iterator& itr1, Iterator& itr2)
        {
            return itr1.m_ptr != itr2.m_ptr;
        }
    private:
        pointer m_ptr;
    };

    joined_vector(_vec& vec1, _vec& vec2) :m_vec1(vec1), m_vec2(vec2)
    {

    }
    Iterator begin()
    {
        //checkes if m_vec1 is empty and gets the first elemet's address,
        //if it's empty then it get's the first address of the second vector m_vec2
        //if both of them are empty then nullptr is returned as the first pointer
        Iterator itr_beg((m_vec1.size() != 0) ? &m_vec1[0] : ((m_vec2.size() != 0) ? &m_vec2[0] : nullptr));
        itr_beg.m_vec1 = &m_vec1;
        itr_beg.m_vec2 = &m_vec2;
        return itr_beg;
    }
    Iterator end()
    {
        //check if m_vec2 is empty and get the last address of that vector
        //if the second vector is empty then the m_vec1's vector/the first vector's last element's address is taken
        //if both of them are empty then a null pointer is returned as the end pointer
        typename _vec::value_type* p = ((m_vec2.size() != 0) ? &m_vec2[m_vec2.size() - 1] : ((m_vec1.size()) != 0 ? &m_vec1[m_vec1.size() - 1] : nullptr));
        Iterator itr_beg(p != nullptr ? p + 1 : nullptr);
        itr_beg.m_vec1 = &m_vec1;
        itr_beg.m_vec2 = &m_vec2;
        return itr_beg;
    }
    typename _vec::value_type& operator [](int i)
    {
        if (i < m_vec1.size())
            return m_vec1[i];
        else
            return m_vec2[i - m_vec1.size()];
    }
    size_t size()
    {
        return m_vec1.size() + m_vec2.size();
    }

};
Sudharsan
  • 21
  • 1
  • 3
-1

If your vectors are sorted*, check out set_union from <algorithm>.

set_union(A.begin(), A.end(), B.begin(), B.end(), AB.begin());

There's a more thorough example in the link.

1201ProgramAlarm
  • 32,384
  • 7
  • 42
  • 56
Cogwheel
  • 22,781
  • 4
  • 49
  • 67
  • 6
    also, it does not do the same thing as a straight append - the elements in the output range are unique, which may not be what the OP wanted (they might not even be comparable). It's certainly not the most efficient way of doing it. – Peter Jul 05 '10 at 05:13