How can I use merge to merge two maps/multimaps (C++11 STL)

Question

My project deals with divide-n-conquer strategy to parralelize an existing algorithm. The algoritm returns a std::multimap<int, std::pair<int, int>>. Above is a sketch of a premilinary (single_thread) version.

typedef std::multimap<int, std::pair<int, int>> Map ;
struct Box
{
  std::pair<int, int> top_left, bottom_right ;
}
Map my_algo(Box, a_box)
{
  Map final_m ;
  if (not is_small(a_box))
  {
    box b1, b2 ;
    split_box(a_box, &b1, &b2) ; // this function divides the box in two halves
    Map m1 = my_algo(b1) ;
    Map m2 = my_algo(b2) ;

    // this line will not compile. final_m.begin() is not accepted.
    std::merge (m1.begin(), m1.end(), m2.begin(), m2.end(), final_m.begin()) ; 
  }
 return final_m ;
}

I know I could use insert instead or merge to do the merging (as explained here). But insert is in O(N.Log(n)) while merge is in O((N)). Because of the number of merging operations involved in the algorithm, time-complexity matters.

Thanks for helping, Olivier

EDIT:

Thanks to jogojapan (see answer below), this is a working demo of the corrected code:

#include <iostream>
#include <map>
#include <iterator>
#include <algorithm>

typedef std::pair<int, int> Cell ;
typedef std::multimap<int, Cell> Border_map ;

void test_merge_maps_1()
{
Border_map a, b, c ;
std::cout << std::endl << "a" << std::endl ;
for (int i=1; i<10; i+=2)
{
    a.insert(std::pair<int, Cell>(i, Cell(i,i))) ;
    std::cout << i << " " ;
}

std::cout << std::endl << "b" << std::endl ;
for (int i=2; i<11; i+=2)
{
    b.insert(std::pair<int, Cell>(i, Cell(i,i))) ;
    std::cout << i << " " ;
}

std::cout << std::endl << "merge" << std::endl ;
std::merge(a.begin(), a.end(), b.begin(), b.end(), inserter(c,end(c))) ;

std::cout << "result" << std::endl ;
for(auto x: c)
    std::cout << x.first << " " ;
std::cout << std::endl ;
}

int main(void)
{
    test_merge_maps_1() ;
    return 0 ;
}

If this is the end of your algorithm you should consider using a `vector` for the final result. `merge` will generate an ordered output sequence and all kinds of queries you end up doing on a `multimapS` are faster on ordered `vectorS`. Because you know the size of the resulting sequence you can also pre-allocate with `vector::reserve`. For higher genericity, just accept an `OutputIterator` as an argument and use this. Maybe document the space complexity of the algorithm so the user can pre-allocate the memory. — pmr, Nov 05 '12 at 09:48
@pmr I beleive I need a multimap because the contents is a cell in a grid (aka. std::pair) and the key is a value (aka. double). When the container is processed, I need to pull out the cells by keys (the lowest the first). Because merging ordered sets (here multpmaps) is in O(N) while sorting is in N.log(N), I prefer maintaining a sorted multimap form bottom to top rather than packing the data at random and sorting them when requested. — quickbug, Nov 05 '12 at 09:56
@user1770724: an "ordered vector" isn't "packing the data at random and sorting when requested". `merge` already outputs the results in order, there would be no need for a separate sort operation. pmr's point is just that lookups are typically faster on an ordered vector (using binary search) than they are on a multimap. So, provided you don't need to make random insertions the vector beats the multimap. — Steve Jessop, Nov 05 '12 at 10:25

jogojapan · Answer 1 · 2012-11-05T09:58:23.710

5

Yes, multimap<T>::begin() returns an ordinary (birectional) iterator, but you need an iterator capable of making insertions. You can obtain one using the std::inserter template from the iterator header:

#include <iterator>

/* ... */

std::merge(begin(m1),end(m1),begin(m2),end(m2),inserter(final_m,end(final_m)));

As you can see, std::inserter takes two arguments: The container you need an insertion iterator for (i.e. final_m), and an ordinary iterator for the same container, which is used as starting point for the insertions. Due to the nature of the merge operation, the starting point given for insertions should be the end of the multi-map being created. Therefore, I used end(final_m) (which is the same as final_m.end()) as the second argument.

edited Nov 05 '12 at 09:58

answered Nov 05 '12 at 09:25

jogojapan

68,383
11
101
131

1

Nitpick: `multimap` is a Sequence container and its iterators must be at least `ForwardIteratorS` not `InputIteratorS`. – pmr Nov 05 '12 at 09:33
If you use a random iterator, you might as well just use the insert solution rejected in the question. Using end is probably the right thing to do, however, because merge will insert all elements at the end, so using end as the hint should give the performance benefit the questioner is looking for. – ymett Nov 05 '12 at 09:34
@pmr True! Or actually, I suppose `begin()` of `std::multimap` gives you what the standard calls a bidirectional iterator. – jogojapan Nov 05 '12 at 09:45
2

`end(final_m)` *is* the right iterator, because as ymett says it's the point at which the element really does need to be inserted. In the case where the supplied iterator is the correct insertion point, `insert` is guaranteed amortized `O(1)` (Table 102 -- Associative Container Requirements), which is exactly what the questioner asked for. – Steve Jessop Nov 05 '12 at 09:50
@SteveJessop Yes, that's true. I've made this clearer in the post. (My original reasoning was that the hint given to the algorithm is only relevant for the very first insertion made by the merge algorithm, and therefore does not have an effect on the overall algorithmic complexity. But in any case, I agree that using `end()` makes sense.) – jogojapan Nov 05 '12 at 10:00
In the problem I've posted, we do not know if the two maps to merge cover intersecting intervals or not. If the intervals intersect (which actually may be the general case), I see no difference between begin() and end(). Is this correct? – quickbug Nov 05 '12 at 12:25
The idea behind using `end()` is that the merge algorithm works from "left to right", i.e. iterating from small elements to large elements, in both input maps (m1 and m2) in parallel. It inserts elements one after the other in the output map. Each element that it inserts into the output map must be equal or larger than the element inserted before. Hence, starting at `m_final.end()` when determining the insertion point is optimal -- in fact the element will have to be inserted right at the end of `m_final` anyway, as it is larger than all elements inserted before. – jogojapan Nov 05 '12 at 15:52
So `end()` is optimal, regardless of whether `m1` and `m2` overlap. – jogojapan Nov 05 '12 at 15:54
I get a "sequence not ordered" error when I merge the two multimaps. How can I deal with it ? Thanks. – locke14 Feb 03 '16 at 09:42

score 1 · Answer 2 · answered Apr 11 '18 at 11:58

C++ insert

#include <map>
#include <string>

using std::map;
using std::string;

map< string, string > lhs;
map< string, string > rhs;

lhs.insert( rhs.begin( ), rhs.end( ) );

C++17 merge

#include <map>
#include <string>

using std::map;
using std::string;

map< string, string > lhs;
map< string, string > rhs;

lhs.merge( rhs );

How can I use merge to merge two maps/multimaps (C++11 STL)

2 Answers2