
I am currently experimenting with some of the STL data structures. However, I am still not sure when to use which one, and when to use a certain combination. Currently I am trying to figure out when using a std::multimap makes sense. As far as I can see, one can easily build one's own multimap implementation by combining std::map and std::vector. So I am left with the question of when each of these data structures should be used.

  • Simplicity: A std::multimap is definitely simpler to use, because one does not have to handle the additional nesting. However, for bulk access to a range of elements one might need to copy the data from the iterators into another data structure (for example a std::vector); a short sketch of this copy step follows the list.
  • Speed: The locality of the vector most likely makes iterating over the range of equal elements much faster, because cache usage is optimized. However, I am guessing that std::multimaps also have a lot of optimization tricks under the hood to make iterating over equal elements as fast as possible. Getting to the correct element range might also be optimized for std::multimaps.
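
For example, the copy mentioned in the Simplicity bullet could look like the following; this is a minimal sketch with a helper name of my own choosing, not code from the benchmark below:

#include <stdint.h>
#include <map>
#include <vector>

// Copy all values stored under `key` out of the multimap so that later
// passes over them run on contiguous storage.
std::vector<uint64_t> values_for_key( const std::multimap<uint32_t, uint64_t>& m,
                                      uint32_t key ) {
  std::vector<uint64_t> values;
  auto range = m.equal_range( key );
  for( auto iter = range.first; iter != range.second; ++iter ) {
    values.push_back( iter->second );
  }
  return values;
}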

In order to try out the speed issues I did some simple comparisons using the following program:

#include <stdint.h>
#include <cstdlib>   // srand, rand
#include <ctime>     // clock, clock_t
#include <iostream>
#include <map>
#include <vector>
#include <utility>

typedef std::map<uint32_t, std::vector<uint64_t> > my_mumap_t;

const uint32_t num_partitions = 100000;
const size_t num_elements =     500000;

int main() {
  srand( 1337 );
  std::vector<std::pair<uint32_t,uint64_t>> values;
  for( size_t i = 0; i <= num_elements; ++i ) {
    uint32_t key = rand() % num_partitions;
    uint64_t value = rand();
    values.push_back( std::make_pair( key, value ) );
  }
  clock_t start;
  clock_t stop;
  {
    start = clock();
    std::multimap< uint32_t, uint64_t > mumap;
    for( auto iter = values.begin(); iter != values.end(); ++iter ) {
      mumap.insert( *iter );
    }
    stop = clock();
    std::cout << "Filling std::multimap: " << stop - start << " ticks" << std::endl;
    std::vector<uint64_t> sums;
    start = clock();
    for( uint32_t i = 0; i <= num_partitions; ++i ) {
      uint64_t sum = 0;
      auto range = mumap.equal_range( i );
      for( auto iter = range.first; iter != range.second; ++iter ) {
        sum += iter->second;
      }
      sums.push_back( sum );
    }
    stop = clock();
    std::cout << "Reading std::multimap: " << stop - start << " ticks" << std::endl;
  }
  {
    start = clock();
    my_mumap_t mumap;
    for( auto iter = values.begin(); iter != values.end(); ++iter ) {
      mumap[ iter->first ].push_back( iter->second );
    }
    stop = clock();
    std::cout << "Filling my_mumap_t: " << stop - start << " ticks" << std::endl;
    std::vector<uint64_t> sums;
    start = clock();
    for( uint32_t i = 0; i <= num_partitions; ++i ) {
      uint64_t sum = 0;
      auto range = std::make_pair( mumap[i].begin(), mumap[i].end() );
      for( auto iter = range.first; iter != range.second; ++iter ) {
        sum += *iter;
      }
      sums.push_back( sum );
    }
    stop = clock();
    std::cout << "Reading my_mumap_t: " << stop - start << " ticks" << std::endl;
  }
}

As I suspected, it depends mainly on the ratio between num_partitions and num_elements, so I am still at a loss here. Here are some example outputs:

For num_partitions = 100000 and num_elements = 1000000

Filling std::multimap: 1440000 ticks
Reading std::multimap: 230000 ticks
Filling    my_mumap_t: 1500000 ticks
Reading    my_mumap_t: 170000 ticks

For num_partitions = 100000 and num_elements = 500000

Filling std::multimap: 580000 ticks
Reading std::multimap: 150000 ticks
Filling    my_mumap_t: 770000 ticks
Reading    my_mumap_t: 140000 ticks

For num_partitions = 100000 and num_elements = 200000

Filling std::multimap: 180000 ticks
Reading std::multimap:  90000 ticks
Filling    my_mumap_t: 290000 ticks
Reading    my_mumap_t: 130000 ticks

For num_partitions = 1000 and num_elements = 1000000

Filling std::multimap: 970000 ticks
Reading std::multimap: 150000 ticks
Filling    my_mumap_t: 710000 ticks
Reading    my_mumap_t:  10000 ticks

I am unsure about how to interpret these results. How would you go about deciding on the correct data structure? Are there any additional constraints for the decision which I might have missed?

Kerrek SB
LiKao
  • Don't optimize prematurely. Is performance a concern? If it is, it's very unlikely that the choice of container will be the bottleneck. If it's not, then use what's more understandable and maintainable. – KeatsPeeks Dec 01 '11 at 13:57
  • @Samuel_xL: Agreed on premature optimization, but a right choice of container gets you a long way. –  Dec 01 '11 at 14:01
  • @Samuel_xL: Currently in the application the more complicated solution of nested data structures is used. I am mainly trying to figure out if there might be a specific reason for this and I should not touch it, or if I can just replace it with something simpler. If I were to choose freely, I would go for the `std::multimap` solution as well. – LiKao Dec 01 '11 at 14:02
  • If inserting only once and reading multiple times, you could also consider using a sorted vector. This eliminates the 'nesting' while maintaining maximal cache coherency. (A sketch of this idea follows these comments.) – fjardon Dec 01 '11 at 14:03
  • I think that a `multimap` is better described as a `map` of `list`s rather than a `map` of `vector`s. – CAFxX Dec 01 '11 at 15:06
  • `std::make_pair( mumap[i].begin(), mumap[i].end() );` is inefficient (two O(log N) lookups), it should be `std::vector<uint64_t>& vec = mumap[i]; auto range = std::make_pair(vec.begin(), vec.end());` Unfortunately in C++ the compiler cannot easily proceed with the Common Subexpression Elimination optimization. This should improve reading and thus the benchmark in which `my_mumap_t` performed more poorly than `multimap` for `Reading`. You should expect faster reads in general from the use of `vector`. – Matthieu M. Dec 01 '11 at 17:04
  • You should cache your end iterator here: `for( auto iter = values.begin(), end = values.end(); iter != end; ++iter )`, because iterators are expensive to instantiate, and if you keep querying `values.end()` at every loop iteration you are pretty much constructing and destroying your end iterator instance every single time. – mchiasson Jan 18 '15 at 01:11
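
To make fjardon's sorted-vector suggestion from the comments concrete, here is a minimal sketch (the names and layout are my own): append all pairs, sort them by key once, and then locate a key's values with std::equal_range.

#include <stdint.h>
#include <algorithm>
#include <utility>
#include <vector>

typedef std::pair<uint32_t, uint64_t> entry_t;
typedef std::vector<entry_t> flat_mumap_t;

// Order entries by key only; this is all equal_range needs.
bool key_less( const entry_t& a, const entry_t& b ) {
  return a.first < b.first;
}

// After filling, sort once: std::sort( data.begin(), data.end(), key_less );
uint64_t sum_for_key( const flat_mumap_t& data, uint32_t key ) {
  // Binary search for the contiguous run of entries carrying this key.
  auto range = std::equal_range( data.begin(), data.end(),
                                 entry_t( key, 0 ), key_less );
  uint64_t sum = 0;
  for( auto iter = range.first; iter != range.second; ++iter ) {
    sum += iter->second;
  }
  return sum;
}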

3 Answers


It's hard to tell whether your benchmark is doing the right thing, so I can't comment on the numbers. However, a few general points:

  • Why multimap rather than map of vectors: Maps, multimaps, sets and multisets are all essentially the same data structure, and once you have one, it's trivial to just spell out all four. So the first answer is, "why not have it"?

  • How is it useful: Multimaps are one of those things that you need rarely, but when you need them, you really need them.

  • Why not roll my own solution? As I said, I'm not sure about those benchmarks, but even if you could make something else that isn't worse than the standard container (which I question), then you should consider the overall burden of getting it right, testing it and maintaining it. Imagine a world in which you would be taxed for every line of code you wrote (that's Stepanov's suggestion). Re-use industry-standard components whenever possible.

Finally, here's the typical way you iterate a multimap:

for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2)
{
  // unique key values at this level
  for ( ; it2 != end && it2->first == it1->first; ++it2)
  {
    // equal key value (`== it1->first`) at this level
  }
}
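
Applied to the multimap from the question, the same pattern can compute all per-key sums in a single pass, without one equal_range call per candidate key. A minimal sketch, assuming m is the std::multimap<uint32_t, uint64_t> from the benchmark; note that, unlike the benchmark loop, it only produces sums for keys that are actually present:

std::vector<uint64_t> sums;
for (auto it1 = m.cbegin(), it2 = it1, end = m.cend(); it1 != end; it1 = it2)
{
  uint64_t sum = 0;
  for ( ; it2 != end && it2->first == it1->first; ++it2)
  {
    sum += it2->second;  // accumulate every value stored under it1->first
  }
  sums.push_back( sum );  // one sum per distinct key in the map
}
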
Kerrek SB

A map of vectors comes with the memory overhead for the capacity of each vector. std::vector typically allocates space for more elements than you actually have. It may not be a big deal for your application, but it's another tradeoff you haven't considered.

If you're doing a lot of reads, then the average O(1) lookup time of unordered_multimap might be a better choice.
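
For instance, the read side of the benchmark would change very little; here is a minimal sketch, assuming an implementation that provides std::unordered_multimap (or its std::tr1 counterpart, as discussed in the comments below):

#include <stdint.h>
#include <unordered_map>

// Sum all values stored under `key`; the lookup hashes the key instead of
// walking a tree, so it does not depend on the total number of distinct keys.
uint64_t sum_for_key( const std::unordered_multimap<uint32_t, uint64_t>& m,
                      uint32_t key ) {
  uint64_t sum = 0;
  auto range = m.equal_range( key );
  for( auto iter = range.first; iter != range.second; ++iter ) {
    sum += iter->second;
  }
  return sum;
}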

If you have a reasonably modern compiler (and given the presence of the auto keyword, you do) then in general you're going to have a difficult time beating the standard containers in terms of performance and reliability. The people who wrote them are experts. I would always start with the standard container that most easily expresses what you want to do. Profile your code early and often, and if it's not running fast enough, then look for ways to improve it (e.g., using the unordered_ containers when doing mostly reads).

So, to answer your original question, if you need an associative array of values where those values won't be unique, then using std::multimap definitely makes sense.

Michael Kristofik
  • Yes, using `unordered_` type containers would most likely be the biggest possible improvement, because we hardly ever use the ordering we get from the ordered containers. However, I am not sure whether these containers are available in the STL version we are using yet. – LiKao Dec 01 '11 at 15:31
  • @LiKao, you might have them in the `std::tr1` namespace if your compiler supplies it. – Michael Kristofik Dec 01 '11 at 15:49
  • talking about the memory overhead when using a std::vector deserves +1 – Soumen Nov 12 '13 at 02:59
  • If we're talking about memory overhead, doesn't `std::multimap` keep N copies of each key having N values? I think this could become a nuisance with larger keys (e.g. strings) in `std::multimap`. – Michał Górny Mar 05 '17 at 15:20
  • The overhead of vector is likely no worse than the overhead of having many more items in a map; unless the elements are rather big, at least. – Francesco Dondi May 17 '17 at 12:41

You have forgotten one very important alternative: not all sequences are created equal.

In particular, why a vector and not a deque or a list?

Using list

A std::map<int, std::list<int> > should perform roughly equivalently to a std::multimap<int, int>, since list is node-based as well.
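
A minimal sketch of that variant, reusing the key and value types from the question (the typedef name is my own):

#include <stdint.h>
#include <list>
#include <map>

// Node-based buckets: each value lives in its own list node, much like the
// individual nodes of a std::multimap.
typedef std::map<uint32_t, std::list<uint64_t> > my_list_mumap_t;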

Using deque

A deque is the default container to use when you don't know which one to go for and do not have any special requirements.

Compared to the vector, you trade some read speed (not much) for faster push and pop operations.

Using a deque instead, and some obvious optimizations, I get:

const uint32_t num_partitions = 100000;
const size_t num_elements =     500000;

Filling std::multimap: 360000 ticks
Filling MyMumap:       530000 ticks

Reading std::multimap: 70000 ticks (0)
Reading MyMumap:       30000 ticks (0)

Or in the "bad" case:

const uint32_t num_partitions = 100000;
const size_t num_elements =     200000;

Filling std::multimap: 100000 ticks
Filling MyMumap:       240000 ticks

Reading std::multimap: 30000 ticks (0)
Reading MyMumap:       10000 ticks (0)

Thus reading is unconditionally faster, but filling is way slower.
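
For reference, such a deque-based variant might look like the sketch below. This is my own reconstruction under the assumption that the key optimization is a single map lookup per operation; it is not necessarily the exact code behind the numbers above.

#include <stdint.h>
#include <deque>
#include <map>

typedef std::map<uint32_t, std::deque<uint64_t> > my_deque_mumap_t;

// Filling: a single lookup per insertion, then a cheap push_back into the bucket.
void insert_value( my_deque_mumap_t& m, uint32_t key, uint64_t value ) {
  m[ key ].push_back( value );
}

// Reading: look the bucket up once, keep a reference, then scan it linearly.
uint64_t sum_for_key( my_deque_mumap_t& m, uint32_t key ) {
  std::deque<uint64_t>& bucket = m[ key ];
  uint64_t sum = 0;
  for( auto iter = bucket.begin(); iter != bucket.end(); ++iter ) {
    sum += *iter;
  }
  return sum;
}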

Matthieu M.