33

For example a C++ vector is implemented using a dynamic array where each element uses consecutive memory spaces.

I know that a C++ multimap is a one to many relationship but what is the internal structure?

4 Answers4

33

The C++ standard does not define how the standard containers should be implemented, it only gives certain constraints like the one you say for vectors.

multimaps have certain runtime complexity (O(lg n) for the interesting operations) and other guarantees, and can be implemented as red-black trees. This is how they are implemented in the GNU standard C++ library.

Magnus Hoff
  • 21,529
  • 9
  • 63
  • 82
  • 1
    And also Dinkumware's STL, and therefore almost every commercial C++ compiler's standard library, including VC. – Simon Buchan Jun 06 '11 at 21:51
  • 1
    What about the elements. Suppose Key A has B,C,D,E. Are these elements searchable or are they clumped together as a node in the Tree –  Jun 06 '11 at 23:13
  • 2
    @Chris: I haven't given it much thought, but I think B, C, D and E could either be clumped on one node or represented as individual nodes. I don't know what you mean by "searchable", though. You can iterate over them, but since they have the same key you can not choose one of them directly (by any property of the value). A multimap only offers lookup by key. – Magnus Hoff Jun 07 '11 at 08:06
  • 2
    @Chris: I recently had a look at GCC's implementation, and found that elements with the same key were in separate nodes. I don't think there's any reason why other implementations might not choose to cluster them, though. – Mike Seymour Jun 07 '11 at 10:34
  • 4
    @MikeSeymour: It's tricky to implement an iterator that dereferences to references of key/value pairs unless each node has it's own key. This also explains lower_bound/upper_bound. – Mooing Duck Dec 17 '11 at 01:41
  • `std::multimap` is semantically equivalent to a `std::set >`. The V values likely have a synthetic key comparator similar to `std::unordered_map` – Spencer Jun 21 '22 at 17:14
9

Very often, a red-black tree. See e.g. STL's Red-Black Trees from Dr. Dobb's.

Xeo
  • 129,499
  • 52
  • 291
  • 397
Oliver Charlesworth
  • 267,707
  • 33
  • 569
  • 680
  • +1 if you'd emphasised [and explained] "very often" a bit more. There's clearly a misunderstanding to be rectified in this question! – Lightness Races in Orbit Jun 06 '11 at 22:27
  • @Tomalak: Not sure I follow? I hesitate to say "always" (if that's what you're hinting at), because the standard doesn't mandate it, and I don't know of all STL implementations! – Oliver Charlesworth Jun 06 '11 at 22:37
  • No no, exactly. The OP doesn't seem to realise that this is not a standard-mandated thing. I just feel that your answer could make more of that (like Magnus's did). :) (And, ahem, *cough* stdlib!) – Lightness Races in Orbit Jun 06 '11 at 22:47
3

Addition to the "preferred" answer, because SO won't let me comment:

Given a key with values B, C, D, the behavior of iterators is a lot easier to implement if each element has it's own node. Find() is defined to return the first result in the series, and subsequent iteration takes you across the remaining elements. The de facto difference between a map and a multimap is that multimap is sorted using < over the entire value_type, where the map use < over only the key_type

Correction: the C++11 standard is explicit that new (key, mapping) pairs are inserted at the end of any existing values having the same key. This raises a question I hadn't considered: can a multimap contain two nodes in which both the key and the mapped target are the same. The standard doesn't seem to take a clear position on this, but it's noteworthy that no comparison operator is required on the mapped type. If you write a test program, you will find that a multimap can map X to 1, 2, 1. That is: "1" can appear multiple times as a target and the two instances will not be merged. For some algorithms that's a deficiency.

This article from Dr. Dobbs talks about the underlying rb-tree implementation that is commonly used. The main point to note is that the re-balance operation actually doesn't care about the keys at all, which is why you can build an rb-tree that admits duplicated keys.

3

The multimap just like it's simpler version i.e the std::map, is mostly built using red black trees. C++ standard itself does not specify the implementation. But in most of the cases ( I personally checked SGI STL) red black trees are used. Red Black trees are height balanced trees and hence fetch / read operation on them is always guaranteed to be O(log(n)) time. But if you are wondering on how values of the key are stored. each key->pair is saved as a separate node in the red black tree ( Even though the same key might appear multiple times just like in the case of key 'b' below). Key is used to lookup/ search the rb-tree. Once the key is found, it's value stored in the node is returned.

  std::multimap<char,int> mmp;

  mmp.insert(std::pair<char,int>('a',10));
  mmp.insert(std::pair<char,int>('b',20));
  mmp.insert(std::pair<char,int>('b',10));
  mmp.insert(std::pair<char,int>('b',15));
  mmp.insert(std::pair<char,int>('b',20));
  mmp.insert(std::pair<char,int>('c',25));
  mmp.insert(std::pair<char,int>('a',15));
  mmp.insert(std::pair<char,int>('a',7));


for (std::multimap<char,int>::iterator it=mmp.begin(); it!=mmp.end(); ++it){

    std::cout << (*it).first << " => "  << (*it).second << " . Address of (*it).second = " << &((*it).second) << '\n';
}

Output :

a => 10 . Address of (*it).second = 0x96cca24
a => 15 . Address of (*it).second = 0x96ccae4
a => 7 . Address of (*it).second = 0x96ccb04
b => 20 . Address of (*it).second = 0x96cca44
b => 10 . Address of (*it).second = 0x96cca64
b => 15 . Address of (*it).second = 0x96cca84
b => 20 . Address of (*it).second = 0x96ccaa4
c => 25 . Address of (*it).second = 0x96ccac4

Initially I thought the values of a single key like 'b' might be stored in a std::vector .

template <class K, class V>
struct Node {
  K key;
  std::vector<V> values; 
  struct Node* left;
  struct Node* right;
}

But later I realized that would violate the guaranteed fetch time of O(log(n)). Moreover, printing out the addresses of the values confirms that values with a common key are not contiguous.

They keys are inserted using operator<, so values with the same keys are stored in the order in which they are inserted.

So if we insert first (key = 'b', value = 20) and then (key = 'b', value = 10) The insertion is done using operator< , since the second 'b' is NOT lesser than the first inserted 'b', it is inserted in the 'right branch of a binary tree'.

The compiler I have used is gcc-5.1 ( C++14).

avi007
  • 73
  • 1
  • 6
  • Please read the upvoted answers, which are actually *correct*. They also happen to mention (since a good 4 years now!) that libstdc++ (by GNU) and some others actually use a RB-tree. – Deduplicator Nov 18 '15 at 03:09
  • 2
    Even I told the same ! RB tree is red black tree. But the more subtle question asked by OP was how values of the same key is stored in a rb tree, and the differentiation of implementation of map and multimap even though both use RB-Tree. I tried to explain it. Please read the answer more carefully. Thanks. – avi007 Nov 18 '15 at 16:24
  • The point is that a RB-tree is only one of many possibilities. As an aside, your structures cannot be right, even if an RB-tree was used and there was an extra-field for insertion-order, as key-value-pairs are saved in a `std::pair`. And using a timestamp isn't really a good idea, a sequence-number has less overhead and generally works better anyway. – Deduplicator Nov 18 '15 at 18:36
  • The problem with `vector` is not the fetch time (you can't hold that if all keys are equal anyway) but iterator invalidation rules that are not compatible with that of a std::map. It would have to be a list. But then it would have to be a `list::value_type>` so that dereferencing the iterator yields the correct type. Then your multimap iterator needs either a pointer to the node and to the list entry or each list entry needs a pointer to its node to be able to advance beyond the node. All in all not storing each entry in its own node seems quite a hassle. – Brandlingo Dec 01 '22 at 08:08