4

Why are c++ standard maps very famous for, whereas an array of pairs is also similar those?

When would it be better to use c++ standard maps, and when to use an array of pairs? Or are the applications similar for both?

JeJo
  • 30,635
  • 6
  • 49
  • 88
Harini Sj
  • 173
  • 1
  • 2
  • 11
  • 2
    Try inserting an item near the beginning of a one million element vector, where each element has an expensive copy operation. – PaulMcKenzie Jul 10 '20 at 19:37
  • 1
    ... Or searching for an item near the end ... – Paul Sanders Jul 10 '20 at 19:37
  • 1
    you can reach array elements by indexing using numbers only (you will have to memorize all of your elements positions) while the map elements can be reached in O(lg(n)) using anything – asmmo Jul 10 '20 at 19:40
  • 1
    An array of pairs can have duplicates. A `std::map` cannot. There's one difference for you (there are others). – Jesper Juhl Jul 10 '20 at 19:41
  • 3
    [vector or map, which one to use?](https://stackoverflow.com/questions/454762/vector-or-map-which-one-to-use) – R Sahu Jul 10 '20 at 19:44
  • 1
    How many items? For <10, vector might be faster. For 1000, map for sure. :) – Macke Jul 10 '20 at 20:27

4 Answers4

7

Why are c++ standard maps very famous for, whereas an array of pairs is also similar to those?

The following table should give you a small idea, where to use c++ standard maps (only choose some) rather an array of pairs.

table


When would it be better to use c++ standard maps, and when to use an array of pairs?

You can always benchmark the efficiency of your data-structure, to see which one is apt for which situations.

Let's look into an example.

For instance, the followings are the bench-mark done in https://quick-bench.com/, for the insertion of 10000 elements in the beginning of each

  1. array of pairs (i.e. std::vector<std::pair<int, std::string>>) vs
  2. standard map (i.e. std::map<int, std::string>) vs
  3. standard hash-map (i.e. std::unordered_map<int, std::string>).

It turned out that using the C++ standard maps is faster/efficient than array of pairs for this kind of operations.

(See online bench-mark)

banchmark1

However, for the case of fewer entries (let's consider 10 elements), the same test shows that the usage of array of pairs (i.e. std::vector<std::pair<int, std::string>>) would be faster/efficient than standard hash-map (i.e. std::unordered_map<int, std::string>), and alomost equalent to standard map (i.e. std::map<int, std::string>).

(See online bench-mark)

banchmark1


In short, the thumb rule is, always bench-mark and test for the operations what you will be having, before choosing the right data structure for best results. For further references have a look std::vector, std::map, std::unordered_map, etc and their individual operations.

JeJo
  • 30,635
  • 6
  • 49
  • 88
  • Note that Big-O notation doesn't tell the whole story. Cache locality can completely invert the time taken for each operation, for instance. Big-O is a good place to start but is not a good place to stop. – metal Jul 10 '20 at 20:02
  • I'd also note that `std::unordered_map` (if not a flat map) is often the preferred starting point over `std::map`, and these have very different Big-O values. – metal Jul 10 '20 at 20:30
3

You use a Map when you are trying to do just that, map one item to another. An array of pairs is just storing a bunch of pairs and it provides no map functionality since you are still constrained by the index of the array.

A map is also handy for counting frequencies. Since a map (the standard map at least, there are variations of maps that allow duplicates) will not duplicate key values. You can use that to find the frequency of an element etc.

The main reason to use a map is when you want to find a value based on a key that you get to define. Its not so much about just storing items together as in a pair.

Grant Singleton
  • 1,611
  • 6
  • 17
2

If you have a problem that lends itself to an associative mapping, you can use a tool like std::unordered_map or std::map (or Abseil's Swiss Tables or Boost's flat_map or similar) to make your job easier. Yes, you can always manage pairs in a array or std::vector by yourself, and sometimes that is the right thing to do.

Reasons to use the standard containers include readability and maintainability of the code (most C++ programmers are familiar with the standard containers and algorithms to some degree), correctness (the standard containers have a long history and are far less likely to be buggy than home-spun data structure code), speed of development (reuse a solid design and focus on using the tools available rather than re-implementing the tool for marginal benefits), and possibly speed (more below).

Reasons you might want to roll your own or choose a non-standard container include special APIs on a particular container that make it a better fit in your use case, interfacing with code you don't control (e.g., a third-party library expecting pairs in contiguous memory), and performance requirements.

To wit: Different variants of a map or array will have different performance characteristics. There is the Big-O rating for each operation (lookup, insertion, deletion, etc.), and that is a good place to start. But that doesn't tell the whole story. For one, you may not use all these operations with the same frequency or urgency (e.g., you need lookups to be fast and don't care as much about deletions). For another, cache locality can end up dwarfing the theoretical Big-O value for an operation, but it all depends on your data characteristics, target machine's cache size, etc.

An array will be in contiguous memory and will tend (depending on size and usage patterns) to have better cache locality and overall performance than a std::map which will tend to be spread out through memory. This is the same idea as a flat map, which has the interface of a map but the storage of an array underneath.

A hash-based map will have constant (O(1)) lookup, but it can also suffer from a penalty for computing the hash (depending on the key type, which may not even be hashable; or hashing could be literally free for some key types), won't store items in a strictly ordered fashion for iteration (not a problem in many use cases), could devolve to O(n) lookup time if the hasher has a lot of collisions, and depending on the implementation, may or may not be in contiguous memory that provides cache locality.

Of course you should guide your decisions with actual performance measurements rather than intuitions or guesses, which are notoriously inaccurate.

If you need a reasonable default, I'd start with std::unordered_map, assuming your key is cheaply and easily hashable and you don't care about iteration order. Choose a different data structure when that map doesn't provide all the features you need or when your performance analysis says you need to do so.

metal
  • 6,202
  • 1
  • 34
  • 49
2

Several points to note/consider:

  1. While Jejo's answer is true in terms of asymptotic complexity, in many practical scenarios - a pair of vectors (or a vector of pairs) may actually be faster than using a map - even for insertion or erasure. See this presentation-fraction by Herb Sutter extols the virtues, in practice, of using arrays (std::vector's) over other structures in various scenarios. It's a fun listen if you haven't had that drilled into you before...

  2. There are a lot more than two options for you to choose from - depending on your actual needs:

    • Do you need the map elements to be iterable in increasing order of the keys? i.e. do you need your data to be sorted? If not, then even on the map side you'll want std::unordered_map, not std::map.
    • If you go with arrays, you can either use an array of key-value pairs, or two arrays like you indicated. That too translates into different performance in practice (even if the asymptotic complexities will be the same).
    • You could use arrays with "tombstones" - marking "empty" or "missing" values - to indicate deleted elements, or to perform more clever insertion etc.
  3. Have a look at this question:

    vector or map, which one to use?

    which compares using a map with a vector of pairs.

  4. Standard-library maps are very slow as implementations! See this question:

    Is gcc std::unordered_map implementation slow? If so - why?

einpoklum
  • 118,144
  • 57
  • 340
  • 684