1

I have a collection of objects, each of which has an identifier (string). Now I wonder whether I should store them as a vector or a map, so either std::vector<cObject> or std::map<std::string,cObject>. Each name can only be used once, so I need to check the vector/map for occurrences of a given name. Using a map, I would use find() which scales logarithmically, while for a vector, I would iterate and test oObject.name == newname until I find the name or arrive at the end, which scales linearly. A downside of the map is that the name is stored twice, once inside the object and once as the key.

While for large vectors/maps, a map would always win, I wonder at which point this is the case, since a map seems to be overkill if I only have up to 10 objects to store.

This leads to my question: At which point does a map become advantageous (regarding performance) compared to a plain vector? Should I already consider using maps if the expected number of objects is about 10, 100, 1000 or even larger? I admit that this question is rather vague, but I hope to get some advice anyway, to get a feeling when to use which container.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
Fabian
  • 4,001
  • 4
  • 28
  • 59
  • 9
    In general, it is better to actually profile your code instead of guessing numbers. – lisyarus Jun 22 '15 at 12:43
  • why don't you write a small benchmark test and compare the runtime of each version? – m.s. Jun 22 '15 at 12:43
  • 1
    There's nothing stopping you from sorting your vector so you can use a better searching algorithm. If the lookup speed is the most important thing then an unordered set/map might be the best choice. I don't feel that size is a valid factor for which container to use, you should just use the right one for the job. – Retired Ninja Jun 22 '15 at 12:45
  • for searching map are good but it take extra memory. Vector will not store extra string but it may take time to search. For worst case it will check whole vector. – DigviJay Patil Jun 22 '15 at 12:45
  • 1
    It is very processor (i.e. cache!) & compiler & optimization & `libstdc++` specific, but I was surprised to see a video by Herb Sutter mentioning a threshold of several hundreds (below which a sequential search in a vector is faster than using a tree) – Basile Starynkevitch Jun 22 '15 at 12:48
  • 1
    If your key and value are the same, then use a [`set`](http://en.cppreference.com/w/cpp/container/set) instead of a `map`. – Praetorian Jun 22 '15 at 12:52
  • 3
    "A downside of the map is that the name is stored twice, once inside the object and once as the key." Well, then you can use a **set** of objects and use the string inside them in the implementation of their equality operator. – Daniel Daranas Jun 22 '15 at 12:53
  • @Praetorian It seems that the key and value are not the same, but the key is _contained_ in the value. It is one of its fields. The recommendation is still valid, of course. – Daniel Daranas Jun 22 '15 at 12:55
  • I think any answer you get will be highly dependent upon the size of the value objects apart from everything else, because that will determine how often your vector search will miss the cache in longer sequences. And then you could greatly improve on the vector implementation if you kept it sorted. The permutations for performance are endless. Test it. – Robinson Jun 22 '15 at 13:08
  • @DanielDaranas Unfortunately, that makes the *entire objects* immutable, not just the key part. I actually consider this a grave error in C++ library design, but it's a fact. – Angew is no longer proud of SO Jun 22 '15 at 13:30
  • @Angew You could use the mutable keyword - see this [question](http://stackoverflow.com/questions/7340434/how-to-update-an-existing-element-of-stdset). – Daniel Daranas Jun 22 '15 at 13:59
  • 2
    @DanielDaranas The problem witht that is that since C++11, the standard library has the right to assume that `const` operations are thread-safe. This means all access to `mutable` members must be internally synchronised. – Angew is no longer proud of SO Jun 22 '15 at 14:17

3 Answers3

3

It is hard to predict the size at which map is advantageous then vector. But there are few points which you can consider:

  1. Maps are basically implemented as binary tree whereas vectors are implemented as arrays. So it is compartively easier to iterate through array for lets says when there are 10 to 12 elements as it would be a liner search.
  2. The point that maps are faster than the vectors depends on the implementation on the processor and also what data you are trying to store in it. My best guess is that maps would be faster will be for 5-20 elements.(But it is just a guess I would suggest you to create a benchmark for yourself)
  3. "By default, use vector when you need a container" - Bjarne Stroustrup.

You may also find this C++ Containers Cheat Sheet useful

Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
1

That depends on how the vector and map are implemented, and what operations your code does on them (e.g. adding elements to the middle or end, removing elements, adding and removing repeatedly, etc). You would need to test and profile your code. Even them, the answer would depend on the host system (e.g. caching, pipelining, etc).

Incidentally, if you keep a vector sorted, finding an element also scales logarithmically (e.g. using binary search). Your basis of comparison (vector linear, map logarithmic) is therefore flawed.

Peter
  • 35,646
  • 4
  • 32
  • 74
1

Using a map, I would use find() which scales logarithmically, while for a vector, I would iterate and test oObject.name == newname until I find the name or arrive at the end, which scales linearly.

Not quite true, you can keep your objects in the vector sorted and use std::binary_searchwith logarythmic complexity.

A downside of the map is that the name is stored twice, once inside the object and once as the key.

You can use std::set with custom comparator so you will not have to store key separately.

In general Knuth said "premature optimization is root of all evil". Make your program readable, use what's easier (std::map or std::unordered_map I guess) and later if you have performance issue with this container optimize it. Encapsulating this container in some class could be helpful, so later you can replace implementation transparently for the rest of your code.

Slava
  • 43,454
  • 1
  • 47
  • 90