I have a very large list of items (~2 millions) that I want to optimize for access speed. I iterate trough the items using an iterator (++it).
Right now the code is implemented using std:map<std::wstring, STRUCT>
.
I wonder if it's worth to change std::map with a std::deque<std::pair<std::wstring, STRUCT>>
. I think I would have advantage of using pointer arithmetic and minimize cache miss. It worths ?
I know that profiling is the answer but I need an opinion before implementing this ...

- 5,503
- 3
- 36
- 56
-
4How do you access the items? If you are just iterating through them, then a map seems the wrong structure, so presumably you are doing some lookup? Could you add more detail to the question? – Jeff Foster Apr 20 '11 at 08:42
-
I am searching for some values in the STRUCT. I'm iterating a lot and I do-it using iterators (++it), but I can probably switch to std::for_each although i don;t think is a big advantage – cprogrammer Apr 20 '11 at 09:08
-
What Jeff means is: why did you choose a map in the first place? What **else** do you do besides iterating? – Karl Knechtel Apr 20 '11 at 09:59
4 Answers
If you know in advance the size, then std::Vector is clearly the way to go it your objects aren't too big.
std::vector<Object> list;
list.reserve(2000000);
And then fill it as usual.
This is the fastest and least memory consuming approach. However, you need to be able to allocate enought continous memory. But excepted if your object are 1kb big, it shouldn't be a problem.

- 9,601
- 3
- 34
- 50
-
1
-
i don't know the size in advance. That's why i was thinking at deque. The size of the STRUCT is a little more more than 1k – cprogrammer Apr 20 '11 at 09:04
-
1@viktor you need 2Gb of continous. Can be a problem indeed. However, 1kb data structure is rather rare (probably means you're abusing of C-arrays instead of vectors); @cprogrammer the deque is the way to go as everyone stated – Tristram Gräbener Apr 20 '11 at 09:13
-
in practice, even for a few bytes it can become a problem. Trying to allocate several MB of data at once may not always succeed. That said, sometimes I'd love to be able to customize the bucket size of the deque to fill a 4KB block. – Matthieu M. Apr 20 '11 at 09:39
-
@Mattieu: "Trying to allocate several MB of data at once may not always succeed", yes, lets remember the 90's... – Viktor Sehr Apr 21 '11 at 16:05
With deque, you would lose ( or would have to re-implement ) the advantage of Key-Value pairs. If it's not essential for your data, I would consider using deque.

- 3,469
- 26
- 38
-
1Why? If the objects are sorted by their keys, binary search can be used for lookup, so OP won't be losing anything :) – Apr 20 '11 at 08:51
-
2@Grigory: on the contrary, `binary_search`, `lower_bound`, `upper_bound` etc... are normally faster than their `std::map` cousins because of the contiguity of memory chunks (depends on `deque` implementation). So it should not be slower, and could be faster. – Matthieu M. Apr 20 '11 at 09:38
-
Generally, if you're only doing search in this set (no insertions/deletions), you're probably better off using a sorted sequential cointainer, like deque or vector. You can then use simple binary search to find the needed elements. The advantage of using a sequential container is that it is better in terms of memory usage, has very simple implementation, and provides better locality of reference. I'd write one version of the code using vector, and another version of the code using deque, then compare them in terms of preformance to decide which one to use in the final version.
However, if your structure needs to be updated (new elements need to be inserted or old elements have to be deleted frequently), map is better choice. Or maybe, you just have to drop STL containers altogether and just use an in-memory database (see SQLite), but it highly depends on what problem you're solving.
-
i don't need to search an element, and once filled with data only read-operations are performed. Unfortunately I can't use advantage of sorting data because sorting by key is not important and by STRUCT is not possible (I am interested in more than one values in the struct) – cprogrammer Apr 20 '11 at 09:16
-
Why isn't it possible to sort by the fields of struct? Say you're interested in fields A, B and C of your structure. Write an operator<() for your struct that would compare two structs first by A, then by B, then by C. – Apr 20 '11 at 09:21
The fastest container to iterate through is usually a vector
, so if you want to optimize for iteration at the expense of everything else, use that.
Overall app performance of course will depend how many times you iterate, and how you construct your data in the first place. For a simple test, once your map has been populated you can construct a vector from it as follows:
vector<pair<K,V> > myvec(mymap.begin(), mymap.end());
Where K and V are the key and value types of the map. Then just use the vector iterators in place of the map iterators and compare performance.
Of course, if you want to modify the map in future, then normally it would not be appropriate to optimize for iteration at the expense of everything else.

- 273,490
- 39
- 460
- 699