As part of my test code I need to build a complex structure that uses, among other things, two std::map instances; both of them have around 1 million elements. In optimized builds it's OK; however, in debug (unoptimized) builds it takes almost a minute. I use the same data to build the map every time, so basically, if I could save a chunk of RAM and restore it in 20 ms, I'd effectively get the same map in my app without waiting a minute on every run. What can I do to speed it up? I could try to use a custom allocator and save/restore its allocated storage, or is there perhaps a way to construct std::map from data that's already sorted, so that construction would be linear in time?
-
I would expect sorted data to require more re-balancing of the tree in the map and therefore be slower. Have you considered using a `vector` with `lower_bound` instead? – nwp Jun 03 '16 at 21:01
-
depending on your compiler, you can enable some level of optimization while preserving debugging information. what compiler? – kmdreko Jun 03 '16 at 21:01
-
More information about the context would be helpful. Why do you need to get rid of the map in the first place? You might be able to subvert that. – Jun 03 '16 at 21:18
-
@nwp I was thinking about rolling out my own "map" using `lower_bound` and sorted vector. Could be acceptable approach in my case – Pavel P Jun 03 '16 at 21:59
-
@vu1p3n0x I use Visual Studio 2015 (the Microsoft compiler). – Pavel P Jun 03 '16 at 21:59

-
Move the initialization function in another source file. Build that source with optimization enabled (but with the same definitions). Not sure if this is possible on MSVC. – sbabbi Jun 03 '16 at 22:11
-
Restoring from sorted input can indeed be O(N): http://en.cppreference.com/w/cpp/container/map/map (see (2)). – Alan Stokes Jun 03 '16 at 22:55
-
Even in a debug build you can turn on optimisation - if necessary only for the function building the map. `#pragma optimize`. – Alan Stokes Jun 03 '16 at 22:57
-
See http://stackoverflow.com/questions/8121428/c-serializing-a-stdmap-to-a-file and http://stackoverflow.com/questions/16075953/c-serialize-deserialize-stdmapint-int-from-to-file – Nick Matteo Jun 03 '16 at 23:35
-
@WilliamKappler I have a complex structure where I need quick lookups, sorted unique order, etc.; in other words I want to use a map, but constructing it takes a long time – Pavel P Jun 04 '16 at 09:25
-
@sbabbi that's totally possible with MSVC and I've done that with other parts of code already. Perhaps, that's what I'll do here as well – Pavel P Jun 04 '16 at 09:26
-
@Kundor I'm pretty sure that loading from archive with boost::serialize will do the same step that makes my code slow: it will build the whole map – Pavel P Jun 04 '16 at 10:07
-
@sbabbi I tried it; it obviously improved speed, but nowhere near as much as I get in a regular release build. In a debug build it takes 95 seconds to load my stuff; in a release build it takes less than a second. When I moved the init code to a standalone file and compiled it with -DNDEBUG and full optimizations, the time went down to 23 seconds. I think it didn't go down to 1 second because multiple inline functions from different compilation units get merged, and I end up with some unoptimized ones anyway. – Pavel P Jun 04 '16 at 10:11
-
@Pavel: I looked at the code and it does hinted insertion or emplacement, so loading should be O(n) instead of O(n log n). – Nick Matteo Jun 04 '16 at 12:30
-
@Kundor yes, perhaps it uses sorted input sequence construction of std::map. Which is O(N) instead of O(N*log N). In my case it makes it 20 times faster. – Pavel P Jun 04 '16 at 12:37
-
I ended up using a custom allocator and creating a RAM snapshot that I can restore. I added my own answer for that. – Pavel P Jun 05 '16 at 16:13
2 Answers
The technical difficulty is that for std::map in debug mode, the Visual Studio compiler inserts correctness checks, and in some revisions has inserted extra elements into the structure to make checking easier.
There are two possible solutions:
Abstraction
If the information provided by the `std::map` is replaceable by an interface class, then the internals of the `std::map` can be hidden and moved into a separate compilation unit. That unit can be compiled outside of the debug environment, restoring performance.
Alternative data structure
For information which is broadly static (e.g. a static piece of data you need to retrieve quickly), `std::map` is not the fastest structure, and a sorted `std::vector` of `std::pair<key,value>` would be more performant in operation.
The advantage of the `std::vector` is that there are guarantees about its layout. If the data is plain-old-data, it can be loaded with a `std::vector::resize` and a `memcpy`. Otherwise, filling the elements into the `std::vector` would still avoid the significant time Visual Studio spends checking the memory and structure of the `std::map` for issues.

-
I guess using a vector with a lot of entries might cause bad_alloc. Also there is `unordered_map`. If it is completely static, an array can be put in ROM. – Alexander Oh Jun 04 '16 at 07:09
-
Regarding 1) – it doesn't seem to do the trick, see my comment at the top. It improves the time, but not close to what I get in a release build. 2) – vector+lower_bound would be faster, but I'm afraid that for reasons similar to 1) I'd get slow load times anyway. Perhaps I'd need to move the init code to a DLL; then 1) could be as fast as a release build – Pavel P Jun 04 '16 at 10:15
-
I tried to move the code to a DLL – it won't work; the internals of std containers are binary-incompatible between release and debug builds. – Pavel P Jun 04 '16 at 10:39
-
@mksteve ok, I understand what you mean. In my case I actually need to access these "internals" that come with `std::map`: I need to iterate elements and do other stuff that would require me to provide alternative wrappers/interfaces for all that as well. – Pavel P Jun 04 '16 at 12:33
Eventually, after trying different approaches, I ended up using a custom allocator.
`std::map` was one of many containers that my struct uses to hold the data. The total size of the allocated RAM was actually around 400MB; the struct contained lists, maps, and vectors of different data, where many of the members of these containers were pointers into other containers. So I took a radical approach and made it extremely fast to 'resurrect' my entire structure with all the maps and internal pointers. While my original post was about making it fast in debug builds, after modifying the code and adding extra complexity it quickly became equally applicable to release builds as well: construction time was around 10 seconds in a release build.
So, at first I modified all structure members to use my custom allocator; this way I saw how much RAM was actually allocated:
total allocated: 1970339320 bytes, total freed: 1437565512 bytes
This way I could estimate that I'd need around 600MB total. Then, in my custom allocator I added a static global method `my_aloc::start_recording()`; this method would allocate that 600MB chunk of RAM, and afterwards my custom allocator would simply return addresses from that 600MB block. After `start_recording` was called, I'd make a copy of my entire structure (it was actually a vector of structures). When making a copy there is no over-allocation: each structure member allocates only enough RAM for its storage. By copying the structs it actually allocated only around 400MB instead of 600MB.

As I said, internally my construct has lots of pointers to internal members, so how do I reuse this 400MB recorded "snapshot" from my custom allocator? I could write code to "patch" the pointers, but perhaps it wouldn't even work: I had lots of maps that use pointers as keys, with a custom compare struct that dereferences the pointers to compare the actual pointed-to values. Also, some maps contained iterators into lists; it would be quite messy to deal with all that. Moreover, my overall structure isn't set in stone; it's work in progress, and if something changed, the patching code would also need to change.

So, the answer was quite obvious: I simply needed to load that entire 400MB snapshot at the same base address. On Windows I use VirtualAlloc; on Linux something like mmap would presumably be needed, or alternatively the Boost shared-memory library could be used to make it more portable. In the end, the overall load time went down to 150ms, while in release it takes more than 10 seconds, and in debug builds it's probably somewhere in minutes by now.
