I would like to initialize a `std::map`. For now I am using `::insert`, but I feel I am wasting some computational time since I already know the size I want to allocate. Is there a way to allocate a fixed-size map and then fill it?

- Why not use the iterator or initializer-list version of `insert`? Do you know the number of elements ahead of time but not their values? – Mankarse Oct 24 '12 at 12:46
- Yes, I know the number but not the key-value pairs. – vanna Oct 24 '12 at 12:49
- Is `map::insert` really the bottleneck in your program? – Andriy Oct 24 '12 at 13:01
- Not possible without knowing the keys. – BЈовић Oct 24 '12 at 13:02
- A great question to show the real nature of `std::map`. It's definitely more than "a kind of vector plus some magic". – Wolf May 04 '16 at 13:21
5 Answers
No, the members of the map are internally stored in a tree structure. There is no way to build the tree until you know the keys and values that are to be stored.

- @Mankarse: It wouldn't make much of a difference, since each node must be allocated (and deallocated) separately. There's really very little to be gained from advance knowledge of the contents of the map (unless it were completely constexpr). – Kerrek SB Oct 24 '12 at 12:50
- @Mankarse: Doing that wouldn't save any time though. (Unlike with `vector.reserve()`, which does save time by avoiding much resizing and copying.) – j_random_hacker Oct 24 '12 at 12:50
- You could create an exactly sized pool allocator, which would make the allocation of individual nodes significantly faster. For this to be really effective, however, you'd also have to know that the number of nodes will never change. If you know all this, it shouldn't be hard to write a custom allocator which would do the job. – James Kanze Oct 24 '12 at 12:55
- This answer is borderline wrong: if you have, e.g., 100,000+ elements and you know that up front, allocating everything in one contiguous array can dramatically speed up processing. – darune Sep 23 '19 at 08:54
The short answer is: yes, this is possible, but it's not trivial. You need to define a custom allocator for your map. The basic idea is that your custom allocator will set aside a single block of memory for the map. As the map requires new nodes, the allocator will simply assign them addresses within the pre-allocated block. Something like this:
```cpp
std::map<KeyType, ValueType, std::less<KeyType>, MyAllocator> myMap;
myMap.get_allocator().reserve( nodeSize * numberOfNodes );
```
There are a number of issues you'll have to deal with, however.
First, you don't really know the size of each map node or how many allocations the map will perform. These are internal implementation details. You can experiment to find out, but you can't assume that the results will hold across different compilers (or even future versions of the same compiler). Therefore, you shouldn't worry about allocating a "fixed" size map. Rather, your goal should be to reduce the number of allocations required to a handful.
Second, this strategy becomes quite a bit more complex if you want to support deletion.
Third, don't forget memory alignment issues. The pointers your allocator returns must be properly aligned for the various types of objects the memory will store.
All that being said, before you try this, make sure it's necessary. Memory allocation can be very expensive, but you still shouldn't assume that it's a problem for your program. Measure to find out. You should also consider alternative strategies that more naturally allow pre-allocation. For example, a sorted list or a std::unordered_map.
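To make the idea concrete, here is a minimal sketch of such an allocator, assuming a simple bump-pointer arena that hands out addresses from one pre-allocated block and never reclaims individual nodes. The names (`Arena`, `ArenaAllocator`) and the buffer size are illustrative, not part of any standard API:

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <new>

// Bump-pointer arena: one up-front block, no per-object deallocation.
struct Arena {
    explicit Arena(std::size_t bytes)
        : buf_(new std::byte[bytes]), size_(bytes), used_(0) {}
    void* allocate(std::size_t n, std::size_t align) {
        std::size_t off = (used_ + align - 1) & ~(align - 1); // align the offset
        if (off + n > size_) throw std::bad_alloc();
        used_ = off + n;
        return buf_.get() + off;
    }
    std::unique_ptr<std::byte[]> buf_;
    std::size_t size_, used_;
};

template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;
    explicit ArenaAllocator(Arena* a) : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) : arena(other.arena) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) {} // no-op: memory is freed with the arena
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}

int main() {
    Arena arena(1 << 16); // a guess at the total size; measure for your node type
    using Alloc = ArenaAllocator<std::pair<const int, int>>;
    std::map<int, int, std::less<int>, Alloc> m{Alloc(&arena)};
    for (int i = 0; i < 100; ++i) m[i] = i * i;
}
```

Note that the arena releases everything at once when it is destroyed, which sidesteps the deletion problem mentioned above at the cost of never reclaiming memory from erased elements.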

- I really agree with the last paragraph. Probably a sorted vector will be the most effective solution in this case. – Gorpik Oct 24 '12 at 14:17
- Is there some hope for a standardised method for getting the node size, or will we remain [dependent on estimates](http://stackoverflow.com/q/720507/2932052)? – Wolf May 04 '16 at 13:52
- I can't see that happening. If you look closely at the definition of `std::map`, you'll see that it doesn't even promise to be implemented as a tree. – Peter Ruderman May 05 '16 at 15:54
- The function `get_allocator` returns `allocator_type` (by value), which means that `myMap.get_allocator().reserve( nodeSize * numberOfNodes );` calls `reserve` on a temporary `MyAllocator` instance. Furthermore, knowing the tree node size of a map implementation is irrelevant because the allocator is used to allocate objects of type `std::pair<const Key, T>`, which means that you can perfectly well know the size in advance. – Pixelchemist Mar 27 '17 at 16:58
- This will depend on the map implementation, but I would expect most to combine both the key/value pair and the related node info in a single object. In any case, if you want to do a single allocation for your map, then you must know _both_ the size of the data you intend to store _and_ the map overhead. It is true that `get_allocator()` returns a copy, which can be a nasty gotcha. Whether `myMap.get_allocator().reserve(...)` does what you'd hope depends on the allocator implementation. This whole scenario actually gets much easier with the new polymorphic allocators in C++17. – Peter Ruderman Mar 28 '17 at 18:28
- @PeterRuderman You should also mention that, depending on the system (I think most modern ones), there may already be optimizations in place that do this, at least to some extent, even if the default map allocator doesn't. – darune Sep 23 '19 at 07:56
Not sure if this answers your question, but Boost.Container has a `flat_map` in which you can reserve space. Basically you can see this as a sorted vector of (key, value) pairs. Tip: if you also know that your input is sorted, you can insert with a hint for maximal performance.
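A short sketch of this approach; the container size and data below are illustrative:

```cpp
#include <boost/container/flat_map.hpp>

int main() {
    boost::container::flat_map<int, int> m;
    m.reserve(1000); // one up-front allocation, analogous to std::vector::reserve

    // With input arriving in sorted key order, hinting at end() avoids
    // searching through the existing elements on every insertion.
    for (int i = 0; i < 1000; ++i)
        m.emplace_hint(m.end(), i, i * i);
}
```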

You are talking about block allocators. But they are hard to implement; measure before you think about such hard things. Anyway, Boost has some articles about implementing a block allocator. Or you can use an already implemented pre-allocated map such as Stree.

There are several good answers to this question already, but they miss some primary points.
Initialize the map directly
The map knows the size up front if initialized directly with iterators:
```cpp
auto mymap = std::map(it_begin, it_end); // C++17 class template argument deduction
```
This is the best way to dodge the issue. If you are agnostic about the implementation, the map can then know the size up front from the iterators, and you have moved the issue to the `std::` implementation to worry about.
Alternatively, use `insert` with iterators instead, that is:
```cpp
mymap.insert(it_begin, it_end); // range insert from the iterator pair
```
See: https://en.cppreference.com/w/cpp/container/map/insert
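For completeness, a self-contained sketch of both variants; the source container and its data are purely illustrative:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    // Illustrative pre-collected input.
    std::vector<std::pair<int, std::string>> src{{1, "one"}, {2, "two"}, {3, "three"}};

    // Construct directly from the iterator pair (C++17 deduction).
    auto mymap = std::map(src.begin(), src.end());

    // Or range-insert into an existing map.
    std::map<int, std::string> other;
    other.insert(src.begin(), src.end());
}
```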
Beware of Premature optimization
> but I feel I am wasting some computational time.

This sounds a lot like you are optimizing prematurely (meaning you do not know where the bottleneck is; you are guessing, or seeing an issue that isn't really one). Instead, measure first and then optimize; repeat if necessary.
Memory allocation could already be optimized to a large degree
Rolling your own block allocator for the map could be close to fruitless. On modern systems (here I include the OS/hardware and the C++ language level), memory allocation is already very well optimized for the general case, and you could be looking at little or no improvement from a hand-rolled block allocator. Even if you take a lot of care and get the map into one contiguous array (an improvement in itself), you could still face the problem that, in the end, the elements may be placed randomly within the array (e.g. in insertion order) and so be less cache-friendly anyway (this depends very much on your actual use case, though; I'm assuming a super large data set).
Use another container or third party map
If you are still facing this issue, the best approach is probably to use another container (e.g. a sorted `std::vector`, with `std::lower_bound` for lookups) or a third-party map optimized for how you are using it. A good example is `flat_map` from boost - see this answer.
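A minimal sketch of the sorted-vector alternative, assuming all data is available up front; the names and data are illustrative:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<int, std::string>> kv;
    kv.reserve(3); // the size known up front, as in the question

    kv.push_back({3, "three"});
    kv.push_back({1, "one"});
    kv.push_back({2, "two"});
    std::sort(kv.begin(), kv.end()); // pair's operator< orders by key first

    // Lookup with std::lower_bound, comparing keys only.
    int key = 2;
    auto it = std::lower_bound(kv.begin(), kv.end(), key,
                               [](const auto& p, int k) { return p.first < k; });
    bool found = (it != kv.end() && it->first == key);
    (void)found;
}
```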
Conclusion
- Let the std::map worry about the issue.
- When performance is the main issue: use a data structure (perhaps 3rd party) that best suits how your data is being used (random inserts or bulk inserts / mostly iteration or mostly lookups / etc.). You then need to profile and gather performance metrics to compare.