
Given a struct MyData, of which many instances exist (I'd say several millions at most), for each instance I need to store a member which may contain values for up to 8 keys. The key will always be an int ranged 0-7, and the values will always be a 3D point of floats (let's call it Point3).

At maximum, it would contain:

Key | Value  
-------------
0   | [x,y,z]  
1   | [x,y,z]  
2   | [x,y,z]
3   | [x,y,z]
4   | [x,y,z]
5   | [x,y,z]
6   | [x,y,z]
7   | [x,y,z]

However, in 99.9% of cases it will contain 0 or 1 key-value pairs, e.g.:

Key | Value
-------------
1   | [x,y,z]  

How can I efficiently determine the memory overhead, if any, of storing an empty or single-valued std::map<int, Point3>, compared to always storing an array of 8 Point3 (4 bytes per float * 3 values * 8 slots = 96 bytes) plus a single byte whose bits flag which slots contain meaningful values?

More generally: what is the memory overhead of an empty or nearly empty std::map?

Rotem
  • Use a vector and get the best of both worlds. Anyway, linear search might be faster at this scale than BST search. As for the empty map size: a couple of pointers; almost empty: implementation dependent. – Slava May 27 '15 at 09:07
  • @Slava Thanks. I understand you mean a `std::vector` of `struct { int i; Point3 p; }`, yes? – Rotem May 27 '15 at 09:09
  • You probably want to dive into your implementation of STL and see how std::map is implemented. This will potentially differ from one implementation to another. You can also experiment yourself and see what happens. – banach-space May 27 '15 at 09:10
  • Related: http://stackoverflow.com/questions/720507/how-can-i-estimate-memory-usage-of-stdmap – davidhigh May 27 '15 at 09:11
  • see http://www.gotw.ca/publications/mill14.htm for memory overhead of std containers. I think an array or vector is more efficient in memory and speed. – hansmaad May 27 '15 at 09:11
  • AFAIK, the memory overhead of a container does not depend on its current size; rather it would be the next power of 2 of the previous max size. – tejas May 27 '15 at 09:12
  • you can use array or vector – Brahim May 27 '15 at 09:13
  • @Rotem Yep. Or maybe I would put Point3 first so as not to think about alignment. – Slava May 27 '15 at 09:13
  • @banach-space: The usual implementation is a red-black tree. That still leaves quite some implementation variation, but the big-O performance doesn't vary. – MSalters May 27 '15 at 09:14
  • @Slava While unrelated to the question, could you please expand on your comment regarding alignment? My understanding is that in any case, it would be padded the same. Is this wrong? – Rotem May 27 '15 at 10:23
  • @Rotem the key was "to not think". I actually do not want to think how Point3D will be aligned, and putting the big stuff first is always a safe option (unless you bother squeezing small things in between the big) – Slava May 27 '15 at 16:06
  • @Slava got it, thanks. – Rotem May 27 '15 at 18:43

2 Answers


The memory overhead of a map isn't that bad. It's typically a few words per node. Using a map to start with would definitely be OK under the "no premature optimization" rule.

That said, when you do optimize, the map will be high on the list of data structures to replace. But at that point, you can profile all the different operations you actually use. How often do the keys and/or values change? That's crucial information to know before optimizing.

[edit] If I were to suggest a structure, it would be a vector of std::pair<int, Point3>. The reason is that this probably gives alignment-friendly 16-byte objects. I wouldn't bother with sorting the keys, because that's only useful for the 0.1% of nodes that do have multiple key/value pairs.

MSalters
  • @Rotem: That's an argument _against_ std::map. Map has a cheap insert (only need to add a node and shuffle a few node pointers). Inserting into a sorted vector generally means shifting half the elements (so O(log N) versus O(N)). – MSalters May 27 '15 at 09:29

Have a look at this blog post. There's a very thorough analysis of memory usage of different STL containers (including std::map) and different platforms/compilers are taken into account.

In the 64 bit STL that comes with Visual Studio 2010 (Visual C++ 10):


map uses 32 bytes for the map object itself, then each map node is 26 bytes bigger than the size of the contained object (likely 32 bytes after taking alignment and wastage into account). The number of map nodes required for a map is 1 more than the size of the map.

banach-space