
I am expecting to handle a huge number of data records, whereby around 20 uint8_t keys will each have millions of <int, struct> pairs associated with them (ordered by int). These pairs are rather lightweight at ~10 bytes and need to be allocated dynamically.

Initially I was using a std::map<uint8_t, std::vector<std::pair<int, struct>>>, but after studying the overhead associated with vectors, namely a memory footprint of

3 machine words in total + sizeof(element) * capacity()

as seen here, and the fact that capacity() "typically has room for up to twice the actual number of elements", this seems detrimental.

Instead of a vector I could use a std::map; however, the overhead of ~32 bytes per node also becomes very expensive for such lightweight pairs.

I am unfamiliar with Boost and other C++ libraries, so I was wondering whether anyone could advise on a solution that avoids manual dynamic memory allocation.


Edit: To clarify, following a few questions in the comments: the struct stored will contain 3 shorts (to start with) and no further data structures. I anticipate the length of the vector to be no greater than 1.5*10^8, and I understand this comes to ~1.4 GiB (thanks @dyp).

I suppose the question is rather: how do I manage vector capacity() such that reallocation is kept to a minimum, e.g. by calling reserve() up front? I am also unsure of the efficiency of shrink_to_fit() (C++11).
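
For reference, the pattern I have in mind is roughly the following; the element count is just my assumed upper bound, and I gather shrink_to_fit() is only a non-binding request to the implementation:

#include <utility>
#include <vector>

struct Record { short a, b, c; };                  // stands in for the struct of 3 shorts

std::vector<std::pair<int, Record>> pairs;

void loadData() {
    pairs.reserve(150000000);                      // assumed worst case: 1.5*10^8 elements

    // ... fill 'pairs' from the input data, ordered by the int offset ...

    pairs.shrink_to_fit();                         // C++11: request capacity() == size()
}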

  • What is in struct? What is the typical size of vector? Does it have a maximum? Is it known at compile-time? – Neil Kirk Nov 16 '14 at 21:40
  • You can well adapt the initial capacity of `std::vector` – πάντα ῥεῖ Nov 16 '14 at 21:42
  • @πάνταῥεῖ Isn't that only a suggestion of *at least* this number of elements? – dyp Nov 16 '14 at 21:43
  • You can roll out a "flat map", which is a `std::vector<std::pair<int, struct>>`; by sorting the vector before operations, you can use `std::lower_bound` and similar functions to perform binary search. There may be a premade such structure in Boost. – Neil Kirk Nov 16 '14 at 21:43
  • *Millions of pairs of 10 bytes* sounds like several dozen megabytes to me. 20 of them can still be on the order of hundreds of megabytes, up to a few gigabytes. Is this really a problem for the target machine? – dyp Nov 16 '14 at 21:43
  • @neil-kirk The struct will contain roughly 3-5 shorts. I expect the vector size to be no greater than 1.5*10^8 elements long (max), however the maximum size could be less and yes I could determine this but not at compile-time. The typical size is unknown at the moment. – PidgeyBAWK Nov 16 '14 at 21:46
  • The answer you've quoted is correct, but may be misleading you a bit. He's basically saying that capacity *could* be up to double the current size. That can only happen if the growth factor is at least 2, and even then happens only immediately after reallocation. A more typical case would be a growth factor of 1.5, and the added area around half full, so wasted area of ~25%. I've never seen a growth factor greater than 2. Equal to 2 was once common, but 1.5 seems typical in current implementations. – Jerry Coffin Nov 16 '14 at 21:47
  • @PidgeyBAWK So we're talking about a minimum memory consumption (for the worst case) of about 1.5*10^9 byte, i.e. 1.4 GiB? – dyp Nov 16 '14 at 21:49
  • What does "roughly" mean? What I wanted to know is if it contains any data structures of its own. – Neil Kirk Nov 16 '14 at 21:50
  • @JerryCoffin The libstdc++ implementation of `vector` (GCC 4.8.2) actually does double the capacity each time. – Fred Foo Nov 16 '14 at 21:50
  • Since the additional reserved elements in a vector are never accessed, they do not require any committed memory (only address space). I expect there'll be no problem in a typical StdLib implementation if you have enough physical memory for the actually used elements (if you reserve and don't use the automatic growth feature). – dyp Nov 16 '14 at 21:53
  • If you resize an empty vector to X, typically the compiler will only allocate memory for X and no more. I know this is implementation defined. I tested with VS2013 and ideone.com – Neil Kirk Nov 16 '14 at 21:53
  • @NeilKirk Sorry for not clarifying. It wont contain any data structures, simply 3 shorts. – PidgeyBAWK Nov 16 '14 at 21:54
  • @dyp I think you are off by an order of magnitude. Pidgey, so if we have 10 million elements then minimum memory consumption is approximately 220mb in any case. Is that acceptable? – Neil Kirk Nov 16 '14 at 21:56
  • @dyp Yes, that is correct. You're right as well regarding the system, it can handle this kind of memory. However as I move forwards I intend to increase this workload, and want to scrutinise the memory performance before stepping up. – PidgeyBAWK Nov 16 '14 at 21:57
  • Thinking about this further, the problem cannot be the map, as there are a maximum of 256 keys, which is not many. The huge memory usage must come from the value and not from the map itself. – Neil Kirk Nov 16 '14 at 22:02
  • Even though map isn't the problem, I would consider using a `std::vector` of size 256, one element for each key, with empty keys just left with an empty value. This will cost a constant amount of approximately 3 KB, which is chicken feed, and will have faster lookup. – Neil Kirk Nov 16 '14 at 22:03
  • @NeilKirk Good point, I will definitely be switching that out. – PidgeyBAWK Nov 16 '14 at 22:11
  • What are the ints in the vectors? Are they sequential? Can you do without them after sorting the items per vector? Can any of the other information be thrown away after reading data in, or is the structure truly dynamic? – Fred Foo Nov 16 '14 at 22:48
  • @larsmans The ints are offsets, sequential yet not uniform in any way (well, from what I expect). As Neil suggested, I can discard the uint8_t keys; however, the pairs will be allocated dynamically (they are derived from input data). So far, I expect to be using `vector<vector<pair<int, struct>>>` (see the sketch just after these comments), and will simply need to consider the best ways to avoid the 2x capacity(), as you mentioned. – PidgeyBAWK Nov 16 '14 at 23:00
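
A minimal sketch of the layout discussed in the comments above; the names Record, byKey and find are placeholders, not from the original post. The outer vector is indexed directly by the uint8_t key, and each inner vector is kept sorted by its int offset so lookups can use std::lower_bound instead of per-node map storage:

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Record { short a, b, c; };                        // stands in for the 3-short struct

// one slot per possible uint8_t key; unused keys simply hold empty vectors
std::vector<std::vector<std::pair<int, Record>>> byKey(256);

// "flat map" lookup: binary search over the sorted inner vector
const Record* find(std::uint8_t key, int offset) {
    const auto& v = byKey[key];
    auto it = std::lower_bound(v.begin(), v.end(), offset,
        [](const std::pair<int, Record>& p, int value) { return p.first < value; });
    return (it != v.end() && it->first == offset) ? &it->second : nullptr;
}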

1 Answer


Following up on @NielKirk's point about std::vector<> instead of a map for the keys, with only 256 possibilities you could also consider std::array<> (or even a C-style array) for the keys.

As for the std::pair<int, struct> elements, an initial implementation had them as members of a std::vector<std::pair<int, struct>> collection, and you said:

Instead of a vector I could use a std::map; however, the overhead of ~32 bytes per node also becomes very expensive for such lightweight pairs.

which implies the int part of the element is unique, as you did not mention std::multimap. You could take a look at Google sparsehash (http://code.google.com/p/sparsehash/). From the project home page:

An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed.

These hashtable implementations are similar in API to SGI's hash_map class and the tr1 unordered_map class, but with different performance characteristics. It's easy to replace hash_map or unordered_map by sparse_hash_map or dense_hash_map in C++ code.

I've used it before, and never had a problem with it. Your uint8_t keys could index into a (std::vector/std::array/C-array) collection KCH of hashmaps. If you wanted to, you could even define KCH as a collection of objects, each containing a hashmap, so each KCH[i] can implement a convenient interface for working with the std::pair<int, struct> objects for that key. You'd have a "bad key" element as a default for non-key elements in the collection, referencing either a) a single empty dummy hashmap or b) a "bad key object" that handles an unexpected key value appropriately.

Something like this:

#include <array>
#include <utility>
#include <google/sparse_hash_map>

struct Value { short a, b, c; };                          // placeholder for the 3-short struct

typedef std::pair<int, Value>                             myPair;
typedef google::sparse_hash_map<int, myPair>              myCollectionType;
typedef google::sparse_hash_map<int, myPair>::iterator    myCollectionIter;

myCollectionType dummyHashMap;
std::array<myCollectionType, 256> keyedArray;

Initialize all keyedArray elements to dummyHashMap, then fill in with different hash maps for valid keys.
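
A rough sketch of that initialization, reusing the typedefs above (the key, offset and payload values here are just examples, not from the original post):

void initializeKeyedArray() {
    keyedArray.fill(dummyHashMap);                        // every slot starts out as an empty map

    // example of populating one valid key (values are purely illustrative)
    Value rec = { 1, 2, 3 };
    int offset = 42;
    unsigned char key = 7;                                // stands in for one of the uint8_t keys
    keyedArray[key].insert(std::make_pair(offset, myPair(offset, rec)));
}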

Similarly, with containing objects:

class KeyedCollectionHandler {
public:
    virtual bool whatever(parm);
    ...

private:
    myCollectionType collection;
};

class BadKeyHandler : public KeyedCollectionHandler 
{
public:
    virtual bool whatever(parm){
        // unknown or unexpected key, handle appropriately
    }
    ...
};

BadKeyHandler badKeyHandler;

Initialize the 256 keyed array elements to badKeyHandler, then fill in KeyedCollectionHandler objects for the good key values.
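
A rough wiring sketch, assuming the handlers are held by pointer so the single badKeyHandler can serve as the shared default (validKeys is a placeholder for whichever keys actually occur in the input):

#include <vector>

std::array<KeyedCollectionHandler*, 256> handlers;

void wireUpHandlers(const std::vector<unsigned char>& validKeys) {
    handlers.fill(&badKeyHandler);                        // default: treat every key as unexpected
    for (unsigned char k : validKeys)
        handlers[k] = new KeyedCollectionHandler();       // real handler for each good key
}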

  • Who is this person called Niel Kirk? – Neil Kirk Nov 17 '14 at 01:53
  • Thanks for this brilliant and thorough answer - `sparsehash` seems to fit the bill. Did you compare the performance of `sparsehash` with the supposedly more efficient `dense` implementation? – PidgeyBAWK Nov 18 '14 at 14:54
  • Thank you for the gracious response. No, I did not compare sparse_ to dense_hash_map for performance. For our needs, space considerations were not an issue; it was performance. For what we needed, dense_hash_map just crushed std::map for performance. I would expect the sparse_ implementation to impress as well. – frasnian Nov 18 '14 at 20:24
  • This page might also be helpful - it's got a link to some performance numbers as well as the implementation details: http://sparsehash.googlecode.com/svn/trunk/doc/index.html – frasnian Nov 18 '14 at 20:38
  • @frasnian That link is brilliant - thanks. It's re-assuring to see how similar the map_fetch times are, now just to benchmark my growth performance... – PidgeyBAWK Nov 22 '14 at 21:27