8

For embedded system applications where memory usage is more of a concern than speed, what would be the best map container to use? std::map, std::unordered_map? This would be for situations where N is less than, say, a hundred.

If implementation matters, then I'm concerned with the libstdc++ implementation (GCC).

While I understand that it's impossible to beat a simple array for memory usage, I want to avoid a data structure that has O(N) performance. So while I want to reduce memory footprint, I also want the lookup speed to be reasonable (better than O(N)). I don't care about other operations (insertions, removals), as they will occur infrequently.

If I wanted to make my own memory usage measurements, how would I go about doing this on a Linux platform?

Would boost::flat_map be suitable as an associative container with small footprint and lookup times better than O(n)?

Emile Cormier
  • Before you all start rushing in to close this question as a duplicate, please note that I'm concerned with **memory usage**, not speed. I could not find a question or answer that addresses memory usage. – Emile Cormier Aug 22 '15 at 20:11
  • If memory usage is a concern, then use an array (either a primitive array or `std::array`). – The Paramagnetic Croissant Aug 22 '15 at 20:12
  • Hmm, yes, it seems pretty obvious now that there's no way a tree or hash-based map could possibly use less memory than a simple array. I'll revise my question a bit. – Emile Cormier Aug 22 '15 at 20:18
  • `O(n)` for what operations? If you fill an array and sort it, searching will then be faster than a map. – Bo Persson Aug 22 '15 at 20:26
  • @EmileCormier Specifically which operations' complexity do you want to be better than `O(N)`? A sorted array can perform lookup in `O(log N)`, insertion and deletion in `O(N)`. An open-addressing hash table can achieve amortized `O(1)` insertion and lookup, and if implemented properly, deletion, although for practical purposes, it's advisable to keep the load factor below around 2/3. – The Paramagnetic Croissant Aug 22 '15 at 20:27
  • As you were typing those comments, I was just now remembering that I can simply sort the vector for O(log N) lookup performance. Thanks for the help, I think I'll be deleting this question. – Emile Cormier Aug 22 '15 at 20:28
  • Don't delete it. It's a useful discussion for other people to see in the future. – Donnie Aug 22 '15 at 20:31
  • (Although you might want to post an answer and accept it when you can, so it stops showing up when people sort by questions without an accepted answer) – Donnie Aug 22 '15 at 20:31
  • @Donnie Okay, I'll leave my temporary lapse in Data Structures 101 here as an example for others. ;-P – Emile Cormier Aug 22 '15 at 20:36
  • We've all been there. I have an embarrassing Java question laying around too. – Donnie Aug 22 '15 at 20:37
  • You guys go ahead and answer. I'm now curious to know if `boost::flat_map` somehow uses sorting to improve lookup times. – Emile Cormier Aug 22 '15 at 20:39
  • A `map` is typically implemented as a binary tree (rb-tree); this means 2 pointers of overhead per element, plus the overhead of one allocation per element. It also has parent pointers and a color. An `unordered_map` is typically implemented as an array of linked lists, which implies one pointer of overhead per element, plus the array of bucket pointers (which should be about as big as the number of elements in the `unordered_map`), plus allocation overhead per element and for the array. An `unordered_map` typically also contains some empty buckets. – dyp Aug 22 '15 at 20:40
  • I updated the question to indicate I'm concerned only with lookup performance. – Emile Cormier Aug 22 '15 at 20:46
  • As containers support custom allocators, you could provide a custom allocator which measures the memory consumption (in addition to the `sizeof` of the map object itself). – dyp Aug 22 '15 at 20:48
  • This question is completely unanswerable. The only possible way to answer the question is to profile the implementation you're using, which is not possible for anyone else. – Puppy Aug 22 '15 at 21:00
  • @Puppy I've constrained the question to GCC. I'm sure there are plenty others who use it and who might be interested. Did dyp not suggest a means to measure the memory consumption? – Emile Cormier Aug 22 '15 at 21:06

4 Answers

6

For small N, a sorted vector is often the fastest container and has minimal memory footprint. boost::flat_map is one implementation; the pattern is also known as an AssociativeVector. You can implement it easily yourself with std::lower_bound. Because the data must be kept sorted, insert and delete operations are O(N): the elements after the insertion point have to be shifted within the vector.

With a sorted vector, you can do a binary search, which is O(log N) for retrieval. The complexity of std::lower_bound is:

The number of comparisons performed is logarithmic in the distance between first and last (At most log2(last - first) + O(1) comparisons)

Jens
  • If you can clarify that a sorted vector is fast *for retrievals*, I'll accept this answer. – Emile Cormier Aug 22 '15 at 23:20
  • Note that for extremely small N, [it's faster to use std::find than std::lower_bound](https://stackoverflow.com/q/8784732/86436). – splicer May 02 '20 at 19:53
3

As an alternative to the sorted vector from my other answer, you could also consider a hash map if you know the keys in advance. You could use gperf to generate a perfect hash function and get O(1) lookup. The real speed depends on the key type. I've seen cases where a std::map<string> outperformed a std::unordered_map<string>, because the hash function has to process the complete key string while the map's comparison function often stops after the first few characters.

If speed matters, implement both and benchmark. For heap profiling, I've used google perftools in the past.

Jens
  • Not only is this O(1) lookup, but it could reduce the memory consumption for a hash map from 2 pointers plus one allocation per value (`unordered_map`) to either 1 value per key, or 1 value plus 1 bool per key (`optional`). – dyp Aug 22 '15 at 20:57
  • While I'm not *that* constrained for memory and speed, this is useful to know for situations where I might be. – Emile Cormier Aug 22 '15 at 21:00
3

In addition to a sorted vector (like boost::flat_map), you may also have a look at google::dense_hash_map, a hash map that stores its data contiguously, as opposed to std::unordered_map, which allocates nodes separately. The price is the need to reserve a dedicated empty-key value and the lack of iterator-invalidation guarantees. There is also an even more compact (although slower) google::sparse_hash_map.

Ilya Popov
2

If insertions and deletions occur infrequently, then instead of std::map or std::unordered_map, why not use a sorted std::vector with std::lower_bound or std::equal_range for lookup? You get O(log N) lookup speed and minimal space. boost::flat_map provides a nice associative-container API that is implemented in precisely this way. Some detailed benchmarks are provided here and here. Another option would be an open-addressing hash table. You'd have to do some experimenting to see whether it would satisfy your memory constraints and still give you faster lookup than O(log N). Both solutions give good locality, but if you want zero memory overhead and O(log N) is fast enough for lookups, then I'd go with the sorted vector.

sfjac
  • If you can add a blurb about `boost::flat_map` having fast lookups and being implemented like a sorted `std::vector`, I'll accept this answer. – Emile Cormier Aug 22 '15 at 21:25