6

There is the boost.container flat_map and others, and the Loki AssocVector and many others like these which keep the elements sorted.

Is there a modern (c++11 move-enabled, etc.) implementation of an unsorted vector adapted as a map/set?

The idea is to use it for very small maps/sets (less than 20 elements) and with simple keys (for which hashing wouldn't always make sense)

onqtam
  • 4,356
  • 2
  • 28
  • 50
  • I'm not sure exactly what you want. Why is it important that it's stored flat i.e. in a vector? To use an unsorted vector as a map/set you need some sorted index, do you plan to keep that external to the container? – Jonathan Wakely Jun 19 '15 at 12:22
  • how about unordered_map? – BeyelerStudios Jun 19 '15 at 12:23
  • @JonathanWakely The elements in the vector can be pairs of the key and the value - I don't need a separate index. – onqtam Jun 19 '15 at 12:29
  • 2
    @onqtam, but then your keys would be unsorted and you have to do linear searches, and erasing elements still requires shuffling elements around. If that's acceptable, just use `std::vector` (see my answer). – Jonathan Wakely Jun 19 '15 at 12:31
  • 1
    If you are concerned about cache locality you should keep the keys and the values in different vectors unless everything fits into L1 cache. – nwp Jun 19 '15 at 12:51
  • 2
    @onqtam Perhaps you need analog of "unordered_map"? Or just vector with O(n) lookup? I guess what are you asking for is [hash table with open addressing](https://en.wikipedia.org/wiki/Hash_table#Open_addressing) - i.e. hash table which stores all elements within array like `vector>`. – Evgeny Panasyuk Jun 19 '15 at 15:29
  • @EvgenyPanasyuk I don't think a hash map fits my needs - i want almost no allocations. Also for keys simple as an int I wouldn't need hashing them to compare them... – onqtam Jun 19 '15 at 17:07
  • 1
    @onqtam there are different kinds of hash maps, read the link above - with open-addressing all hashtable's elements reside within single array. – Evgeny Panasyuk Jun 19 '15 at 17:48
  • hashing will totally make sense, how do you intend to map random values (`int` can still reach 4 billion) to 0-20 range ? If you answer "modulo" this is hashing, and a bad one. – v.oddou Jun 22 '16 at 09:12

5 Answers5

7

Something like this?

template<class Key, class Value, template<class...>class Storage=std::vector>
struct flat_map {
  struct kv {
    Key k;
    Value v;
    template<class K, class V>
    kv( K&& kin, V&& vin ):k(std::forward<K>(kin)), v(std::forward<V>(vin)){}
  };
  using storage_t = Storage<kv>;
  storage_t storage;

  // TODO: adl upgrade
  using iterator=decltype(std::begin(std::declval<storage_t&>()));
  using const_iterator=decltype(std::begin(std::declval<const storage_t&>()));
  // boilerplate:
  iterator begin() {
    using std::begin;
    return begin(storage);
  }
  const_iterator begin() const {
    using std::begin;
    return begin(storage);
  }
  const_iterator cbegin() const {
    using std::begin;
    return begin(storage);
  }
  iterator end() {
    using std::end;
    return end(storage);
  }
  const_iterator end() const {
    using std::end;
    return end(storage);
  }
  const_iterator cend() const {
    using std::end;
    return end(storage);
  }
  size_t size() const {
    return storage.size();
  }
  bool empty() const {
    return storage.empty();
  }
  // these only have to be valid if called:
  void reserve(size_t n) {
    storage.reserve(n);
  }
  size_t capacity() const {
    return storage.capacity();
  }
  // map-like interface:
  // TODO: SFINAE check for type of key
  template<class K>
  Value& operator[](K&& k){
    auto it = find(k);
    if (it != end()) return it->v;
    storage.emplace_back( std::forward<K>(k), Value{} );
    return storage.back().v;
  }
private: // C++14, but you can just inject the lambda at point of use in 11:
  template<class K>
  auto key_match( K& k ) {
    return [&k](kv const& kv){
      return kv.k == k;
    };
  }
public:
  template<class K>
  iterator find(K&& k) {
    return std::find_if( begin(), end(), key_match(k) );
  }
  template<class K>
  const_iterator find(K&& k) const {
    return const_cast<flat_map*>(this)->find(k);
  }
  // iterator-less query functions:
  template<class K>
  Value* get(K&& k) {
    auto it = find(std::forward<K>(k));
    if (it==end()) return nullptr;
    return std::addressof(it->v);
  }
  template<class K>
  Value const* get(K&& k) const {
    return const_cast<flat_map*>(this)->get(std::forward<K>(k));
  }
  // key-based erase: (SFINAE should be is_comparible, but that doesn't exist)
  template<class K, class=std::enable_if_t<std::is_converible<K, Key>{}>>
  bool erase(K&& k) {
    auto it = std::remove(
      storage.begin(), storage.end(), key_match(std::forward<K>(k))
    );
    if (it == storage.end()) return false;
    storage.erase( it, storage.end() );
    return true;
  }
  // classic erase, for iterating:
  iterator erase(const_iterator it) {
    return storage.erase(it);
  }
  template<class K2, class V2,
    class=std::enable_if_t<
      std::is_convertible< K2, Key >{}&&
      std::is_convertible< V2, Value >{}
    >
  >
  void set( K2&& kin, V2&& vin ) {
    auto it = find(kin);
    if (it != end()){
      it->second = std::forward<V2>(vin);
      return;
    } else {
      storage.emplace_back( std::forward<K2>(kin), std::forward<V2>(vin) );
    }
  }
};

I left the container type as a template argument, so you can use a SBO vector-like structure if you choose.

In theory, I should expose a template parameter for replacing equals on the keys. I did, however, make the key-search functions transparent.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • @nwp nope, the question asks for an unsorted vector, you misread. I use the same vector for keys and values, because keeping things contiguous is usually worth the cost of a bit of alignment. – Yakk - Adam Nevraumont Jun 19 '15 at 13:26
  • 2
    Wow! Did you just write that? It worked after fixing a typo in ```is_converible```. It also worked with a boost SBO container ```template using mySmallVec = boost::container::small_vector;```. What else is missing from this class? Would you consider making it public in a repo somewhere? – onqtam Jun 19 '15 at 16:20
  • 1
    I was just wondering why aren't other containers (like ```boost::container::flat_map```) offering the inner container type as a template parameter... would be so useful! – onqtam Jun 19 '15 at 16:31
  • 1
    If I use a ```boost::container::static_vector``` i could even just memcpy the whole map (if the key and value are PODs) - I think it doesn't use pointers internally (but i'd have to check that...) – onqtam Jun 19 '15 at 16:43
  • 1
    Why didn't you use a pair and instead defined your own struct for key/value pairs? Where you've written ```// TODO: SFINAE check for type of key``` did you mean to use the ```enable_if``` stuff like for ```erase```? What is the purpose of ```set``` - a more optimal way than using ```map.[key] = rvalue```? If so, why doesn't ```std::map``` have an analogy of this? And isn't emplace superior to this ```set``` because it forwards variable arguments? – onqtam Jun 19 '15 at 16:57
  • The use of `std::pair` in `std` in any interface component is widely considered a mistake. `first` and `second` are horrible names for almost anything, and especially for key/values. `set` is a lazier `insert` or `emplace`, and it avoids default-constructing then assigning-to that `[]` forces. SFINAE checks are the `enable_if` stuff, yes (prevent invalid overloads from being considered, a QOI issue). I'm sure there are other things missing: I haven't written a `flat_map` that didn't auto-sort before, and usually I sketch the core and add methods as I need them. – Yakk - Adam Nevraumont Jun 19 '15 at 17:01
  • @Yakk, very nice. Since it is unordered, `erase` could be optimized by bringing the last element to the erasing point instead of shifting all elements (as it would happen for `storage == std::vector`). Yesterday, I asked a similar question http://stackoverflow.com/questions/43218715/implementation-of-a-contiguous-flat-unordered-container?noredirect=1#comment73508843_43218715. – alfC Apr 05 '17 at 15:27
  • @alfC Yes, my implementation above was based off a container that attempted to maintain a sort order in a lazy way. And erase was far less common than "block of inserts" followed by "lots of access", so having erase maintain invariants was lazy. – Yakk - Adam Nevraumont Apr 05 '17 at 17:59
  • @Yakk, thanks for the clarification. About `std::pair`, you make an interesting point. I think I agree with you that the class `kv` is effectively an implementation detail, however not the members of `kv` because you have to use them often in the client code. For example `auto it = find(...); auto val = it->v; auto key = it->k`. Simply for that reason I would use `key/value` as members, not `k/v`, (the later is worst than `first/second`). – alfC Apr 05 '17 at 21:27
4

If the sets are sure to be small then you can just use a std::vector (or std::deque) and do lookup using linear searches. An O(n) linear search over a small vector can be faster than an O(log(n)) search over a more complicated structure such as a red-black tree.

So you could just put elements in a vector without sorting them. You would still need to do some shuffling if you remove elements, but that will always be true for a flat vector-like structure whether it's sorted or not, unless you only ever remove the back element. Why is it important that it's flat anyway?

Jonathan Wakely
  • 166,810
  • 27
  • 341
  • 521
  • It's important that it's flat to avoid allocations and to have cache locality. – onqtam Jun 19 '15 at 12:32
  • Then you have to compromise on shuffling. You can't have both, unless you don't actually erase elements just somehow mark them as dead, and then re-use a dead element on the next insert. – Jonathan Wakely Jun 19 '15 at 12:36
  • 1
    @JonathanWakely Removing element from unsorted vector is O(1) - just swap it with `.back()` and then `.pop_back()`. – Evgeny Panasyuk Jun 19 '15 at 15:23
  • dont drink the cache locality kool aid... unless you can profile your code to the level of cache misses and prove your map lookups is the cause unordered_map is best default choice. – NoSenseEtAl Jun 19 '15 at 15:31
  • @EvgenyPanasyuk, yes, but swapping is still shuffling, even if only for a single element at a time – Jonathan Wakely Jun 20 '15 at 12:04
  • @NoSenseEtAl, that's just wrong. For small sizes std::vector outperforms other containers, hands down. It's not just cache locality, but not doing pointer chasing, and reusing memory rather than allocating/deallocating nodes on every insert/erase operation. Try it: http://ideone.com/Ld0cQF That does far more lookups than insertions/erasures (and unordered_set is meant to be fast at lookups, right?) and it always erases at the start of the vector, which is the worst case for vector, and yet vector still wins by a large margin. – Jonathan Wakely Jun 20 '15 at 12:51
  • your code for counting(in each loop iteration look for rng() number and increment counter if found), not erase and insert :http://coliru.stacked-crooked.com/a/e9f685673499b22c I will leave as an exercise what happens at size 20 when you replace u.count with if (std::find(u.begin(), u.end(), x) !=u.end()) (In my measurements on some compiler versions it is faster for small num elements on some it is not) – NoSenseEtAl Jun 22 '15 at 18:31
  • @NoSenseEtAl, the OP originally said "which get updated a lot and queried not that much" so testing the performance of a program that does no updates isn't very useful. And why the #### would you use `std::find` on an `unordered_set`? – Jonathan Wakely Jun 22 '15 at 21:01
  • wrt #### if you took my advice and benchmark instead of writing #### you would get better idea ;) . IDK the reason why but for some combination of compiler/sizes it is actually faster. My guess is for small number of elements you are not paying for hashing so it is faster, please see: http://coliru.stacked-crooked.com/a/50a3eb534b30ae84 comment uncomment lines 46 47 – NoSenseEtAl Jun 22 '15 at 22:41
1

There's std::unordered_set and std::unordered_map but as far as I know they are not implemented using vectors.

A possible option is to write your own hash vector and hash the key using std::hash<Key> and then index the resulting number modulo the length of the vector, but then you'll have to figure out a way to handle collisions and all the resulting problems manually. Not sure I recommended that.

An alternative would be to pass a custom allocator to std::unordered_set and std::unordered_map which perform the allocation on a vector (for example by owning an internal vector), as suggested by @BeyelerStudios.

Community
  • 1
  • 1
Shoe
  • 74,840
  • 36
  • 166
  • 272
  • you could give your unordered_map an allocator based on a vector – BeyelerStudios Jun 19 '15 at 12:27
  • @BeyelerStudios, such an allocator wouldn't work well unless it has a fixed maximum size which is pre-allocated, in which case you might as well just use an array. If it doesn't have a fixed maximum size then when it grows it would need to reallocate and move all the elements, which violates the reference-stability guarantees of `unordered_map`. Inserting a new element must not invalidate references to existing elements. – Jonathan Wakely Jun 19 '15 at 12:34
  • @JonathanWakely yes, it sounded like that's the case (the fixed maximum size) - there's a fixed maximum size (that you can compute) for almost every problem you encounter – BeyelerStudios Jun 19 '15 at 12:36
  • @BeyelerStudios but if you have a fixed maximum size why use a vector rather than just `unique_ptr` which is simpler and has less overhead? – Jonathan Wakely Jun 19 '15 at 12:37
  • @JonathanWakely this is getting OT: what if the input/API already requires a vector? – BeyelerStudios Jun 19 '15 at 12:38
  • @JonathanWakely Do you have benchmarks showing the "less overhead" part? (Honestly curious). And how is `std::unique_ptr` simpler than a class specifically designed to be used as an array? – Shoe Jun 19 '15 at 12:38
  • @BeyelerStudios, then use a vector directly instead of hiding it inside an allocator and then overlaying an `unordered_map` on top, which would be _really_ complicated and hard to get right. Jeffrey, vector tracks its size and capacity separately and has dozens of member functions, which are unnecessary if the size and capacity are fixed and always equal to each other. If the size is fixed you don't need any of vector's resizing/erasing/inserting etc. functionality. – Jonathan Wakely Jun 19 '15 at 12:40
  • There is also `std::make_heap` over vectors. – v.oddou Jun 22 '16 at 09:16
1

Evgeny Panasyuk is correct, I believe what you want is an Open Address Hash Map.
This fits exactly your requirement, only 1 flat buffer, no allocation of nodes, no pointers to follow, and unsorted.

Otherwise you also have flat_map/AssocVectorbut they are sorted, unlike your requirement.

For OAHM, I have an implementation of a STL-like generic one here:
https://sourceforge.net/projects/cgenericopenaddresshashmap/

Also you might want to take a look the benchmark page of flat_map:
boost::flat_map and its performance compared to map and unordered_map
The OAHM is performing very close to the flat_map in all tests, except iteration.

Community
  • 1
  • 1
v.oddou
  • 6,476
  • 3
  • 32
  • 63
1

Please look at the sfl library that I have recently updated to GitHub: https://github.com/slavenf/sfl-library

It is C++11 header-only library that offers flat ordered and unordered containers that store elements contiguously in memory. All containers meet requirements of Container, AllocatorAwareContainer and ContiguousContainer. Library is licensed under zlib license.

slavenf
  • 56
  • 5