0

My goal is to write a class that works like a unordered_map but keeps the insertion order of the elements while still allowing O(1) lookup by key.

My approach is as follows:

// like unordered_map but keeps the order of the elements for iteration
// implemented as a vector with a unordered_map for constant time lookups
// consider if element removal is needed, since it is O(N) because of the use of a vector of elements
template <typename KEY, typename VAL>
struct ordered_map {
    struct KeyValue {
        KEY key;
        VAL value;
    };

    std::vector<KeyValue> elements; // KeyValue pairs in order of insertion to allow iteration
    std::unordered_map<KEY, int> indexmap; // key -> index map to allow O(1) lookup, is there a way to avoid the redundant key?
    
    //...
}

But I have the problem that in my approach I want to lookup into the indexmap using the keys which are stored 'externally' to it (in the elements vector based on the index value in the map).

std::sort for example allows passing in a comparator, but unordered_sap does not seem to have anything similar.

I could not really find anything online about how to accomplish this, but I might be searching with the wrong terms.

I this approach at all supported by the stl?

Or do I need to resort to storing the Key twice, which I would like to avoid as keys can be heap objects like std::strings for example.

EDIT: unordered_map instead of unordered_set which does not work

Hexcoder
  • 3
  • 3
  • I'm guessing that `indexmap` holds the hashes of the keys since it's using `std::size_t`. So what's the problem with using keys stored externally? You would just hash the key and check if it exists in the `indexmap`? – super Jan 19 '21 at 11:25
  • Perhaps [boost::multi_index](https://www.boost.org/doc/libs/release/libs/multi_index/doc/index.html) can help? [Example](https://stackoverflow.com/a/1098228/7582247) – Ted Lyngmo Jan 19 '21 at 11:25
  • 1
    `KeyValue` is small so having `std::set` and `std::unordered_set` at the same time should not be a problem. On other hand I smell [XY problem](http://xyproblem.info/) so please explain why you have this requirement. – Marek R Jan 19 '21 at 11:27
  • I would probably switch your approach around. Use `unordered_map` to hold the values and a `vector` with pointers to keep track of the order. – super Jan 19 '21 at 11:27
  • If you want the `unordered_set` to actually be an `unordered_map`, it would have been less confusing to write that. Then `SomeKey` can just be a [`std::reference_wrapper`](https://en.cppreference.com/w/cpp/utility/functional/reference_wrapper) - although I agree it's more sensible to keep the KEY in the map and manage an external sequence of iterators. – Useless Jan 19 '21 at 11:44
  • @super No, the idea was to use show how I want a unordered_set that only stores the hash + index into elements, i'll update the question with the unordered_map version that stores the key twice. I guess storing the key+value in the map is a good idea, but I'm not sure how you could get a pointer to any element that won't get invalidated on inserts – Hexcoder Jan 19 '21 at 11:47
  • The unordered_map should store key/values. The vector should store the iterators into the unordered_map. – Sam Varshavchik Jan 19 '21 at 11:48
  • @MarekR I agree it's XY, but I was wondering if there is a workaround to having to store the key in the map, since as far as I can tell there is no reason why the implementation could not simply pull the hash from any other place during access like via a lambda – Hexcoder Jan 19 '21 at 11:54
  • @Hexcoder Keys/values in an `unordered_map` does not reallocate, so you can take pointers to them. Then you can let `unordered_map` deal with the hashes and O(1) lookup and just have a vector of pointers to the values with preserved order on the side. – super Jan 19 '21 at 11:57
  • I just realized that refs to the key/values in the map actually stay valid (since they are stored as individual heap objects since it's a linked list per hash bucket?) So simply having a `pair*` in the vector should work – Hexcoder Jan 19 '21 at 11:58
  • @Hexcoder So you not only need to preserve the order, you also need to be able to lookup the index by using a key? – super Jan 19 '21 at 12:04
  • @super That was initially in my code, but it's not really needed – Hexcoder Jan 19 '21 at 12:15

1 Answers1

0

My goal is to write a class that works like a unordered_map but keeps the insertion order of the elements while still allowing O(1) lookup by key

... and without duplicating the key, in cases it's large and/or expensive.

So, you want two things:

  1. a constant-time associative lookup with the same semantics as std::unordered_map (no duplicate keys, or you would/should have asked for std::unordered_multimap)
  2. a sequential index tracking insertion order

You have chosen to implement the associative lookup using a sequential container, and the sequential index using an associative container. I don't know why, but let's try the more natural alternative:

  1. the associative lookup should just be std::unordered_map<Key, SomeValueType>
  2. the sequential index can just be std::vector<SomeValueType*>

The remaining blank is SomeValueType: it could just be VAL, but then we need to do extra work to fix the sequential index whenever we erase something. We could make it std::pair<VAL, size_t> instead, so it can store the index i of the iterator we'll need to remove on erasure. The down side is that on top of moving all i+1..N vector elements down one, we also need to update the index value for each of those map elements.

If you want to preserve constant-time erasure, you probably need the sequential index to be a std::list<SomeValueType*>, and to keep a std::list::iterator in your map element. Actual linear iteration will be slower on a linked list than a vector, but you get the same complexity when erasing elements.


NB. The sequential index does need to store pointers rather than iterators - I originally forgot the invalidation behaviour for unordered maps. However, if you want access to the key, they can obviously be std::unordered_map<key, SomeValueType>::value_type pointers ... I just wrote out the shorter alternative.

Useless
  • 64,155
  • 6
  • 88
  • 132
  • Iterators can get invalidated on rehashing, but a pointer to the value will not. – super Jan 19 '21 at 12:00
  • Erasure is rarely needed but the approach with a std::list is a good tip, thanks. The reason why I had my approach backwards is probably because I think of the hashmap as an acceleration structure and the vector as the 'real' data structure I actually want. Actually thinking about the fact that the map already used linked lists internally (i think) the better solution would be to simply add pointers to the already existing elements to keep track of the order. – Hexcoder Jan 19 '21 at 12:25