std::unordered_map - how to "track" max/min key at any time

Question

I have std::unordered_map<int, int>. I do not want to use other structures like tree or anything else cause latency requirements. But at any time I need to know current max key and min key. How can I do that? Distribution is NOT uniform, instead max and min are frequently removed and inserted. So I need something smarter than "just scan entire map for a new max/min when current max/min is removed".

I do not want to use any other structures. I want to use std::unordered_map!

upd according to the answer created such structure:

struct OrderBookItem {
    int64_t price;
    int32_t lots;
};

typedef multi_index_container
    <OrderBookItem, indexed_by<

    hashed_unique<
    BOOST_MULTI_INDEX_MEMBER(OrderBookItem,int64_t,price)
    >,

    ordered_unique<
    BOOST_MULTI_INDEX_MEMBER(OrderBookItem,int64_t,price),
    std::greater<int64_t>
    >

    >> OrderBookContainer;

I don't think you can track the min and the max after deletions for an unordered container. To do that you inherently require keeping an ordering on the elements so you know which one comes before the min/max you just removed. — user541686, Jan 22 '14 at 08:58
As Mehrdad says, you ask the impossible. There are some compromises you could flirt with if memory usage is a huge issue - such as keeping fixed-size top-N and bottom-N buffers as you insert and erase and only recreating them when you've lost track of the min or max - depending on access patterns that might be after 10N or more operations. For average overheads though, a parallel balanced binary tree (i.e. `std::map`) is a good way to go. Note that your unordered_map lookup will still be just as fast... it's just your insert and erase will slow down to binary tree performance levels. — Tony Delroy, Jan 22 '14 at 09:04
@javapowered But it's fast precisely because it doesn't keep an ordering of the elements... — Tristan Brindle, Jan 22 '14 at 09:16
If you need an order, you need an order! The only reason to use an unordered_map in this case is if you are either only adding data, or are willing to suffer the hit of searching the map for the new highest/lowest if you removed one of them. — thecoshman, Jan 22 '14 at 09:31
How many elements are we talking about? Have you compared speed using a `std::map` or a `boost::container::flat_map` (http://www.boost.org/doc/libs/1_55_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx)? How frequent are inserts compared to removals? If inserts are much more frequent than inserts then increasing the min/max on insert, but invalidating it on removal could perhaps be an option. — dalle, Jan 22 '14 at 09:51
possible duplicate of [unordered\_map with "max and min key" tracking?](http://stackoverflow.com/questions/21259269/unordered-map-with-max-and-min-key-tracking) — Sneftel, Jan 23 '14 at 08:37
@Sneftel yes I created new question because this question was about c++ while previous question was about algorithm in general. I likely better modify previous question somehow. — Oleg Vazhnev, Jan 23 '14 at 08:45

TemplateRex · Accepted Answer · 2014-01-22T12:25:54.533

7

Your exact requirements cannot be met: std::unordered_map is fast (i.e. O(1) insertion/erasure/lookup) because it doesn't order its elements. This means that finding the minimum/maximum takes O(N).

The price of ordering paid by std::map that insertion/erasure/lookup (including finding the minimum/maximum) all become O(log N).

If you don't mind using Boost, you could take a look at the Boost.MultiIndex library. This allows you to access your data by multiple interfaces simultaneously. It is an extremely versatile and high-performance library that can also be used for MRU cache data structures (which also combine fast access and tracking of an ordering).

Your primary std::unordered_map interface is provided by the hashed_unique index, where you use an identity function object (provided by Boost.MultiIndex). Your secondary interface would mimic a std::set. This allows finding the minimum/maximum value in O(1) time as *begin() and *rbegin().

You can do that by adding a ordered_unique index with a user-defined MyOrder function object to compute whatever criterion you like to use to order them. Small code example (omitting the boost:: namespace everywhere)

using MyContainer = multi_index_container<
    Item,
    indexed_by<
        hashed_unique<identity<Item>>, // O(1) erasure/lookup, O(log N) insertion
        ordered_unique<MyOrder<Item>>  // track minimum/maximum in O(log N)
    >
>;

Behind the scenes, Boost.MultiIndex implements this roughly as a std::set<Value> with a std::unordered_map<Key, set<Value>::iterator>. This means both lookups and erasures are O(1). The erasure can be O(1) because the unordered_map::erase(value) returns an iterator into the set and std::set<Value>::erase(iterator) is O(1).

However, insertions remain O(log N), because you cannot avoid finding the rank of a newly inserted value within an ordered sequence in less time.

This improvement compared to std::map for lookup/erasure costs only a few pointers per element space overhead.

edited Jan 22 '14 at 12:25

answered Jan 22 '14 at 09:21

TemplateRex

69,038
19
164
304

i think A is wrong, since he will still have logarithmic complexity because of the min max... so it is same if not worse than using just a map, because he says insertions are frequent(if it was mostly lookups it would be better) – NoSenseEtAl Jan 22 '14 at 09:44
1

so you are saying maintaining ordered index when inserting is O(1) ? – NoSenseEtAl Jan 22 '14 at 09:48
I dont see it in documents, basically I dont understand how *new* iterator from unordered map helps you inserting in a *different* map. IDK if that is even legal behavior, to give one maps iterator as hint for another map... – NoSenseEtAl Jan 22 '14 at 10:22
1

@NoSenseEtAl thanks for pointing out my mistake! I corrected my answer to reflect the `O(log N)` insertion. Note that erasure is still `O(1)` due to the iterator indirection. – TemplateRex Jan 22 '14 at 10:58
why this is better than just "std::map"? – Oleg Vazhnev Jan 22 '14 at 12:07
@javapowered because `std::map` has `O(log N)` lookup and erasure for all elements. The Boost.MultiIndex has `O(1)` lookup and erasure for all elements. – TemplateRex Jan 22 '14 at 12:18
sorry for simple question. assuming that i need `unordered_map` what is `Item`? – Oleg Vazhnev Jan 22 '14 at 13:42
@javapowered `Item` would be the `value_type` of `unordered_map`, i.e. `std::pair` – TemplateRex Jan 22 '14 at 14:47
how to tell that hashed_unique should use first `int` of pair for hashing? – Oleg Vazhnev Jan 22 '14 at 14:54
@javapowered thatis done using [key extraction](http://www.boost.org/doc/libs/1_55_0b1/libs/multi_index/doc/reference/key_extraction.html). Using Boost.MultiIndex requires a bit of a learning curve, but the docs are excellent. – TemplateRex Jan 22 '14 at 15:27
thanks, marking your answer as correct but likely I will spent couple days learning multiindex. – Oleg Vazhnev Jan 22 '14 at 17:13
1

@javapowered or just test with map and see how much it is slower from unordered_map before that – NoSenseEtAl Jan 22 '14 at 17:59
what is slower than what? – Oleg Vazhnev Jan 23 '14 at 02:47
posted my structure to the question. I think i should not use "identity" function instead i would prefer to hash by price – Oleg Vazhnev Jan 23 '14 at 08:12
probably you can help me review my implementation too! thank in advance :) http://codereview.stackexchange.com/questions/39870/boost-multi-index-based-orderbook – Oleg Vazhnev Jan 23 '14 at 12:07

poljpocket · Answer 2 · 2014-01-22T09:19:22.610

0

You can write a wrapper. With that, you get every insertion/deletion and can retain min / max values.

#pragma once

#include <unordered_map>

using std::unordered_map;

class my_unordered_map
{
public:
    void insert(int key, int value)
    {
        _map[key] = value;
        if (_max < value)
            _max = value;
        if (_min> value)
            _min = value;
    }
    int get(int key)
    {
        auto it = _map.find(key);
        if (it != _map.end())
            return it->second;
        return 0; // or some error handling here.
    }
    int getMin()
    {
        return _min;
    }
    int getMax()
    {
        return _max;
    }
    // more functionality here.
private:
    unordered_map<int, int> _map;
    int _min = 0;
    int _max = 0;
};

edited Jan 22 '14 at 09:19

answered Jan 22 '14 at 09:01

poljpocket

318
1
10

2

@poljpocket A wrapper was my initial thought too, but what happens when you remove the max value from the map? How do you find the second-highest value? – Tristan Brindle Jan 22 '14 at 09:15
2

This example code conveniently avoids the one operation that is problematic: deletion. – R. Martinho Fernandes Jan 22 '14 at 09:32
I know, I'll try to figure out a solution tonight. – poljpocket Jan 22 '14 at 14:28
@poljpocket is it still day out there..? :P – Adorn Apr 08 '15 at 21:52

std::unordered_map - how to "track" max/min key at any time

2 Answers2

Linked