-1

My question is really simple - having unordered_map<int, int> I need to track maximum key with minimum overhead. Not at any moment, but at certain "checkpoints". More details below.

In my trading application I would like to use unordered_map to store OrderBook items. It should map int to int where key is price and value is volume. (actually price is decimal but i'm using int64 internally)

I'm going to use two unordered_map - one for bid orders and one for offer orders.

I need to know Bid - maximum key of one map and Offer minimum key of another map. Normally I have "several" updates in one pack - and I need to recalculate Bid and Offer only when all updates are processed. For example assume I have such orderbook

100 1
102 1
104 1

Bid (max of key) is 104.

Now I have two updates:

remove key 104
add 103 1
end of transaction

Executing should be as follows:

1:

100 1
102 1

Bid undefined (transaction in progress)

2:

100 1
102 1
103 1

Bid = 103

How can I do that with minimum latency ? Latency is primary requirement, if ignore latency I can do a lot of things:

  • trivial: just scan all keys and find best bid/ask if previous one was deleted
  • i even can use sorted map

So it seems i need some simple algorithm to track max / min of some set without recalculating it every time?

Oleg Vazhnev
  • 23,239
  • 54
  • 171
  • 305
  • Without thinking this through, how about simply having a second variable that always holds the max and min? Then, before inserting in your unordered_map<>, first update the min/max variables. Now you always know your max/min keys per map. -- edit This may give difficulties when removing an item that is the current max, Hmmm. – bastijn Jan 21 '14 at 13:21
  • 1
    @bastijn right this is the "hardest" part – Oleg Vazhnev Jan 21 '14 at 13:46
  • Can you elaborate on what exactly you mean by latency? Do you perhaps mean [time complexity](http://en.wikipedia.org/wiki/Time_complexity)? – Bernhard Barker Jan 21 '14 at 14:08
  • @Dukeling faster is better. – Oleg Vazhnev Jan 21 '14 at 14:14
  • cross-linking this other question: http://stackoverflow.com/a/21278762/819272 – TemplateRex Jan 22 '14 at 09:41

1 Answers1

2

You could take an implementation of an AVL-Tree in your language of choice, and extend the nodes by the information what the minimum key in the left tree and the maximum key in the right tree is (i.e. by introducing minkey and maxkey fields).

The maintenance of this should be fairly easy. For a new node, simply set minkey = maxkey = key. On insert, whenever you make the decision to follow the left or the right path, update the minkey or maxkey field of the current node accordingly. Similarly on delete and rotations. This way, you'll always have the minimum and maximum available in the top node.

The added beneift would be that lookups could be slightly faster, lookup can be terminated as soon as the searched key is lower than minkey or greater than maxkey.

But to me it is not apparent that you in fact would need that, as to find the minimum/maximum key in a well balanced search tree (such as an AVL Tree) should be O(log n) averaged, hence maybe the whole thing is not even worth the effort.

If you need the fast insert/delete of an unordered_map (i.e. a hash map), you can at least reach amortized constant tiome for min and max, by maintaining the following data structure:

class umapminmax {
   map<int, int> themap;
   int min, max;
}

You must insert and delete only through this class and record the minima/maxima. Whenever the minimum or the maximum key gets deleted, you need to scan all your keys to get the new minimum/maximum. This scanning is O(n). But if it doesn't happen too often, the amortized time of accessing the minimum/maximum will still be constant.

If you know, OTOH, that it is likely that the minimum/maximum value is removed often, then the O(log n) solution with a search tree will probably be the fastest you can get.

Still, from how you describe your application, even the following is imaginable:

  • It happens often, that the maximum is removed. But, frequently a new maximum will get inserted before you ask for the current maximum the next time.

If this is the case, you can apply a bit more logic: Say that the maximum key gets deleted. Now, instead of scanning the map for the new maximum, set a flag (another new field) that indicates that the maximum key is not valid. Then, when by chance another value with a key that is greater or equal than the invalid maximum is inserted, you can just record this key as maximum and reset the flag so that in indicates the maximum is valid. When retrieving the maximum, however, you must consult the flag and perform a scan if it says it is invalid.

You see, you really need a model implementation and do some statistics with real world data.

Ingo
  • 36,037
  • 5
  • 53
  • 100
  • i would like to use c++ library class, may be http://www.cplusplus.com/reference/unordered_map/unordered_map/ "Search, insertion, and removal of elements have average constant-time complexity." but i'm afraid "max/min" search might be O(n) because it is not balanced tree and elements are not sorted at all. – Oleg Vazhnev Jan 21 '14 at 13:52
  • @javapowered From what I read there, they use hash functions. Hence, keys can only found in O(n). I in your place would give map a try, it is a binary search tree and you get iterators to the start and end (i.e. minimum and maximum value) in constant time. – Ingo Jan 21 '14 at 13:59
  • tree will be slower than hash for add/delete/lookup - O(log n) instead of O(1) – Oleg Vazhnev Jan 21 '14 at 14:05
  • Yes, @javapowered. Do you have that many elements so it matters? Let me edit to suggest an alternative with amortized constant time. – Ingo Jan 21 '14 at 14:08
  • I have hundreds of elements – Oleg Vazhnev Jan 21 '14 at 14:15
  • 2
    @javapowered I don't believe that there's any way to get around the O(log n) time complexity. But with merely hundreds of elements, the difference in running time should be minor, to say the least. O(log n) might even outperform expected O(1) because of the constant factors involved. – Bernhard Barker Jan 21 '14 at 14:18
  • hashtable must be faster than tree in this case. tree must be used only if sorting is really a requirement. keeping data sorted is expensive and must be avoided if not absolutely required. – Oleg Vazhnev Jan 21 '14 at 14:31
  • @javapowered Have you tested that, or are you just guessing? – Sneftel Jan 21 '14 at 16:17
  • @Sneftel not tested, it's obvious for me. tree are designed to keep elements sorted and this can't be free. hashtable are designed to keep unsorted elements and they benefit from this. – Oleg Vazhnev Jan 21 '14 at 16:21
  • It sounds like you don't understand what the "asymptotic" in "asymptotic complexity" means. Example: Front-insertion into a vector (O(n)) of a few dozen elements is faster than front-insertion into a linked list (O(1)). Saying that one has a lower big-O than the other is NOT a slam dunk. If you truly care about real-world performance, rather than simply about the *appearance* of efficiency, you *need to test it*. – Sneftel Jan 21 '14 at 16:25
  • @Sneftel i understand. I'm implementing orderbook. I talked with experienced programmers. All they said that hashtable must be used instead of tree. Many folks can't be wrong. – Oleg Vazhnev Jan 21 '14 at 16:44
  • @javapowered What did they say about the max/min problem? – Ingo Jan 21 '14 at 16:48
  • @Ingo this is know how people prefer not to share :) in HFT trading people don't like to share their ideas too much :) also i'm sure this is pure algorithmic question so I was hoping to google it but I can't, so I asked a question here. – Oleg Vazhnev Jan 21 '14 at 16:53
  • @javapowered You were right to do so. As you see, I tried to give some ideas that will work or not depending on the patterns that emerge in your application - for example, how likely is it, that the maximum key is deleted. For this, you need to try, test, dig statistics data. – Ingo Jan 21 '14 at 16:57
  • believe me hashtable must be used. I only need to know how to track max/min of hashtable key with lowest latency (something better than O(n)). i think there must be some trick. I found this trick http://stackoverflow.com/questions/4802038/implement-a-queue-in-which-push-rear-pop-front-and-get-min-are-all-consta for queue. But my problem is more complicated. – Oleg Vazhnev Jan 21 '14 at 17:16