3

Say you have a lot of (key, value) objects to keep track of, with many insertions as well as deletions.

You need to satisfy 3 requirements:

  1. Get the maximum key in constant time at any point.
  2. Look up the value of any key in logarithmic time.
  3. Insertions and deletions take logarithmic time.

Is there a data structure that can do this?

My thoughts:

Priority queues can get the max in constant time, but I can't look up values. Binary search trees (2-3 trees) can do lookups in logarithmic time, but getting the max takes O(lg N) as well. If I try to keep track of the max in a BST, it takes O(lg N) when I have to delete the max and find the second greatest.

Popcorn
  • Doesn't lookup take linear time in a max-heap? – Popcorn Jun 14 '12 at 17:44
  • I think you'll have to prioritize which operation you'd like to optimize, considering all the operations. – user845279 Jun 14 '12 at 17:46
  • When you say "get the maximum", I take it you mean the maximum *value*, not the maximum key? – Steve Jessop Jun 14 '12 at 18:21
  • Sorry for the ambiguity, I am looking for the maximum key. I have updated this. – Popcorn Jun 14 '12 at 21:38
  • Are you positive you need O(1) to get the max key? O(log(n)) is a _lot_ faster than a lot of people think it is. If you have enough `ints` to fill 16GB of memory, O(log(n)) will check 32 of them. That's fast. – Mooing Duck Jun 14 '12 at 21:47
  • @Mark X: ah, in that case it's easy (xvatar's answer). Your objection "it takes O(lgN) when I have to delete the max and find the second greatest" is false, because that's part of the cost of the delete operation, not part of the cost of the getmax operation. – Steve Jessop Jun 15 '12 at 09:44

9 Answers

10

Why do we need those fancy data structures? I think a simple binary search tree that tracks the max node can serve the OP's requirements well (a sketch follows the list below).

  1. You can track the node with the max key:

    whenever you insert a new node, compare its key with the current max key to decide whether it is the new max node;

    whenever you delete the max node, it takes O(log N) to find the next max node; that cost is part of the delete operation, not of get-max.

  2. You certainly have O(log N) lookup time, by the nature of a BST.

  3. A BST's insertions and deletions take O(log N) time.
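
For concreteness, a minimal sketch of this in Java, using the standard TreeMap (a red-black tree) as the balanced BST and caching the max key; the class and method names are illustrative, not prescribed by the answer:

```java
import java.util.TreeMap;

// Sketch only: a balanced BST (java.util.TreeMap) plus a cached max key.
class MaxTrackedBst<V> {
    private final TreeMap<Integer, V> tree = new TreeMap<>();
    private Integer maxKey = null;   // cached max key, read in O(1)

    void put(int key, V value) {     // O(log n)
        tree.put(key, value);
        if (maxKey == null || key > maxKey) maxKey = key;  // new max?
    }

    V get(int key) {                 // O(log n)
        return tree.get(key);
    }

    void remove(int key) {           // O(log n) overall
        tree.remove(key);
        if (maxKey != null && key == maxKey) {
            // Finding the next max is O(log n), charged to the delete,
            // so getMax itself stays O(1).
            maxKey = tree.isEmpty() ? null : tree.lastKey();
        }
    }

    Integer getMax() {               // O(1)
        return maxKey;
    }
}
```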

xvatar
  • Just to make sure everyone's clear, this needs to be a **balanced** BST for these time bounds to hold. – templatetypedef Jun 14 '12 at 21:24
  • "Why we need those fancy data structs?" because everyone else thought that "get max" meant the max value, not the max key. Well done :-) – Steve Jessop Jun 15 '12 at 09:50
  • Why isn't a `Binary Search Tree with tracking the Max node` considered a `fancy data struct`? – Inverse Jun 19 '12 at 20:12
  • @Inverse I guess that part depends on the definition of "fancy" :) But I do think BST is a very common and plain data structure, and tracking the max node takes fairly small efforts to both understand and implement. – xvatar Jun 19 '12 at 21:08
5

You can just use two data structures in parallel:

  1. Store the key/value pairs in a hash table (O(1) expected lookup) or a balanced BST (O(log n) lookup), and
  2. Store all the values in a max heap so that you can look up the max in O(1) time.

This makes insertion or deletion take O(log n) time, since that's the time complexity of inserting or deleting from the max heap.
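
A minimal sketch along these lines in Java, adapted to the OP's clarified goal of the max key. One hedge: java.util.PriorityQueue.remove(Object) is actually O(n), so this sketch deletes lazily instead, discarding stale heap entries only when they surface at the top; the class name is illustrative:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Sketch only: hash table for lookups + max-heap of keys for the max.
class TwoStructureMap<V> {
    private final Map<Integer, V> map = new HashMap<>();    // O(1) expected lookup
    private final PriorityQueue<Integer> heap =
            new PriorityQueue<>(Comparator.reverseOrder()); // max-heap of keys

    void put(int key, V value) {    // O(log n) for the heap insert
        map.put(key, value);
        heap.add(key);
    }

    V get(int key) {                // O(1) expected
        return map.get(key);
    }

    void remove(int key) {          // O(1) expected; the heap entry goes stale
        map.remove(key);
    }

    Integer maxKey() {
        // Discard keys deleted since they were pushed; each heap entry is
        // popped at most once, so this is amortized O(log n).
        while (!heap.isEmpty() && !map.containsKey(heap.peek())) {
            heap.poll();
        }
        return heap.peek();         // null if empty
    }
}
```

A heap that supports delete-by-handle in O(log n) (as in Thomas Ahle's answer below) avoids the stale entries entirely.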

Hope this helps!

templatetypedef
  • Thank you, does this take 2N memory? Is there a way to get it down to N memory? – Popcorn Jun 14 '12 at 17:49
  • @MarkX- I don't know of any data structures that use exactly N memory and give efficient lookups. Hash tables usually require at least 2N memory to avoid rampant collisions, and BSTs usually need N memory plus 2N pointers. I seriously doubt that you'll find a solution that works in exactly N space. – templatetypedef Jun 14 '12 at 17:51
  • @MarkX I think time wise it's the same as you just use a BST and track the max node, so why not just use a BST? – xvatar Jun 14 '12 at 17:52
  • "I don't know of any data structures that use exactly N memory and give efficient lookups" - sorted arrays of course, but they're not so good on insert/delete. – Steve Jessop Jun 14 '12 at 18:24
  • A variant of the hashtable + priority queue approach: the hash table and the max-heap can both hold the same data nodes. – Arvind Jun 14 '12 at 19:39
  • You can store the actual key/value pairs once in their own array, and store only pointers or array indices in your priority-q and key-index. If the size of your key+value is significant, this should keep memory overhead low. – comingstorm Jun 14 '12 at 21:26
3

Skip lists have an amortized O(log n) lookup, and they're a linked list, so min and max are always O(1). http://en.wikipedia.org/wiki/Skip_list
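
For what it's worth, Java ships a skip-list-backed sorted map; a minimal usage sketch (note that in this particular implementation lastKey() walks the index levels, so it is expected O(log n) rather than the O(1) a cached tail pointer would give):

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Integer, String> m = new ConcurrentSkipListMap<>();
        m.put(42, "a");                   // expected O(log n) insert
        m.put(7, "b");
        System.out.println(m.get(42));    // expected O(log n) lookup -> a
        System.out.println(m.lastKey());  // max key -> 42
        m.remove(7);                      // expected O(log n) delete
    }
}
```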

Hans Z
  • I think that's expected O(log n), not amortized O(log n). – templatetypedef Jun 14 '12 at 17:54
  • I think it's both... the worst case O(n) happens 1 in every n cases on average, so the amortized analysis looks like O(nlogn) * (n-1)/n + 1, which is O(nlogn) – Hans Z Jun 14 '12 at 17:57
  • My understanding is that the amortized cost refers to the total cost of a series of N operations in the worst case. Here, the worst case could be N^2, since you could do a really bad lookup over and over again. The expected runtime is the average runtime taken over all possible runtimes, which is what you're describing. – templatetypedef Jun 14 '12 at 17:59
  • I thought amortized is the average cost of an operation given n operations (lim n->infinity). If you added n items to a skip list, the overall time complexity will look like O(nlogn). You can't have the worst case every single time. – Hans Z Jun 14 '12 at 18:03
  • Perhaps I'm mistaken here, but I thought it was possible to get the worst-case every time. You could randomly choose to insert a node with height 1 that's smaller than everything else, thus causing your insert to be an append to a singly-linked list, which takes time O(n). Am I mistaken about this? – templatetypedef Jun 14 '12 at 18:07
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/12570/discussion-between-hans-and-templatetypedef) – Hans Z Jun 14 '12 at 18:08
1

I know a hash table has O(1) search time, due to the fact that you use keys and can look up the value instantly. As far as the max value goes, you may be able to keep track of it continuously, every time you insert or delete a value.

squiguy
  • How can you track the max efficiently? If you delete the maximum value, you need to somehow be able to find the new max efficiently. If everything is in a hash table, this takes time O(n). – templatetypedef Jun 14 '12 at 17:46
  • This is true. I was talking about keeping track of the max in your add and delete methods/functions or whatever by having a global variable. If you did delete the max, it would be linear time to find the new one since they are randomly scattered, you're correct. – squiguy Jun 14 '12 at 17:49
1

How about a list (array) sorted in descending order?

  1. Max is always first, so O(1).
  2. Look-up is O(log n) through binary search (see the sketch below).
  3. Insertion/deletion is O(n), because you'll have to shift n-i items when inserting into/deleting from position i.
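
A minimal sketch of the array variant in Java; Collections.binarySearch accepts a comparator, so the descending order is handled directly:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class DescendingArrayDemo {
    public static void main(String[] args) {
        // Keys kept sorted in descending order in an array-backed list.
        List<Integer> keys = new ArrayList<>(List.of(50, 40, 30, 10));
        Comparator<Integer> desc = Comparator.reverseOrder();

        System.out.println(keys.get(0));                   // max is always first: O(1)

        int i = Collections.binarySearch(keys, 30, desc);  // O(log n) lookup
        System.out.println(i);                             // 2 (found)

        int j = Collections.binarySearch(keys, 20, desc);  // not found: encoded insert point
        keys.add(-j - 1, 20);                              // the add itself is O(n)
        System.out.println(keys);                          // [50, 40, 30, 20, 10]
    }
}
```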
user845279
  • The question says, "with many insertions as well as deletions", so although you satisfy the 2 requirements, I think the requirements are flawed, and it will turn out that `O(n)` insert and delete is unacceptable. – Steve Jessop Jun 14 '12 at 18:18
  • How do you binary search the list? – Popcorn Jun 14 '12 at 18:18
  • @MarkX: this answer doesn't mean a linked list, it means an array. Which are referred to as lists in some languages. – Steve Jessop Jun 14 '12 at 18:18
  • Oh, I see, thank you. Sorry, I forgot to also mention that insertions/deletes should take O(lg N) too. – Popcorn Jun 14 '12 at 18:19
1

Since you are using key/value pairs, the best solution I can suggest is to use a TreeMap in Java.

You can simply use the following 4 methods present in the TreeMap (see the sketch below):

  • get(key) and put(key, value) for retrieval and insertion
  • lastKey() for finding the max key
  • remove(key) for deletion
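
A minimal usage sketch; one caveat is that TreeMap.lastKey() descends the rightmost path of the red-black tree, so it is O(log n) rather than the O(1) the question asks for (the cached-max trick from the top answer closes that gap):

```java
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>(); // a red-black tree
        map.put(10, "ten");                  // O(log n) insert
        map.put(99, "ninety-nine");
        System.out.println(map.get(10));     // O(log n) lookup -> ten
        System.out.println(map.lastKey());   // max key -> 99 (O(log n), not O(1))
        map.remove(99);                      // O(log n) delete
    }
}
```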

Or use a structure like the one described in this page.

Final conclusion:

If you are going to trade off space complexity for running time, you need to have 2 data structures.

Use a HashMap, which has O(1) expected insert, retrieval, and removal (a TreeMap gives O(log n) for these).

Then, as per the second link I provided, use a two-stack data structure to find the max or min in O(1).

I think this is the best possible solution I can give.

Sadesh Kumar N
0

Take a look at the RMQ (range minimum/maximum query) data structure or the segment tree data structure. Both have O(1) query time, but you will have to modify them somehow to store the values as well.

Here is a nice article: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor
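
To make this concrete, a minimal sketch under the strong assumption that keys are integers drawn from a fixed universe [0, n); without that assumption, the rebuilding objection raised in the comments applies. Values would need a separate map alongside:

```java
import java.util.Arrays;

// Sketch only: a fixed-universe segment tree over keys 0..n-1.
// Point update is O(log n); the global max is the root, read in O(1).
class MaxSegmentTree {
    private final int n;
    private final long[] tree;                        // 1-based; leaves at [n, 2n)
    private static final long ABSENT = Long.MIN_VALUE;

    MaxSegmentTree(int n) {
        this.n = n;
        this.tree = new long[2 * n];
        Arrays.fill(tree, ABSENT);
    }

    void insert(int key) { set(key, key); }           // present keys score as themselves
    void delete(int key) { set(key, ABSENT); }        // O(log n)
    long maxKey()        { return tree[1]; }          // O(1); ABSENT if empty

    private void set(int i, long v) {                 // climb from leaf to root
        i += n;
        tree[i] = v;
        for (i /= 2; i >= 1; i /= 2) {
            tree[i] = Math.max(tree[2 * i], tree[2 * i + 1]);
        }
    }
}
```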

Ribtoks
  • RMQ is only fast if you do a lot of preprocessing. If you're adding and removing elements all the time, you'd do a huge amount of unnecessary work repeatedly recomputing this preprocessing information. This also doesn't give you an O(log n) way to do lookups. – templatetypedef Jun 14 '12 at 17:53
  • @templatetypedef Lookups of max-min in segment tree or RMQ are O(1) operations. Insertion has a O(log N) complexity. That's why it almost fits the requirements. – Ribtoks Jun 14 '12 at 18:06
  • But they only have those runtimes after doing either O(n) or O(n log n) preprocessing time. If you do inserting or deletions you have to rebuild the segment tree or recompute the information for the RMQ data structure. – templatetypedef Jun 14 '12 at 18:09
  • ... but the OP is asking for insertions and deletions to take O(log n) time. Your answer doesn't meet this criterion. – templatetypedef Jun 14 '12 at 18:53
0

As the first comment says, use a max heap. Use a hashmap to store pointers into the heap; these give you the constant-time lookup and the log-time delete.

Heaps are very simple to implement; they don't require balancing like BSTs do. Hashmaps are usually built into your language of choice.
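
A minimal sketch of that combination in Java, assuming distinct integer keys; the position map plays the role of the pointers into the heap, so a delete can find its node and restore heap order in O(log n). All names are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: array-based max-heap of keys + hashmap from key to heap index.
class IndexedMaxHeap<V> {
    private final List<Integer> heap = new ArrayList<>();      // heap-ordered keys
    private final Map<Integer, Integer> pos = new HashMap<>(); // key -> heap index
    private final Map<Integer, V> values = new HashMap<>();    // key -> value

    V get(int key) { return values.get(key); }   // O(1) expected
    int maxKey()   { return heap.get(0); }       // O(1); throws if empty

    void put(int key, V value) {                 // O(log n)
        values.put(key, value);
        heap.add(key);
        pos.put(key, heap.size() - 1);
        siftUp(heap.size() - 1);
    }

    void remove(int key) {                       // O(log n)
        int i = pos.get(key);
        swap(i, heap.size() - 1);                // move the target to the end
        heap.remove(heap.size() - 1);
        pos.remove(key);
        values.remove(key);
        if (i < heap.size()) { siftUp(i); siftDown(i); }  // fix the swapped-in key
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get((i - 1) / 2) < heap.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, m = i;
            if (l < heap.size() && heap.get(l) > heap.get(m)) m = l;
            if (r < heap.size() && heap.get(r) > heap.get(m)) m = r;
            if (m == i) return;
            swap(i, m);
            i = m;
        }
    }

    private void swap(int i, int j) {            // keeps pos in sync with heap
        Collections.swap(heap, i, j);
        pos.put(heap.get(i), i);
        pos.put(heap.get(j), j);
    }
}
```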

Thomas Ahle
0

Deletion in a tree data structure is already an O(log N) operation, so looking for the second greatest key is not going to change the complexity of the delete.

Alternatively, you can invalidate elements instead of deleting them; if you keep back pointers inside your data structure, moving from the greatest to the second greatest can then be an O(1) operation.

salva