2

Is there a binary heap implementation where I can pop other elements than the root in log n time?

I use heapq - but heap.index( wKeys ) in

heap.pop( heap.index( wKeys ) )

is very slow. I need a binary heap for my problem - where I sometime use

heapq.heappop(heap)

but also need to pop other elements than at the top from the heap. So a binary heap like heapq imlementation should do that but I havnt found a binary search method. I also looked at treap (http://stromberg.dnsalias.org/~strombrg/treap/) and cannot find a method like this here either.

Endre Moen
  • 695
  • 2
  • 9
  • 19
  • 1
    The heapq documentation (https://docs.python.org/3.5/library/heapq.html) doesn't seem to mention the index method at all. How many keys are you passing in `wKeys`? In general, I wouldn't expect finding the index of a specific member would be fast - though my comp sci is pretty rusty. Ref https://docs.python.org/3.5/library/heapq.html#priority-queue-implementation-notes it seems that "how do you find [an item] and remove it from the queue?" is an open problem, though it suggests a solution there. I think in general what you want to do isn't a 'built-in' property of a binary heap. – Tom Dalton Mar 05 '18 at 16:15
  • but if the heap is organized as a binary tree - shouldnt it be possible to search for any item in log n time? At the bottom of this page: http://stromberg.dnsalias.org/~strombrg/treap/ it says that insert, remove and lookup is done in log n time. While the treap implementation doesnt provide lookup of arbitrary item. – Endre Moen Mar 05 '18 at 16:55
  • 1
    A binary tree just means that each node has at most 2 children. A binary tree in itself has no innate 'ordering' on the nodes. It just so happens that a common use of binary trees is an ordered binary tree that allows efficient binary search. It's possible that `treap` has some additional implementation (such as the 'addon' dictionary) that makes lookup efficient, but that will come at the code of additional operations on push/pop and additional storage cost. – Tom Dalton Mar 05 '18 at 16:59
  • 1
    @EndreMoen: The binary heap is organized as a binary tree, but not a binary search tree. Finding an item in a binary heap requires a sequential search. – Jim Mischel Mar 05 '18 at 16:59
  • 2
    Removing an arbitrary element from a binary heap is an O(log n) operation, as described in https://stackoverflow.com/a/8706363/56778. *Finding* an arbitrary element is an O(n) operation, unless you combine the heap with a lookup mechanism like a hash map or dictionary that keeps track of each item's location in the heap. – Jim Mischel Mar 05 '18 at 17:06
  • @JimMischel - yes that is what I need - a lookup mechanism to find the index of an entry - given the entry. – Endre Moen Mar 05 '18 at 17:21
  • 1
    You need to modify the heap code so that it includes a dictionary that has the item key as the key, and the item's index in the heap as the value. In the code that moves items, you have to update the item's position in the dictionary every time you move the item in the heap. There are probably examples of doing that in python. If not, it's not a difficult modification. – Jim Mischel Mar 05 '18 at 17:27
  • 1
    yu could look at : http://www.grantjenks.com/docs/sortedcontainers/index.html . I really don't get the point of not including sorted containers in python. can someone shed some light? – Shihab Shahriar Khan Mar 06 '18 at 15:55
  • @JimMischel - i have started on an implementation based on heapq (https://github.com/emoen/heapq_with_index/blob/master/heapq_27_with_index.py) and I try to add a new method `pop_arbitrary(heap, item, heapIndex)` - where `heap` is an array, and `heapIndex` is a dictionary of Item: index, but if the heap contains 5 elements and you delete item at position 1 - then in `pop_arbitrary()` - you have to loop through 2, 3, and 4 in the dictionary to decrement the index of these items. How can this be more efficient? – Endre Moen Mar 06 '18 at 21:15
  • 1
    The idea is that every time you move an item in the heap's backing array, you have to update that item's entry in the dictionary. For example in the `_siftup` method, there's this line: `heap[pos] = heap[childpos]`. You need to lookup in the dictionary the item that's at `childpos`, and change its value in the dictionary to be `pos`. You have to do that throughout the heapq code: every time an item moves, you have to change its value in the dictionary. – Jim Mischel Mar 07 '18 at 20:36

2 Answers2

1

Ive modified the implementation of heapq by adding a parameter to heappop(), and heappush() - which is heapIndex. That takes a dictionary of {item: index} and updates the heapIndex as it pops or pushes into heap.

Ive also added a new method heappop_arbitrary() which deletes an arbitrary element and updates the heapIndex

The code is available here: https://github.com/emoen/heapq_with_index

Ive renamed the methods heappop(),heappush() to heappop2(), heappush2() to avoid confusion with the original methods.

I havent implemented this for any of the other functions available in heapq.

Edit: please star if you can use the repo :)

Endre Moen
  • 695
  • 2
  • 9
  • 19
  • added changeValue() – Endre Moen Mar 09 '18 at 18:38
  • heappop_arbitrary() and changeValue() are still quite slow. – Endre Moen Mar 09 '18 at 21:49
  • 1
    `heappop_arbitrary()` and `changeValue()` shouldn't be slow. They should be faster than heappop2(). Also, you should be able to re-code your `else:` (the last one) in `heappop_arbitrary' to remove the last item from the heap and then call `changeValue()`. – Jim Mischel Mar 15 '18 at 17:48
0
class RemoveHeap:
    def __init__(self):
        self.h = []
        self.track = collections.defaultdict(collections.deque)
        self.counter = itertools.count()

    def insert_item(self, val):
        count = next(self.counter)
        item = [val, count, 'active']
        self.track[val].append(item)
        heapq.heappush(self.h, item)

    def delete_item(self, val):
        if val in self.track:
            items = self.track[val]
            for item in items:
                if item[2] == 'active':
                    item[2] = 'deleted'
                    break

    def pop_item(self):
        while len(self.h) > 0:
            item = heapq.heappop(self.h)
            item_track = self.track[item[0]]
            item_track.popleft()
            if len(item_track) == 0:
                del self.track[item[0]]
            else:
                self.track[item[0]] = item_track
            if item[2] == 'active':
                return item[0]

    def peek_item(self):
        item = self.h[0]
        if item[2] == 'deleted':
            x = self.pop_item()
            self.insert_item(x)
            return x
        return item[0]
Isan Sahoo
  • 384
  • 2
  • 10