66

Python has the heapq module, which implements a heap data structure and supports some basic operations (push, pop).

How do I remove the i-th element from the heap in O(log n)? Is it even possible with heapq, or do I have to use another module?

Note, there is an example at the bottom of the documentation: http://docs.python.org/library/heapq.html which suggests a possible approach, but this is not what I want: I want the element removed, not merely marked as removed.

Ecir Hana
  • 10,864
  • 13
  • 67
  • 117

2 Answers

90

You can remove the i-th element from a heap quite easily:

h[i] = h[-1]
h.pop()
heapq.heapify(h)

Just replace the element you want to remove with the last element, then remove the last element and re-heapify the heap. This is O(n); if you want, you can do the same thing in O(log n), but you'll need to call a couple of the internal heapify functions, or better, as larsmans pointed out, just copy the source of _siftup/_siftdown out of heapq.py into your own code:

h[i] = h[-1]
h.pop()
if i < len(h):
    heapq._siftup(h, i)
    heapq._siftdown(h, 0, i)

Note that in either case you can't just do h[i] = h.pop(), as that would fail if i references the last element. If you special-case removing the last element, you could combine the overwrite and the pop.

Note that depending on the typical size of your heap you might find that just calling heapify, while theoretically less efficient, could be faster than reusing _siftup/_siftdown: a little introspection will reveal that heapify is probably implemented in C, but the C implementation of the internal functions isn't exposed. If performance matters to you, consider doing some timing tests on typical data to see which is best. Unless you have really massive heaps, big-O may not be the most important factor.
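One way to run such a timing test (a sketch only; the function names are mine, and the numbers will vary with your machine, heap size, and data):

```python
import heapq
import random
import timeit

def delete_heapify(h, i):
    # O(n), but heapify is implemented in C in CPython
    h[i] = h[-1]
    h.pop()
    heapq.heapify(h)

def delete_sift(h, i):
    # O(log n), but uses the pure-Python private internals
    h[i] = h[-1]
    h.pop()
    if i < len(h):
        heapq._siftup(h, i)
        heapq._siftdown(h, 0, i)

base = [random.random() for _ in range(10000)]
heapq.heapify(base)

for fn in (delete_heapify, delete_sift):
    # copy the heap each run so every deletion starts from the same state
    t = timeit.timeit(lambda: fn(base.copy(), 5000), number=200)
    print(fn.__name__, round(t, 4))
```

Both functions remove the same element; only the cost of restoring the invariant differs.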

Edit: someone tried to edit this answer to remove the call to _siftdown with a comment that:

_siftdown is not needed. New h[i] is guaranteed to be the smallest of the old h[i]'s children, which is still larger than old h[i]'s parent (new h[i]'s parent). _siftdown will be a no-op. I have to edit since I don't have enough rep to add a comment yet.

What they've missed in this comment is that h[-1] might not be a child of h[i] at all. The new value inserted at h[i] could come from a completely different branch of the heap so it might need to be sifted in either direction.
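A concrete case where the _siftdown call is required (values are arbitrary, chosen so the last element lands under a larger parent):

```python
import heapq

h = [0, 10, 1, 11, 12, 2, 3]   # a valid heap
i = 3
h[i] = h[-1]   # h[3] becomes 3, taken from a different branch
h.pop()
# Invariant is now broken: h[3] == 3 is smaller than its parent h[1] == 10
assert h[i] < h[(i - 1) >> 1]
heapq._siftup(h, i)        # no-op here: index 3 is a leaf
heapq._siftdown(h, 0, i)   # moves 3 up past its parent 10
# Invariant restored
assert all(h[k] <= h[c] for k in range(len(h))
           for c in (2*k + 1, 2*k + 2) if c < len(h))
```

Omitting _siftdown would leave [0, 10, 1, 3, 12, 2], with 10 sitting above 3.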

Also, to the comment asking why not just use sort() to restore the heap: calling _siftup and _siftdown are both O(log n) operations, calling heapify is O(n), and calling sort() is O(n log n). It is quite possible that calling sort will be fast enough, but for large heaps it is an unnecessary overhead.

Edited to avoid the issue pointed out by @Seth Bruder. When i references the last element the _siftup() call would fail, but in that case popping an element off the end of the heap doesn't break the heap invariant.
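A quick sketch of that edge case, showing why the `if i < len(h)` guard is enough when the last element is removed:

```python
import heapq

h = [1, 3, 2, 7, 4, 5]  # already satisfies the heap invariant
i = len(h) - 1          # removing the last element
h[i] = h[-1]            # overwrites the element with itself
h.pop()
if i < len(h):          # False here: no sifting needed (or possible)
    heapq._siftup(h, i)
    heapq._siftdown(h, 0, i)
assert h == [1, 3, 2, 7, 4]   # invariant still holds
```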

Duncan
  • 92,073
  • 11
  • 122
  • 156
  • 3
    +1, with the side note that it would be cleaner to copy the definition of `_siftup` into the program as recommended by @AlexMartelli, [here](http://stackoverflow.com/questions/1465662/how-can-i-implement-decrease-key-functionality-in-pythons-heapq). – Fred Foo Apr 15 '12 at 15:55
  • Thanks, `_siftup` looks definitely interesting! Btw., why `pop(-1)`, instead of just `pop()`? – Ecir Hana Apr 15 '12 at 19:42
  • @EcirHana just because I can't remember the default off the top of my head. I've tidied it up. – Duncan Apr 15 '12 at 20:36
  • 1
    @Duncan I have a doubt here, I am trying to implement decreaseKey operation on priority queue. In your method, you are assuming that decrease has index(i) to the item to be deleted. If I have just the element not the index, then how can it be done? – Naman Sep 13 '14 at 18:17
  • @Naman If you don't have an index then I think the best you can do is use `heapify(q)` to restore the heap ordering in linear time. – Duncan Sep 13 '14 at 19:25
  • @Duncan I was thinking whether it is possible if I keep a dict alongside my heap which will point to the index in heap? But updating such dict with every insert becomes bottleneck. Is there anyway in which we can overcome it? The heap I am building is quite large and I really need O(log n) time decreasekey operation. – Naman Sep 13 '14 at 22:05
  • @dano I guess in the _siftup() example, there's a mistake. If I removed last item, you are just replacing last item with itself then you pop it out and then when you'll call _siftup(), it will give array out of bound. We can't call sift up in that case. Am I right? – Naman Sep 15 '14 at 23:42
  • 1
    Since you don't know whether the new h[i] will be greater or smaller than its parents or children, you also need to call heapq._siftdown(h, 0, i) before or after calling _siftup – seaotternerd Feb 12 '16 at 01:46
  • Why not use: `h.sort()` instead of siftup/siftdown methods in order to maintain the heap invariant? Seems to be preferred way to do it according to the docs: [https://docs.python.org/3.5/library/heapq.html?highlight=heap#module-heapq](https://docs.python.org/3.5/library/heapq.html?highlight=heap#module-heapq) Also I find it cleaner to just `del h[i]` instead of playing with reference/pop – b1r3k Apr 04 '16 at 11:27
  • @b1r3k Because of performance. `sort` and `del` are both much slower than the above method. – semicolon Oct 12 '16 at 06:22
  • 1
    @Duncan I think the point by @seaotternerd still stands: as it is now, the index argument to `_siftup()` may index the element that was just removed by `pop()`, causing `_siftup()` to throw. – Seth Bruder Aug 06 '17 at 05:34
  • 1
    @SethBruder, good catch. Yes, the `_siftup` would indeed throw, but if you remove the very last element you don't need to do either `_siftup` or `_siftdown`. Updated the answer accordingly. – Duncan Aug 07 '17 at 08:13
  • can you explain why the first approach is O(n), I thought heapify is O(logn), thanks – ascetic652 Nov 19 '18 at 15:36
  • @ascetic652 `heapq.heapify()` is O(n) because it doesn't know which element is out of order so it will scan every element in the heap. Calling `_siftup()` and `_siftdown()` start at the element that may be out of position and don't consider other parts of the heap, so that is O(log n). – Duncan Nov 20 '18 at 10:03
22

(a) Consider why you don't want to lazy delete. It is the right solution in a lot of cases.

(b) A heap is a list. You can delete an element by index, just like in any other list, but then you will need to re-heapify it, because it will no longer satisfy the heap invariant.
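A minimal sketch of (b):

```python
import heapq

h = [1, 5, 2, 9, 6]        # already a valid heap
del h[0]                   # list deletion by index: O(n)
# the remaining list [5, 2, 9, 6] violates the invariant (5 > 2)
heapq.heapify(h)           # O(n) repair
assert h[0] == 2
assert all(h[k] <= h[c] for k in range(len(h))
           for c in (2*k + 1, 2*k + 2) if c < len(h))
```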

Marcin
  • 48,559
  • 18
  • 128
  • 201
  • 1
    could you add some reference for (b) ? – Zenon Apr 15 '12 at 14:05
  • 1
    @Zenon Which part of b? You can look at the type of an object in your interpreter, or read the documentation that OP links to; as to needing to re-heapify, this is a consequence of the fact that such an operation leads to a list that violates the heap invariant (also given in that documentation). – Marcin Apr 15 '12 at 14:08
  • (a) - lazy delete is perfectly valid, I just would like to understood the heaps better. (b) I'm interested in at least O(log n), heapify is O(n) – Ecir Hana Apr 15 '12 at 19:38
  • 2
    lazy delete is a genius way to get around O(N) delete cost for heaps. – anthonybell Apr 16 '17 at 09:16
  • 2
    for anyone wondering what a 'lazy delete' is you can find the article below but essentially in this case you mark an element as 'deleted' in a key value store but don't actually remove it from the heap as that would require O(n) time. Then when you are using the heap you can check that key value store if the node you are looking at is marked as deleted. It's used for hash tables but can be used here as well https://en.wikipedia.org/wiki/Lazy_deletion – athammer Apr 01 '21 at 21:23
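For reference, a minimal sketch of the lazy-deletion approach described in the last comment (the class name and structure are my own, not part of heapq):

```python
import heapq
from collections import Counter

class LazyHeap:
    """Lazy deletion: deleted items are only marked, then skipped
    when they surface at the top of the heap."""
    def __init__(self):
        self.heap = []
        self.deleted = Counter()   # item -> pending delete count

    def push(self, item):
        heapq.heappush(self.heap, item)

    def delete(self, item):
        self.deleted[item] += 1    # O(1): just mark it

    def pop(self):
        while True:
            item = heapq.heappop(self.heap)
            if self.deleted[item]:
                self.deleted[item] -= 1   # skip a marked entry
            else:
                return item

h = LazyHeap()
for x in [5, 1, 4, 2, 3]:
    h.push(x)
h.delete(2)
print(h.pop(), h.pop())  # 1 3
```

Each delete is O(1); the cost is deferred to pops, which discard marked entries as they reach the top.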