
After reading questions such as "Why siftDown is better than siftUp in heapify?", I get the impression that:

  1. siftDown() is better than siftUp()
  2. siftDown() can always replace siftUp() and vice-versa

Why is it, then, that lots of heap implementations have a siftUp() that is called on insert()? Wikipedia's binary heap article has it.

Celeritas

2 Answers


The BuildHeap method that sifts elements down is the fastest known way to take an existing array and turn it into a heap. It's faster than inserting one item at a time, and it's faster than sifting elements up from the bottom.
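A minimal Python sketch of that sift-down BuildHeap (a min-heap here; the names `build_heap` and `sift_down` are illustrative, not from any particular library):

```python
def sift_down(a, i, n):
    """Push a[i] down until the min-heap property holds in a[:n]."""
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return                      # heap property restored
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def build_heap(a):
    """Heapify in place: sift down every internal node, last parent first."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
```

Most internal nodes sit near the bottom with almost no distance left to sift, which is what makes this O(n) overall.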

However, once a heap is built, and you're doing a series of inserts and removes on the data structure, it's faster to insert items at the bottom and sift them up than it is to insert at the top and sift down.
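A minimal sketch of that insert-at-the-bottom-and-sift-up strategy, again assuming a min-heap stored in a Python list (names are illustrative):

```python
def sift_up(a, i):
    """Move a[i] up until its parent is no larger (min-heap)."""
    while i > 0:
        parent = (i - 1) // 2
        if a[parent] <= a[i]:
            break                       # already in the proper place
        a[i], a[parent] = a[parent], a[i]
        i = parent

def insert(a, item):
    """Append as the next leaf, then sift up -- usually very few swaps."""
    a.append(item)
    sift_up(a, len(a) - 1)
```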

Remember that in a heap of n items, n/2 items are at the leaf level. This means that when you insert an item (by adding it as the next leaf), there's a 50% chance that it won't have to sift up: it's already in the proper place. There's a 25% chance that it belongs on the next level up. As you move up in the heap, the probability of the item sifting up to that level decreases by 50%.

Now, you could write your heap code to do the insert at the top all the time, but the probabilities work against you. There's still a 50% chance that the item you inserted will end up at the leaf level, except that if you insert at the top, it's going to take you log(n) swaps to get it there.

So if you insert at the bottom and sift up:

50% of the time you make 0 swaps
25% of the time you make 1 swap
12.5% of the time you make 2 swaps
...

If you insert at the top and sift down:

50% of the time you make log(n) swaps
25% of the time you make log(n)-1 swaps
12.5% of the time you make log(n)-2 swaps
...
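To make the difference concrete, here is a back-of-the-envelope calculation of those expected swap counts (the `expected_swaps` function and the idealized full-heap probability model are mine, for illustration only):

```python
def expected_swaps(depth, from_top):
    """Expected swaps for one insert, assuming the item belongs
    k levels above the leaves with probability 2**-(k+1)."""
    total = 0.0
    for k in range(depth + 1):
        p = 2.0 ** -(k + 1)             # 50% leaf level, 25% one level up, ...
        swaps = (depth - k) if from_top else k
        total += p * swaps
    return total
```

For a heap of depth 20 (about a million items), inserting at the bottom averages just under one swap, while inserting at the top averages about 19.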

Come to think of it, it's worse than that. Because if you're inserting an item and it ends up landing somewhere in the middle, you then have to take the item it displaced and sift it down. And that's going to end up moving things down all through the heap. In the end, inserting at the top always costs you log(n) swaps, because you eventually have to place something at the (n+1)th position in the array (i.e. you have to add an item).

It should be clear that you don't want to be inserting at the top, although you have to do something quite similar when you remove the root.

When you remove the root, you take the last item in the heap, put it in the root position, and sift it down. Considering that you got it from the leaf level, and considering that the leaf level contains half of the items, there's a pretty good chance (a little better than 50%) that it's going to end up back at the leaf level. Note that this won't always cause log(n) swaps, because you're not inserting an item; you're just re-adjusting the heap.

This is why, by the way, in a well-written binary heap implementation, removing the root element is more expensive than inserting a new item.

Jim Mischel

[Edited this answer so people don't have to read through the comments]

The thing that seems to confuse you is the approach used to build a heap from scratch out of a fixed set of inputs.

There is an algorithm that first builds an arbitrary (unordered) tree, then works from bottom to top and sifts down anything it can find. That algorithm is faster than inserting the elements one after another and sifting each one up.

However, the insert method itself only inserts one element, not a whole list of them. So there is a difference between building the tree and inserting one element.

When you insert one element and use that same algorithm, you do a lot of pointless comparisons, because you already know most of the tree is a valid heap. In each iteration (from bottom to top) there's only ever one subtree that might change (the one containing your new input), so that's the only one that really needs to be looked at.

And since the rest of the tree is not randomized, it also means that after pushing your new input up, you don't need to sift down. Everything below it will already be in order.

So in the end, even if you do use the algorithm for inserting one single element, what you're doing is just a simple siftUp() operation with lots of needless additional comparisons.

Mark
  • `siftDown(i)` doesn't start at the root, it starts at position `i`. – Celeritas Aug 23 '16 at 08:25
  • Not if you try to insert something and reheapify the tree. If you put the element as a new leaf and just siftDown on that it would be completely pointless and not help you at all. – Mark Aug 23 '16 at 08:26
  • You would have to iterate from the lowest level - 1 to the top of the heap. https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap I guess that iteration process isn't always necessary and that's why `siftUp` can be better. – Celeritas Aug 23 '16 at 08:33
  • If you iterate from the lowest level to the top of the heap, you'd basically be recreating a slightly more needlessly complicated version of what is commonly known as siftUp() :D – Mark Aug 23 '16 at 08:36
  • Then why is that the way heaps are created in the first place given a binary tree of arbitrary values (see links)? – Celeritas Aug 23 '16 at 08:42
  • Because the link is talking about completely creating a tree from a list. It's not talking about inserting one element, but a whole lot of them. If you insert all elements at once, this approach would be faster. However, if you want an insert method that actually only inserts one node at a time, it does not make sense to use something other than siftUp. – Mark Aug 23 '16 at 09:11
  • That is my question. What exactly makes it different? – Celeritas Aug 23 '16 at 09:31
  • The difference is that with a lot of elements, you would have to call your insert method a lot of times. Each time, the new element would be sifted up through the entire tree (in the worst case at least). If you just insert them all at random and then use the bottom up approach of the link, that speeds up the process. BUT: Inserting only one element in a preexisting tree is a completely other story. You don't need to recheck the entire tree but only the path the new element will take. Which is done by siftUp. (btw I updated my original answer to explain a bit more) – Mark Aug 23 '16 at 09:55