6

I tried watching http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-4-heaps-and-heap-sort/ to understand heaps and heapsort but did not find this clear.

I do not understand the function of max-heapify. It seems like a recursive function, but then somehow it's said to run in logarithmic time because of the height of the tree.

To me this makes no sense. In the worst case, won't it have to reverse every single node? I don't see how this can be done without it touching every single node, repeatedly.

DoubleBass
  • 1,143
  • 4
  • 12
  • 29

3 Answers3

9

Here's what MAX-HEAPIFY does:

Given a node at index i whose left and right subtrees are max-heaps, MAX-HEAPIFY moves the node at i down the max-heap until it no longer violates the max-heap property (that is, the node is not smaller than its children).

The longest path that a node can take before it is in the proper position is equal to the starting height of the node. Each time the node needs to go down one more level in the tree, the algorithm will choose exactly one branch to take and will never backtrack. If the node being heapified is the root of the max-heap, then the longest path it can take is the height of the tree, or O(log n).

MAX-HEAPIFY moves only one node. If you want to convert an array to a max-heap, you have to ensure that all of the subtrees are max-heaps before moving on to the root. You do this by calling MAX-HEAPIFY on n/2 nodes (leaves always satisfy the max-heap property).

From CLRS:

for i = floor(length(A)/2) downto 1
   do MAX-HEAPIFY(A,i)

Since you call MAX-HEAPIFY O(n) times, building the entire heap is O(n log n).*

* As mentioned in the comments, a tighter upper-bound of O(n) can be shown. See Section 6.3 of the 2nd and 3rd editions of CLRS for the analysis. (My 1st edition is packed away, so I wasn't able to verify the section number.)

beaker
  • 16,331
  • 3
  • 32
  • 49
  • Actually building the entire heap is O(n) if you compute the time complexity more carefully. – Zeno Rogue May 11 '20 at 18:51
  • @ZenoRogue Yes, a tighter upper-bound can be found by taking into consideration the depth at which each node is inserted. You can find the analysis in the Second edition of CLRS in section 6.3, which is reproduced [here](https://web.stanford.edu/class/archive/cs/cs161/cs161.1168/lecture4.pdf) in section 1.3.2. – beaker May 11 '20 at 19:11
  • @ZenoRogue I've added a note to my answer. Please let me know if I've missed anything. – beaker May 11 '20 at 19:30
  • Looks great! (I have no access to section numbers either.) – Zeno Rogue May 12 '20 at 07:44
  • So, `heapify` function heapifies a node; and NOT the whole heap(or array). – Deepam Gupta Nov 24 '21 at 08:22
  • @DeepamGupta Exactly. Following the pseudocode in CLRS, `MAX-HEAPIFY` assumes that the left and right subtrees of the (single) input element `A[i]` are already max-heaps. The input element `A[i]` is then moved to its proper position so that `A[i]` and its subtrees now form a heap. – beaker Nov 24 '21 at 15:27
3

In the worst case, won't it have to reverse every single node?

You don't have to go through every node. The standard max-heapify algorithm is: (taken from Wikipedia)

Max-Heapify (A, i):
    left ← 2*i  // ← means "assignment"
    right ← 2*i + 1
    largest ← i

    if left ≤ heap_length[A] and A[left] > A[largest] then:
        largest ← left
    if right ≤ heap_length[A] and A[right] > A[largest] then:
        largest ← right

    if largest ≠ i then:
        swap A[i] and A[largest]
    Max-Heapify(A, largest)

You can see that on each recursive call you either stop or continue with the subtree left or right. In the latter case you decrease the tree height with 1. Since the heap tree is balanced by definition you would do at most log(N) steps.

sve
  • 4,336
  • 1
  • 19
  • 30
  • Now I am even more confused, because @Aravind said that in CLRS, it is shown as O(N) time – DoubleBass Jan 28 '16 at 02:28
  • I give you random list `[7, 9, 8, 15, 13, 11, 10, 5, 6, 1, 14, 3, 2, 4, 12]`, (first level 1 node, second level 2 nodes, third level 4 nodes, fourth level 8 nodes), you're saying you can convert this to a max heap without touching all elements? – DoubleBass Jan 28 '16 at 02:32
  • @DoubleBass You don't have to touch all of the elements, only the elements in a single path between the leaf where the element is first added up to the root. As it says in CLRS, another way to think of this is `O(h)` where `h` is the height of the tree. – beaker Jan 28 '16 at 03:09
  • When I call max_heapify on my list, the resulting list doesn't seem to be a max heap – DoubleBass Jan 28 '16 at 03:10
  • Was it a max heap before? Because if the rest of the tree isn't a heap then you can't expect heapifying one node to fix it all. (Note that I stated the direction of the path incorrectly above. The value being heapified moves from the root to a leaf.) – beaker Jan 28 '16 at 03:17
  • I don't understand then what max heapify is supposed to do, then. Like I have a random list and I want to turn it into a max heap. – DoubleBass Jan 28 '16 at 03:18
  • @DoubleBass I've added an answer since it was getting too long to fit in a comment. – beaker Jan 28 '16 at 03:41
1

Here's an argument for why it's O(N).

Assume it's a full heap, so every non-leaf node has two children. (It still works even if that's not the case, but it's more annoying.)

Put a coin on each node in the tree. Each time we do a swap, we're going to spend one of those coins. (Note that when elements swap in the heap, the coins don't swap with them.) If we run MAX-HEAPIFY, and there's any coins left over, that means we've done fewer swaps than there are nodes in the tree, and thus MAX-HEAPIFY performs O(N) swaps.

Claim: after MAX-HEAPIFY is done running, a heap will always have at least one path from the root to a leaf with coins on every node of the path.

Proof by induction: For a single-node heap, we don't need to do any swaps, so we don't need to spend any coins. Thus, the one node gets to keep its coin, and we have a full path from root to leaf (of length 1) with coin intact.

Now, assume we have a heap with left and right subheaps, and MAX-HEAPIFY has already run on both. By inductive hypothesis, each has at least one path from root to leaf with coins on it, so we have at least two root-to-leaf paths with coins, one for each child. The farthest the root would ever need to go in order to establish the MAX-HEAP property is to swap all the way to the bottom of the tree. Let's say it swaps down into the left subtree, and it swaps all the way to down to the bottom. For each swap, we need to spend the coin, so we spend it from the node that the root swapped to.

In doing this, we spent all the coins on one of the root-to-leaf paths, but remember we originally had at least two! Therefore, we still have a root-to-leaf path complete with coins after MAX-HEAPIFY runs on the whole heap. Therefore, MAX-HEAPIFY spent fewer coins than there are nodes in the tree. Therefore, the number of swaps is O(N). QED.

hjfreyer
  • 539
  • 3
  • 11