21
def heapify(A):
    for root in xrange(len(A)//2-1, -1, -1):
        rootVal = A[root]
        child = 2*root+1
        while child < len(A):
            if child+1 < len(A) and A[child] > A[child+1]:
                child += 1
            if rootVal <= A[child]:
                break
            A[child], A[(child-1)//2] = A[(child-1)//2], A[child]
            child = child *2 + 1

This is a similar implementation of python heapq.heapify(). It is said in the doc this function runs in O(n). But it looks like for n/2 elements, it does log(n) operations. Why is it O(n)?

typing...
  • 934
  • 2
  • 10
  • 25
  • inside the loop, child = child * 2 + 1 until it gets to len(A) – typing... Aug 07 '18 at 21:33
  • Also see https://stackoverflow.com/questions/49774237/prove-that-binary-heap-build-max-comparsion-is-2n-2/49781979#49781979 – Jim Mischel Aug 08 '18 at 13:03
  • I don't understand why @typing suggested the child = child*2 + 1. You move from the current node (root) to the child once you have finished, but if you go to the child's child you are actually jumping a level of a tree, try to heapify this array [2|10|9|5|6] – LightMan Jul 06 '22 at 09:57

1 Answers1

22

It requires more careful analysis, such as you'll find here. The basic insight is that only the root of the heap actually has depth log2(len(a)). Down at the nodes one above a leaf - where half the nodes live - a leaf is hit on the first inner-loop iteration.

"Exact" derivation

Waving hands some, when the algorithm is looking at a node at the root of a subtree with N elements, there are about N/2 elements in each subtree, and then it takes work proportional to log(N) to merge the root and those sub-heaps into a single heap. So the total time T(N) required is about

T(N) = 2*T(N/2) + O(log(N))

That's an uncommon recurrence. The Akra–Bazzi method can be used to deduce that it's O(N), though.

I think more informative, and certainly more satifsying, is to derive an exact solution from scratch. Toward that end, I'll only talk about complete binary trees: as full as possible on every level. Then there 2**N - 1 elements in total, and all subtrees are also complete binary trees. This sidesteps mounds of pointless details about how to proceed when things aren't exactly balanced.

When we're looking at a subtree with 2**k - 1 elements, its two subtrees have exactly 2**(k-1) - 1 elements each, and there are k levels. For example, for a tree with 7 elements, there's 1 element at the root, 2 elements on the second level, and 4 on the third. After the subtrees are heapified, the root has to moved into place, moving it down 0, 1, or 2 levels. This requires doing comparisons between levels 0 and 1, and possibly also between levels 1 and 2 (if the root needs to move down), but no more that that: the work required is proportional to k-1. In all, then,

T(2**k - 1) = 2 * T(2**(k-1) - 1) + (k - 1)*C

for some constant C bounding the worst case for comparing elements at a pair of adjacent levels.

What about T(1)? That's free! A tree with only 1 element is a already a heap - there's nothing to do.

T(1) = 0

One level above those leaves, trees have 3 elements. It costs (no more than) C to move the smallest (for a min-heap; largest for a max-heap) to the top.

T(3) = C

One level above that trees have 7 elements. It costs T(3) to heapify each of the subtrees, and then no more than 2*C to move the root into place:

T(7) = 2*C + 2*C = 4*C

Continuing in the same way:

T(15) = 2* 4*C + 3*C = 11*C
T(31) = 2*11*C + 4*C = 26*C
T(63) = 2*26*C + 5*C = 57*C
...
T(2**k - 1) = (2**k - k - 1)*C

where the last line is a guess at the general form. You can verify that "it works" for all the specific lines before it, and then it's straightforward to prove it by induction.

So, where N = 2**k - 1,

T(N) = (N - log2(N+1)) * C

which shows that T(N) is bounded above by C*N, so is certainly O(N).

Tim Peters
  • 67,464
  • 13
  • 126
  • 132
  • I do not understand. This does not explain why the heapify() takes O(log(N)). TH(n) = c, if n=1 worst case when the largest if never root: TH(n) = c + ? * TH( ? ) how to write the recursive expression? – user3742309 Jun 09 '20 at 20:07
  • 1
    Already gave a link to a detailed analysis. It doesn't use a recursive formulation, and there's no need to. And the claim isn't that heapify takes O(log(N)) time, but that it takes O(N) time. Consider opening a different issue if you have a focused question. – Tim Peters Jun 09 '20 at 20:24
  • @user3742309, see edit for a full derivation from scratch. – Tim Peters Jun 11 '20 at 01:24