
I'm trying to do a time-complexity analysis of bottom-up heap construction and I'm stuck. I've done the mathematical evaluation that shows it is O(n), and I completely understand why. The part I'm stuck on is how the code achieves this. I know the outer for loop executes floor(n/2) times, and I believe the while loop executes log n times, but I don't know how to get from floor(n/2)·log n to O(n).

Pseudocode:                       Time analysis:

for i = n/2-1; i >= 0; i--        n/2+1
  k=i                             n/2
  while(2*k-1 <= n)               n/2(????)+1  <-- this is where I'm stuck. Should run log n times?
    j = k*2-1                     ...
    if(j<n && H[j] < H[j+1])      ...
      j++                         ...
    if(H[k] < H[j])               ...
      break                       ...
    swap(H[k],H[j])               ...
    k=j                           ...

So I can see that the while probably runs log n times, but I can't see how to get from (n/2)·log n to O(n). I'm only looking at the worst case, since I know the best case is n/2 + 1 (the loop breaks immediately when the subtree is already a heap). Any help or direction to reading material is welcome.
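To convince myself empirically, I instrumented the build with a counter (a sketch; I switched to the standard 0-based child indices 2k+1 and 2k+2, since my 2k-1 version hits an off-by-one at k=0):

```python
def count_sift_iterations(H):
    # Instrumented bottom-up max-heap build; standard 0-based child
    # indices (2k+1, 2k+2) stand in for the 2k-1 scheme above.
    n = len(H)
    total = 0
    for i in range(n // 2 - 1, -1, -1):
        k = i
        while 2 * k + 1 < n:
            total += 1              # one pass of the inner while
            j = 2 * k + 1
            if j + 1 < n and H[j] < H[j + 1]:
                j += 1              # pick the larger child
            if H[k] >= H[j]:
                break               # subtree is already a heap
            H[k], H[j] = H[j], H[k]
            k = j
    return total
```

Even on fully ascending input (the worst case for a max-heap build, where every node sifts all the way to a leaf), the total stays below 2n rather than anywhere near (n/2)·log n.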

    This analysis is fairly subtle. Here's an earlier question that includes the math behind showing that it works out to O(n): https://stackoverflow.com/questions/6299859/how-can-stdmake-heap-be-implemented-while-making-at-most-3n-comparisons – templatetypedef Jun 19 '20 at 04:25
  • @templatetypedef thank you for the additional resource. It does an excellent job explaining the math behind the algorithm. However, that part I was already able to confirm, and it too shows that the while should execute log n times, but what still isn't answered is how n/2(log n) + 1 reduces to O(n). – user13772678 Jun 19 '20 at 14:35
  • So I suppose what's really hanging me up is that in this type of line by line time analysis I don't know how to leverage in the summation that the while runs in actual 2n time. That then does result in n/2(2n)+1 -> O(n) time. – user13772678 Jun 19 '20 at 19:01
  • After messing with this a hundred different ways and beating my head against math I've long forgotten, I've finally decided this essentially does constant work. Albeit incrementing constant work based on the "layers" of the tree, but since n/2 = n(1/2) and this "constant" are both essentially coefficients they fall away when we convert to O. – user13772678 Jun 21 '20 at 22:00

1 Answer

The best advice I have to offer for working out the big-O cost of nested loops is this:

"When in doubt, work inside out!"

In other words, rather than starting with the outermost loop and working inward, start with the innermost loop and work outward.

In this case, we have this code:

for i = n/2-1; i >= 0; i--
  k=i                           
  while (2*k-1 <= n)
    j = k*2-1
    if(j<n && H[j] < H[j+1])
      j++
    if(H[k] < H[j])
      break
    swap(H[k],H[j])
    k=j

Since we're working inside out, let's start by analyzing the innermost loop:

  while (2*k-1 <= n)
    j = k*2-1
    if(j<n && H[j] < H[j+1])
      j++
    if(H[k] < H[j])
      break
    swap(H[k],H[j])
    k=j

I'm going to assume this is a worst-case analysis and that we never trigger the inner break statement. In that case, the loop progresses by having k move to either 2k - 1 or 2k after each step, so k roughly doubles with each iteration. The loop ends when k exceeds n, so the number of iterations equals the number of times we have to double k before it exceeds n, which works out to O(log(n / k)). Note that this isn't a constant: the smaller k is when the loop starts, the more iterations the inner loop performs.
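As a sanity check on that bound, here's a sketch (my own instrumentation, not from the question) that simulates the worst case where the break never fires and k moves to its child 2k each step, then compares the iteration count against log2(n / k):

```python
import math

def worst_case_inner_iterations(k, n):
    # Simulate the inner loop when the break never fires and k moves to
    # its child 2k each time (the "roughly doubling" case from the text).
    count = 0
    while 2 * k - 1 <= n:
        count += 1
        k = 2 * k
    return count
```

For any starting k, the count tracks log2(n / k) to within a couple of iterations, which is exactly the O(log(n / k)) claim.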

We can replace the inner loop with the simpler "do O(log(n / k)) work" to get this:

for i = n/2-1; i >= 0; i--
  k=i                           
  do O(log (n / k)) work;

And, since k = i, we can rewrite this as

for i = n/2-1; i >= 0; i--
  do O(log (n / i)) work;

Now, how much total work is being done here? Adding up the work done per iteration across all iterations, we get that the work done is

log (n / (n/2)) + log (n / (n/2 - 1)) + log (n / (n/2 - 2)) + ... + log(n / 2) + log(n / 1).

Now, "all" we have to do is simplify this sum. :-)
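Before simplifying it by hand, a quick numeric check (a sketch I'm adding, using base-2 logs) suggests the sum really is linear in n:

```python
import math

def sift_cost_sum(n):
    # Total of log2(n / i) over the outer-loop values i = 1 .. n/2,
    # i.e. the sum written above (with the i = n/2 term first).
    return sum(math.log2(n / i) for i in range(1, n // 2 + 1))
```

The ratio sift_cost_sum(n) / n settles near the constant (1 + log2 e) / 2 ≈ 1.22 as n grows, which is the O(n) behavior the rest of this answer derives.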

Using properties of logarithms, we can rewrite this as

(log n - log (n/2)) + (log n - log(n/2 - 1)) + (log n - log(n/2 - 2)) + ... + (log n - log 1)

= (log n + log n + ... + log n) - (log(n/2) + (log(n/2 - 1) + ... + log 1)

= (n/2)(log n) - log((n/2)(n/2 - 1)(n/2 - 2) ... 1)

= (n/2)(log n) - log((n/2)!)
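As a sanity check on the algebra so far, this closed form can be compared against the raw sum numerically (a sketch I'm adding; math.lgamma gives the exact log-factorial via lgamma(m + 1) = ln(m!)):

```python
import math

def raw_sum(n):
    # the original sum: log2(n / i) for i = 1 .. n/2
    return sum(math.log2(n / i) for i in range(1, n // 2 + 1))

def closed_form(n):
    # (n/2) log2 n - log2((n/2)!), computed exactly with lgamma
    m = n // 2
    return m * math.log2(n) - math.lgamma(m + 1) / math.log(2)
```

The two agree to floating-point precision for even n, confirming the rewrite above.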

Now, we can use Stirling's approximation to rewrite

log((n/2)!) = (n/2)log(n/2) - (n/2)log e + O(log n)

And, therefore, to get this:

(n/2)(log n) - log((n/2)!)

= (n/2)(log n) - (n/2)log(n/2) + (n/2)log e - O(log n)

= (n/2)(log (2 · (n/2))) - (n/2) log (n/2) + O(n)

= (n/2)(log 2 + log(n/2)) - (n/2) log (n/2) + O(n)

= (n/2)(1 + log(n/2)) - (n/2) log (n/2) + O(n)

= n/2 + O(n)

= O(n).

So this whole sum works out to O(n).
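To see the linear bound concretely, here's a sketch of the same bottom-up build (my own version, using the standard 0-based child indices 2k+1 and 2k+2 rather than the question's 2k-1 scheme) that counts swaps; the count never exceeds n, matching the O(n) result:

```python
def build_heap(H):
    """Bottom-up max-heapify in place (0-based: children of k are
    2k+1 and 2k+2). Returns the number of swaps performed."""
    n = len(H)
    swaps = 0
    for i in range(n // 2 - 1, -1, -1):
        k = i
        while 2 * k + 1 < n:
            j = 2 * k + 1
            if j + 1 < n and H[j] < H[j + 1]:
                j += 1                  # pick the larger child
            if H[k] >= H[j]:
                break                   # subtree is already a heap
            H[k], H[j] = H[j], H[k]     # sift down one level
            swaps += 1
            k = j
    return swaps
```

Afterward the array satisfies the max-heap property (every child is at most its parent), and the swap total is bounded by the sum of subtree heights, which is at most n - 1.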


As you can see, this is a decidedly nontrivial big-O to calculate! It's a lot trickier than just counting the work done per iteration and multiplying by the number of iterations, because the way the work per iteration changes across iterations makes that approach break down. Instead, we have to do a more nuanced analysis of how much work each loop does, convert things into a summation, and pull out some nontrivial (though not completely unexpected) tricks (Stirling's approximation and properties of logarithms) to get everything to work out as expected.

I would categorize this particular set of loops as a fairly tricky one to work through and not particularly representative of what you'd "normally" see when doing a loop analysis. But hopefully the techniques here give you a sense of how to work through trickier loop analyses and a glimpse of some of the beautiful math that goes into them.

Hope this helps!

templatetypedef
  • Thank you very much for this detailed walk through of where the summation occurs and how to break down an algorithm into components for analysis. From all the research I've done, and your excellent help, I think it's become clear that the line by line assigning of discrete formulas is the issue, and really isn't a valid method of analysis in this situation. You're forced to extrapolate the full summation which requires a much more complex approach (as you showed). Unfortunately that is the exact task as given. – user13772678 Jun 22 '20 at 06:02