
I have read quite a bit on big-O notation and I have a basic understanding. This is a specific question that I hope will help me understand it better.

If I have an array of 100 integers (no duplicates, randomly generated) and I use heapsort to sort it, I know that the big-O notation for heapsort is n lg n. For n = 100, this works out to 100 × 6.64, which is roughly 664.

I know this is an upper bound on the number of comparisons, and my count can be less than 664. But if I count the comparisons made while heapsorting an array of 100 random numbers, should the count always be less than or equal to 664?

I am adding counters to my heapsort to measure the number of comparisons, and I am coming up with numbers that seem far too large. I will continue to work it out, but I wanted to verify that I was thinking of the upper bound properly.

Thanks!

wfsteadman

2 Answers


Big-O notation does not give you an exact upper bound on a function's runtime - instead, it tells you asymptotically how the function's runtime grows. If a function has runtime O(n log n), it means that the function grows at roughly the same rate as the function f(n) = n log n. That means, for example, that the actual runtime could be 23 n log n + 17 n, or it could be 0.05 n log n. Consequently, you can't use the fact that heapsort is O(n log n) to count the number of comparisons made. You'd need a more precise analysis.

It just so happens that you can get a very precise analysis of heapsort, but it requires you to do a more meticulous analysis of the algorithm. You can show, for example, that the number of comparisons required to call make-heap is at most 3n, and that the number of comparisons made during the repeated calls to extract-max is at most 2n log (n + 1) (the binary heap has log (n + 1) layers, and during each of the n extract-max's, at most two comparisons are made at each layer). This gives an overall number of comparisons upper-bounded by 2n log (n + 1) + 3n.
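As a sanity check, here is a sketch of this count in Python (not the poster's Java code): a heapsort that tallies its own element comparisons and verifies they stay under the 2n lg(n + 1) + 3n bound.

```python
import math
import random

def heapsort_count(a):
    """Heapsort `a` in place; return the number of element comparisons made."""
    count = 0
    n = len(a)

    def sift_down(start, end):
        nonlocal count
        root = start
        while 2 * root + 1 <= end:
            child = 2 * root + 1
            if child + 1 <= end:          # pick the larger child: 1 comparison
                count += 1
                if a[child] < a[child + 1]:
                    child += 1
            count += 1                    # larger child vs. root: 1 comparison
            if a[root] < a[child]:
                a[root], a[child] = a[child], a[root]
                root = child
            else:
                break

    for start in range(n // 2 - 1, -1, -1):   # make-heap phase
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):           # n - 1 extract-max passes
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)
    return count

n = 100
original = random.sample(range(10**6), n)
data = list(original)
count = heapsort_count(data)
bound = 2 * n * math.log2(n + 1) + 3 * n      # ≈ 1931.6 for n = 100
print(count, "<=", round(bound, 1))
```

On random inputs of 100 elements the count typically lands near 1,000, comfortably under the bound but well above the 664 the question expected.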

The famous Ω(n log n) sorting barrier gives a matching lower bound. Any comparison-based sorting algorithm, of which heapsort is one, must make at least lg n! comparisons in the worst case; by Stirling's approximation, lg n! = n lg n - n lg e + O(log n), so heapsort is required to make roughly n lg n - 1.44n comparisons in the worst case. (Note that this is actually n log n, not some constant multiple of n log n. You can read over the proof of the Ω(n log n) barrier for why this is.)
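To see the gap concretely for n = 100: the exact information-theoretic lower bound is lg(100!), which is noticeably smaller than the 100 lg 100 ≈ 664 from the question. A quick check in Python:

```python
import math

n = 100
# Exact lower bound: any comparison sort needs at least lg(n!) comparisons
# in the worst case. lgamma gives ln(n!), so divide by ln 2 for base 2.
log2_fact = math.lgamma(n + 1) / math.log(2)          # lg(100!) ≈ 524.8
naive = n * math.log2(n)                              # 100 lg 100 ≈ 664.4
stirling = n * math.log2(n) - n * math.log2(math.e)   # n lg n - n lg e ≈ 520.1
print(round(log2_fact, 1), round(naive, 1), round(stirling, 1))
```

So n lg n overshoots the true lower bound by the linear term, which is why an observed count can legitimately fall below 664.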

Hope this helps!

templatetypedef
  • I was trying to find out about the big-O notation as I am comparing the time/comparisons that it takes for quicksort, heapsort, and mergesort to sort an array of random numbers, where the array and the number of elements are the same for each sorting algorithm. What I wanted to see was which one performed better and how close each got to n log(n). I have run this for 100, 500, 1000, 5000, 10000, 50000 and 100000 elements, and what I am seeing is that in general quicksort uses the least comparisons, heapsort is in the middle, and mergesort uses the most – wfsteadman Jun 17 '13 at 01:24
  • n log(n) for 100 is 664.39, and in the example below quicksort took 218 comparisons to put the list in ascending order, heapsort took 477, and mergesort took 601. Is that not a valid test? Q Sort: 218, 224, 234; H Sort: 477, 474, 482; M Sort: 601, 588, 597 – wfsteadman Jun 17 '13 at 01:24
  • @wfsteadman- Those numbers don't sound right - quicksort typically makes *more* comparisons than heapsort and mergesort and is faster for other reasons. Can you post a separate question containing your code to get these numbers so that people can look over it? – templatetypedef Jun 17 '13 at 17:34
  • I can start a new topic, but I uploaded the zipped file here: http://steadmanusa.com/algorithm.html if that helps. This is a Java program, and you can change the loops as I have it running each of the algorithms x number of times. – wfsteadman Jun 17 '13 at 21:09
  • I would love someone who understands comparisons to look and see if my algorithm comparisons are correct for sure. Thanks – wfsteadman Jun 17 '13 at 21:12
  • 1
    @wfsteadman- You are not counting at least one comparison in your heapsort implementation (check the `if` statement on line 68). In quicksort, you are trying to count comparisons in `swap`, which is incorrect - you need to count *all* comparisons, not just those that lead to swaps being made. You are also missing a huge number of comparisons against the pivot from `partitionIt`. Finally, I think your code for insertion sort is incorrect (you need to swap the element backwards, not just overwrite the element). You're also only counting one comparison per loop iteration. – templatetypedef Jun 17 '13 at 21:19
  • 1
    @wfsteadman- You really should post each piece of code as a separate question and ask for help debugging it - I don't have much space to write in the comments and if I miss anything, no one is going to notice it unless they read the whole comment exchange. – templatetypedef Jun 17 '13 at 21:20
  • templatetypedef, thank you so much for the comments. I am so bad at knowing where to add the comparisons in my code. In a previous class I did well, but I think the teacher was a bit forgiving when it came to counts. I think he was more interested in us having a basic understanding of the principles versus getting every comparison exactly right. I will post the questions. I just didn't want folks to think I was trying to get them to do my work. I have done the bulk; I just need some guidance on where to put the comparisons. – wfsteadman Jun 18 '13 at 02:35

Let's say that you know that your algorithm requires O( n log_2 n ) comparisons when sorting n elements.

This tells you the following, and only the following: there exists a constant number C such that, as n approaches infinity, the algorithm never requires more than C * n * log_2 n comparisons.

It does not tell you anything about the specific number of comparisons that might be required for any value of n -- it tells you about how the number of comparisons required grows in the limit as the number of elements grows.

You cannot use the big-O complexity of your sorting algorithm to prove anything about the behaviour at a particular finite n, such as 100 elements. Sorting 100 elements might require 64 comparisons, or 664, or 664 million. The last is clearly not reasonable in practice, but big-O alone provides no information here.
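For instance, here is a small Python experiment (assuming CPython's built-in `sorted`, i.e. Timsort, also an O(n log n) algorithm): a wrapper class counts how many `<` comparisons sorting 100 random elements actually triggers, and the result is typically well below n lg n ≈ 664.

```python
import random

class Counted:
    """Wraps a value and counts how often `<` is invoked on it while sorting."""
    count = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        Counted.count += 1
        return self.v < other.v

random.seed(1)                       # fixed seed so the run is reproducible
values = random.sample(range(10**6), 100)
Counted.count = 0
result = sorted(Counted(v) for v in values)
print(Counted.count)                 # typically well under 664
```

The exact count varies with the input, which is the whole point: the O(n log n) label constrains the growth rate, not the number you will see for any one n.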

svk