
As far as I learned at university, it is proven that the lower bound of any comparison-based algorithm that sorts random data is Ω(n log n). I also know that the average case of heapsort and quicksort is O(n log n).

Therefore, I tried to plot the times these algorithms needed to sort a set of random data.

I used the algorithms posted at Rosetta Code: quicksort and heapsort. When I plotted the time each one needed to sort random data, for up to 1 million numbers, I got the following charts, which appear to be linear:

[Heapsort chart] [Quicksort chart]

You can also find the raw results I got from running heapsort here, and the results I got from running quicksort here.

However, I do get O(n^2) time complexity when running bubblesort, as shown in the plot below: [bubblesort chart]

Why is this? What might I be missing here?

Neil G
Jim Blum
  • 4
    scales.....size of N too small? – Mitch Wheat Jul 05 '14 at 11:53
  • @MitchWheat. Thanks a lot for your comment. However, N was up to 1 Million... Do I have to use an even larger N? – Jim Blum Jul 05 '14 at 11:53
  • Put a sleep in your comparison function and see what happens? – nishantjr Jul 05 '14 at 11:57
  • 4
    nlogn graphs can look linear. Look at the yellow line at http://science.slc.edu/~jmarshall/courses/2002/spring/cs50/BigO/sorting.gif – fgb Jul 05 '14 at 12:00
  • 1
    [Similar](http://stackoverflow.com/questions/7830727/n-log-n-is-on). Theres a nice [diagram](http://stackoverflow.com/a/7830804/1278288) too – nishantjr Jul 05 '14 at 12:00
  • Hmmm, that's very interesting @fgb. So, if I get a function like this, how can I know whether it is nlogn or linear? – Jim Blum Jul 05 '14 at 12:02
  • Thank you too @nishantjr. If I run a black-box algorithm, have no idea about its complexity, and get this kind of plot, how can I say for sure whether the plot is nlogn or n? – Jim Blum Jul 05 '14 at 12:04
  • @nishantjr By the way, I cannot add a sleep to my comparison function, as I have to use the sorting algorithms as benchmarks and therefore must not modify them in any way... – Jim Blum Jul 05 '14 at 12:14
  • Most sorting functions I've seen allow you to specify a comparison function. This allows sorting of non-primitive types, or using a non-default sort order – nishantjr Jul 05 '14 at 12:16
  • OK @nishantjr, thanks. Do you also have any idea how I can classify an algorithm as O(nlogn) or O(n) if I get this kind of result? – Jim Blum Jul 05 '14 at 12:21
  • Not sure. For something like sorting, you can get very different numbers depending on the input – if the data is already sorted, you'd get `O(N)` for many `O(N*log(N))` algorithms. You'd have to do some kind of statistical analysis for *typical inputs* or something. I generally just read the documentation if the source is not available ;) – nishantjr Jul 05 '14 at 12:26
  • The problem is the lower end, you should start with much smaller sizes like 10 or 100. Just look at the log-factor: `log(20000)==10` and `log(1000000)==14`. But if you start at, say, 100, you get a factor of `log(100)==4.6` and you should notice a difference. – pentadecagon Jul 05 '14 at 13:19
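Pentadecagon's log factors can be checked directly; a quick sketch of my own (using natural logarithms, which come out slightly below the rounded values quoted in the comment):

```python
import math

# The natural log grows very slowly: between n = 20,000 and n = 1,000,000
# it only increases by a factor of ~1.4, which is why n*log(n) looks
# almost linear over that range. Starting near n = 100 triples the factor.
for n in (100, 20000, 1000000):
    print(n, round(math.log(n), 1))
```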

1 Answer


The difference is simply too small to see with the naked eye at this scale:

Using your heapsort results (600 ms for 1,000,000 entries), here is an O(n) function (green) and an O(n log n) function (red): [plot] (from http://fooplot.com/plot/gnbb0vdgno)

The two functions in this picture are:

  • y = 600/1000000 * x (green)
  • y = 1/(10000 * log(10)) * x * log(x) (red)

(Note that these functions have vastly different constant factors, but of course constant factors do not matter in Big-O notation.)
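To see that the two curves really are calibrated to meet at your measured point (600 ms at n = 1,000,000) while diverging at smaller inputs, here is a small check of my own:

```python
import math

def linear(x):
    # green curve: O(n), calibrated to 600 ms at x = 1,000,000
    return 600 / 1000000 * x

def nlogn(x):
    # red curve: O(n log n), same value at x = 1,000,000
    return 1 / (10000 * math.log(10)) * x * math.log(x)

# Both pass through (1,000,000, ~600):
print(round(linear(1000000), 3), round(nlogn(1000000), 3))

# At small n the curves differ by a factor of ~2, but in absolute
# terms the gap is a fraction of a millisecond -- invisible on the chart:
print(round(linear(1000), 3), round(nlogn(1000), 3))
```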

However, just because they are hard to tell apart in a diagram does not mean they are impossible to distinguish.

As mentioned in the comments, your main options are bigger datasets or slower comparison functions. Most sorting algorithms allow you to specify a comparison function, which, under normal circumstances, should not change the O() time complexity. (Beware of non-transitive comparison functions, though.)
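In Python, for example, a comparator can be slowed down without touching the sort itself; a hypothetical sketch (the microsecond delay is arbitrary, chosen just to amplify the cost of each comparison):

```python
import time
from functools import cmp_to_key

def slow_cmp(a, b):
    # Uniform delay per comparison: total time becomes dominated by the
    # number of comparisons, i.e. the n log n term.
    time.sleep(1e-6)
    return (a > b) - (a < b)

print(sorted([5, 3, 1, 4, 2], key=cmp_to_key(slow_cmp)))  # [1, 2, 3, 4, 5]
```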

If that is not possible, and you just want to treat the algorithms as black boxes, you can simply repeat the experiment and average the results until the noise is low enough to distinguish between the two curves.
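A minimal repeat-and-average harness might look like this (my own sketch; `sort_fn` stands in for whichever black-box sort you are timing):

```python
import random
import time

def avg_time(sort_fn, n, repeats=10):
    """Average wall-clock time of sort_fn over `repeats` runs on fresh random data."""
    total = 0.0
    for _ in range(repeats):
        data = [random.random() for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data)
        total += time.perf_counter() - start
    return total / repeats

print(avg_time(sorted, 10000))
```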

To get the appropriate "ideal" n log n curve for comparison with your averaged data, you need to solve the equation y = a*x*log(x) with y = MAXIMUM_TIME and x = MAXIMUM_INPUT_LENGTH, for example with Wolfram Alpha.


One important point here: even though these curves look similar, that does not mean the run time of a hypothetical linear sorting algorithm would not be worthwhile for fewer than a million entries. If you managed to come up with a linear sorting algorithm with the same constant factor as the n log n algorithm, the curves would look like this:

[plot: linear vs. n log n with equal constant factors]

HugoRune