
I created a C++ program that outputs the input size vs. execution time (microseconds) of algorithms and writes the results to a .csv file. Upon importing the .csv into LibreOffice Calc and plotting graphs, I noticed that binary search for input sizes up to 10000 appears to run in constant time, even though I search for an element that is not in the array. Similarly, for inputs up to the same size, merge sort seems to run in linear time instead of the O(n log n) time it should take in all cases.

Insertion Sort and Bubble Sort run just fine, and their output plots resemble the worst-case quadratic complexity very closely.

I provide the input arrays from a file. For n = 5, the contents of the file are as follows. Each line represents an input array:

5 4 3 2 1 
4 3 2 1 
3 2 1 
2 1 
1

The results.csv file produced by running insertion sort is:

Input,Time(ms)
5,4
4,3
3,2
2,2
1,2

The graph of binary search for maximum input 100 is here.

Also, the graph of merge sort for maximum input 1000 is here; it looks a lot like it is linear (the values in the table also suggest so).

Any help as to why this is happening will be greatly appreciated.

Here is a link to the github repository for the source code: https://github.com/dhanraj-s/Time-Complexity

DS2830
  • The first graph is **not** constant and the second is **not** linear. The first one simply tells you 1ms resolution is no good for a sample size this small; fix either of the two and you should see what you expect. The second suggests two things: look at bigger examples (logarithms show best on large scales) and, more importantly, don't plot t(n), plot something that should show you the logarithm alone, like (t/n) as a function of (n). – Qubit Oct 08 '18 at 14:10
  • And I hope you don't give them sorted arrays like this to sort; that is a fairly special case. Feed them randomly generated examples instead, possibly with the same fixed seed so that you can reproduce your measurements. And then use a few of these random samples for each run, of course. – Qubit Oct 08 '18 at 14:14
  • I think your numbers are too small, and your times are too short, to obtain good measurements. – Tim Randall Oct 08 '18 at 14:15
  • Remember that your benchmarking may include time spent running other applications in the background. Search the internet for "c++ benchmarking". – Thomas Matthews Oct 08 '18 at 14:27
  • Possible duplicate of [What is a plain English explanation of “Big O” notation?](https://stackoverflow.com/questions/487258/what-is-a-plain-english-explanation-of-big-o-notation) (see especially the answer about "how things scale") – Bernhard Barker Oct 08 '18 at 15:07
  • You should also read up on doing a proper benchmark. You should be running the method many times if it runs for only a few milliseconds, otherwise your measurements are likely to be inaccurate. – Bernhard Barker Oct 08 '18 at 15:26
  • I generated random numbers instead of the sorted ones, and instead of plotting t(n) I plotted log(t(n)). When I do that, I kind of get what I expect for merge sort, i.e., a logarithmic curve (which I get only because t(n) looked linear). Binary search remains the same even if I give it a max input of 10000. – DS2830 Oct 09 '18 at 06:50
  • @Qubit I tried what you suggested and plotted t/n instead of t. Nothing changes for merge sort, and the binary search graph becomes hyperbolic (1/n). Should I be working with even larger inputs to see any results? If yes, how large? I am using the clock() function in ctime to measure execution time; should I be using something that gives more resolution? – DS2830 Oct 09 '18 at 06:53
  • @Dukeling I don't quite understand what you mean when you say "running the method many times if it runs for only a few milliseconds". Could you please clarify? – DS2830 Oct 09 '18 at 06:58
  • @TimRandall Could you please explain what you mean when you say "times are too short"? – DS2830 Oct 09 '18 at 07:02
  • A logarithm usually requires exponentially large inputs if you want to see it, say on a range from 10^1 to 10^9. More is better. Again, make sure you run random tests, not the special case you showed above - anyone can sort that in O(n) by picking the right algorithm. To measure execution time, use `std::chrono::high_resolution_clock`, which would give you better resolution (see the timing sketch after these comments). Keep in mind function calls etc. have overhead, so the very short times (i.e. n->1) should converge to a constant because everything around your algorithm takes longer than the algorithm itself. – Qubit Oct 09 '18 at 08:37
  • @DS2830 I meant that the periods you're measuring are not large enough compared to the granularity of your measuring tool. If you try to measure the width of a regular office staple with a ruler marked in mm, you're probably going to get an answer of 1mm. But if you can measure a block of 50 staples, you can get a better answer. – Tim Randall Oct 09 '18 at 13:17
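
A minimal sketch of the measurement the comments describe (hypothetical code, not taken from the linked repository): repeat the call many times so that the total interval is much larger than the timer's granularity, time the whole loop with `std::chrono`, and divide by the repetition count. The `binarySearch` wrapper and the repetition count below are placeholders.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <vector>

// Hypothetical harness: binarySearch() stands in for whatever routine is
// being measured; the repetition count is arbitrary.
bool binarySearch(const std::vector<int>& v, int key) {
    return std::binary_search(v.begin(), v.end(), key);
}

int main() {
    std::vector<int> data(10000);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = static_cast<int>(i);

    const int repetitions = 100000;      // enough repeats to get well above the timer's resolution
    volatile bool found = false;         // volatile so the result is not optimised away
    auto start = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < repetitions; ++r)
        found = binarySearch(data, -1);  // search for an element that is not present
    auto stop = std::chrono::high_resolution_clock::now();

    double totalNs = std::chrono::duration<double, std::nano>(stop - start).count();
    std::printf("average per call: %.2f ns\n", totalNs / repetitions);
    return 0;
}
```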

1 Answer


Complexity is about asymptotic worst case behaviour.

...worst case...

Even a quadratic algorithm may fall back to a linear variant if the input allows. Its complexity is still quadratic, because for the worst case the algorithm can only guarantee a quadratic runtime.
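
For example, insertion sort is O(n^2) on a reverse-sorted array, but on input that is already sorted its inner loop exits immediately for every element, so the whole pass is linear. A minimal sketch:

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: O(n^2) in the worst case (e.g. a reverse-sorted array),
// but O(n) when the input is already sorted, because the inner while loop
// exits immediately for every element.
void insertionSort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {  // never true on already-sorted input
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```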

...asymptotic...

It might well be that the asymptotic behaviour for the algorithms starts to settle in only for input sizes much bigger than what you chose.

That being said, in practice complexity alone is not the most useful metric; if you do care about performance, you need to measure.
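
One common way to do that measurement (a doubling experiment; the algorithm and sizes below are only placeholders) is to time the routine for n, 2n, 4n, … and look at the ratio of consecutive times: roughly 2 suggests linear growth, roughly 4 suggests quadratic, and a bit more than 2 suggests n log n.

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical doubling experiment: 'algorithm' is whatever routine is under test.
double timeOnceMs(const std::function<void(std::vector<int>&)>& algorithm, std::size_t n) {
    std::vector<int> input(n);
    for (std::size_t i = 0; i < n; ++i)
        input[i] = static_cast<int>(n - i);          // reverse-sorted, a worst case for simple sorts
    auto start = std::chrono::high_resolution_clock::now();
    algorithm(input);
    auto stop = std::chrono::high_resolution_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    auto bubbleSort = [](std::vector<int>& a) {      // stand-in quadratic algorithm
        for (std::size_t i = 0; i + 1 < a.size(); ++i)
            for (std::size_t j = 0; j + 1 < a.size() - i; ++j)
                if (a[j] > a[j + 1]) std::swap(a[j], a[j + 1]);
    };
    double previous = 0.0;
    for (std::size_t n = 1000; n <= 32000; n *= 2) {
        double t = timeOnceMs(bubbleSort, n);
        std::printf("n=%zu  t=%.2f ms  t(n)/t(n/2)=%.2f\n", n, t, previous > 0 ? t / previous : 0.0);
        previous = t;
    }
    return 0;
}
```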

463035818_is_not_an_ai
  • Complexity is not just about the worst case. You can also have the complexity of the best or average cases. The average case is perhaps the most common, with quicksort being a prime example: it is commonly said to be O(n log n), which is its average case, while its worst case is O(n^2). – Bernhard Barker Oct 09 '18 at 07:41
  • @Dukeling you are completely right, I was thinking of the complexity of standard algorithms, which typically give an upper bound for their complexity. Will edit the answer to be less wrong as soon as I have time – 463035818_is_not_an_ai Oct 09 '18 at 08:23