
I want the theoretical reason, not an experimental result. Also, how can we determine when the data size counts as small or large?

I did not explain myself clearly. I meant that when the input data size is small, we usually choose Insertion sort rather than Quicksort. I want to know why that is.

MLavoie
lsbbo

1 Answer


Remember that in asymptotic analysis we ignore constant factors. So Quicksort's O(n log n) complexity really describes a running time of roughly C * (n log n), where C is some unknown constant. Similarly, Insertion sort's O(n^2) describes a running time of roughly C * n^2 for a different constant. Let's call those constants Cq and Ci.

So Insertion sort will be faster when (Ci * n^2) < (Cq * (n log n)).
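As a rough illustration of that inequality, here is a small sketch that searches for the crossover point. The constants ci = 1.0 and cq = 4.0 are made-up values chosen purely for demonstration; the real constants depend on the implementation and the hardware.

```java
public class Crossover {
    public static void main(String[] args) {
        // Hypothetical cost factors: insertion sort's per-step cost is assumed
        // to be much smaller than quicksort's. These are illustrative only.
        double ci = 1.0;   // assumed constant for insertion sort
        double cq = 4.0;   // assumed constant for quicksort

        // Find the first n where insertion sort's estimated cost stops being lower.
        for (int n = 2; n <= 1_000; n++) {
            double insertion = ci * n * n;
            double quick = cq * n * (Math.log(n) / Math.log(2));
            if (insertion >= quick) {
                System.out.println("Estimated crossover around n = " + n);
                break;
            }
        }
    }
}
```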

It should be obvious from looking at the two algorithms that Ci < Cq. The insertion sort is incredibly simple. The algorithm is nothing but comparison and swap, with a little loop overhead thrown in.
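For reference, a minimal textbook insertion sort, not any particular library's implementation, looks something like this:

```java
class InsertionSortSketch {
    // Sorts the array in place. The inner loop shifts larger elements one slot
    // to the right until the correct position for a[i] is found.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];   // one shift, roughly one "swap" worth of work
                j--;
            }
            a[j + 1] = key;
        }
    }
}
```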

Quicksort is a little bit more complex, requiring more steps per iteration, but fewer iterations.
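For comparison, here is a minimal Quicksort using the Lomuto partition scheme (just one common variant, chosen for brevity). Notice the extra bookkeeping per call: choosing a pivot, tracking two indices, a final swap to put the pivot in place, and the recursive calls themselves.

```java
class QuicksortSketch {
    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                  // subarrays of size 0 or 1 are already sorted
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }

    // Lomuto partition: use the last element as the pivot, move everything
    // smaller than the pivot to its left, then drop the pivot between the halves.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;
        return i;
    }
}
```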

Consider sorting a five-element array. In the worst case (the array in reverse order), Insertion sort will do roughly:

  • 5 checks and 4 increments of the outer loop control variable
  • 14 checks and 10 updates of the inner loop control variable
  • 10 element comparisons (1 + 2 + 3 + 4)
  • 10 swaps

Now look at Quicksort, which in the average case has to partition four subarrays. The 5-element array gets split into two subarrays of 3 and 2 elements. The 3-element subarray is further partitioned into subarrays of 1 and 2 elements. And then the two 2-element subarrays are partitioned.

So the partition method will be called four times. Each partition call has its own overhead beyond the element comparisons and swaps: choosing a pivot, at least a couple of extra swaps to move the pivot into place, and the cost of the call and recursion itself. When you add it all up, you see that Quicksort does more work per iteration. When the number of iterations is small, Insertion sort does less total work even though it does more iterations.
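If you want to check counts like these yourself, one crude way is to instrument both algorithms with comparison and swap counters, as in the sketch below. The exact numbers depend on the input and on the partition scheme, so treat the output as something to experiment with rather than as definitive figures.

```java
import java.util.Arrays;

class OperationCounter {
    static long comparisons = 0, swaps = 0;

    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0) {
                comparisons++;                 // element comparison
                if (a[j] <= key) break;
                a[j + 1] = a[j];               // count each shift as one swap's worth of movement
                swaps++;
                j--;
            }
            a[j + 1] = key;
        }
    }

    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            comparisons++;                     // element compared against the pivot
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                swaps++;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t; // final swap to place the pivot
        swaps++;
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }

    public static void main(String[] args) {
        int[] data = {5, 4, 3, 2, 1};          // reverse order: worst case for insertion sort

        insertionSort(Arrays.copyOf(data, data.length));
        System.out.println("Insertion sort: " + comparisons + " comparisons, " + swaps + " swaps");

        comparisons = 0; swaps = 0;
        quicksort(Arrays.copyOf(data, data.length), 0, data.length - 1);
        System.out.println("Quicksort:      " + comparisons + " comparisons, " + swaps + " swaps");
    }
}
```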

You can do a step-by-step analysis to determine the theoretical value of "small," where Insertion sort will be faster than Quicksort. Typically that's done by counting "basic operations," although the definition is somewhat flexible. In this case it's pretty easy: a comparison, assignment, or function call is a "basic operation."
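This is also why many real-world sort implementations combine the two: they recurse with Quicksort until a subarray drops below some cutoff and then finish it with Insertion sort. Here is a sketch of that idea; the cutoff of 10 is an arbitrary placeholder, and the right value is exactly what the kind of analysis (or measurement) described above would tell you.

```java
class HybridSort {
    // Arbitrary placeholder; tune this with operation counting or benchmarking.
    static final int CUTOFF = 10;

    static void sort(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {
            insertionSort(a, lo, hi);          // small range: the simple algorithm wins
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;
        return i;
    }
}
```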

How well the theoretical result matches the experimentally derived result will depend on the particular computer hardware and also on how expensive comparisons are. If comparisons are very expensive, then you'll want to select the algorithm that does the fewest comparisons. But if comparisons are relatively inexpensive (comparing numbers, for example, or even strings, provided they don't have long common prefixes), then the per-iteration overhead is the limiting factor, and on small inputs the simple, asymptotically inefficient algorithm outperforms the complex, asymptotically efficient one.

Jim Mischel