5

I am self learning CLRS 3rd edition and here is one of the tougher questions that I have encountered along with its answer as a service to all.

7.4-5 We can improve the running time of quicksort in practice by taking advantage of the fast running time of insertion sort when its input is “nearly” sorted. Upon calling quicksort on a subarray with fewer than k elements, let it simply return without sorting the subarray. After the top-level call to quicksort returns, run insertion sort on the entire array to finish the sorting process. Argue that this sorting algorithm runs in O(nk+nlg(n/k)) expected time. How should we pick k, both in theory and in practice?

Avi Cohen
  • 3,102
  • 2
  • 25
  • 26

3 Answers3

3

If you eval equation nlgn > nk + nlog(n/k) you get log k > k. Which is impossible. Hence in theory it's not possible. But in practice there are constant factors involved with insertion sort and quicksort. Take a look at the solution discussed in this pdf. You might want to update your answer. .

Ankush
  • 2,454
  • 2
  • 21
  • 27
2

Actually, the answer is k = 8.

The algorithm you get is the composition of two anonymous functions, one of which is O(nk) and the other which is O(n lg n/k). Those anonymous functions hide average case constants. Insertion sort runs in n^2/4 time in the average case and randomized quicksort runs in 1.386 n lg n in the average case. We want to find a value of k which minimizes the value of nk/4 + 1.386( n lg n/k ) which equals nk/4 + 1.386 n lg n - 1.386 n lg k. This means finding the maximum of the function 1.386 lg k - k/4. That maximum value occurs at k = 8.

Robert S. Barnes
  • 39,711
  • 30
  • 131
  • 179
0

A leaf has an equal probability to be of size between 1 to k.
So the expected size of a leaf is k/2.
If the expected size of a leaf is k/2 then we expect n/(k/2)=(2n)/k such leaves.
For simplicity lets say that we expect n/k such leaves and that the expected size of each leaf is k.
The expected running time of INSERTION-SORT is O(n^2).
We found that in exercise 5.2-5 and problem 2-4c.
So the expected running time of INSERTION-SORT usage is O((n/k)*(k^2))=O(nk).
If we expect our partition groups to be of size k then the height of the recursion tree is expected to be lgn-lgk=lg(n/k) since we expect to stop lgk earlier.
There are O(n) operations on each level of the recursion tree.
That leads us to O(nlg(n/k)).
We conclude that the expected running time is O(nk+nlg(n/k)).

In theory, we should pick k=lgn, since this way we get the best expected running time of O(nlgn).

In practice, we should pick k to one of the values around lgn that gives us better performance than just running RANDOMIZED-QUICKSORT.

The second part of the answer uses big-O notation quite loosely, so for a more precise picking of k, please follow the link given in the second answer by Ankush.

Avi Cohen
  • 3,102
  • 2
  • 25
  • 26
  • 1
    Your answer is slightly wrong. The best you can say in theory is that k must be some function such that k=O(lg n). To precisely pick k you have to look at the average case constants involved in insertion sort and quicksort. – Robert S. Barnes Apr 25 '13 at 10:13
  • Thanks for the feedback and you are right. When I first posted the question and its answer, I was more interested in the first part of the question and in some sense skipped the second part of it. – Avi Cohen Apr 25 '13 at 11:29
  • Are you also studying in OpenU? We had this question on our Maman this semester. – Robert S. Barnes Apr 26 '13 at 07:01
  • I graduated a long time ago, I learn this book at my free time. – Avi Cohen Apr 26 '13 at 16:20