3

Given the list of numbers: 2 5 1 8 4 10 6 3 7 9 0

The actual implementation of quick sort I understand, but a question on my homework that I didn't was:

What is the optimal choice of pivot, why?

I had assumed when reading this that the obvious choice for a pivot would be the 5 or 6 since its in the middle of the list. I figured quick sort would work either way though since we choose a new pivot every time. Which makes the followup question make a little more sense, but does anyone have a formal definition?

Why is an optimal pivot not practical?

asdf
  • 657
  • 1
  • 13
  • 30

3 Answers3

6

The optimal pivot is the median of the set you're currently working on, because it will split the set into two equal-sized subsets which guarantees O(n log n) performance. The reason it's not practical is because of the cost of finding the actual median. You essentially have to sort the data to find the median, so it's like the book Catch 22 - "How do I sort the data?" "Find the median" "How do I find a median?" "Sort the data".

pjs
  • 18,696
  • 4
  • 27
  • 56
  • There is no catch 22 for finding median but the algorithm to find it in O(N) worst case are very complex to implement but there algorithms that do it in O(N) for average case. – Vikram Bhat Nov 18 '13 at 16:29
  • @VikramBhat Yes, but because of the complexity almost nobody uses those algorithms. Most people will sort and pick the middle if asked to cough up a median. And for quicksort purposes, people prefer a constant time approximation such as choose a random element or pick the median of 3 arbitrary choices. – pjs Nov 18 '13 at 16:33
  • For quick sort the practical method used to select median is O(N) in average case and it is called Quick Select. – Vikram Bhat Nov 18 '13 at 16:37
  • @VikramBhat Quickselect itself relies on a sequence of pivot choices to home in on the median, and the two most common mechanisms are the ones I mentioned - random selection or median of 3. – pjs Nov 18 '13 at 16:50
  • I meant randomized select for median runs in average case O(N) – Vikram Bhat Nov 18 '13 at 16:51
2

Optimal pivot is in the middle, because when you move it to the left or to the right (or take biggest or smallest item), you increase depth of recursion. In the worst case you will get O(n^2) except of O(n*log2(n)) when taking the middle.

dmitry
  • 468
  • 1
  • 4
  • 18
2

Optimal pivot must be median of numbers because then subproblem sizes are exactly half of original. The time complexity is defined as follows:-

 T(N) = T(N/2) + O(N)

    which evaluates to

    T(N) = O(NlogN)

Whereas if pivot ends up being the first element of array after partitioning then:-

T(N) = T(N-1) + O(N) 

T(N) = O(N^2)

which is as bad as bubble sort

The reason that using median always as pivot is not practical because the algorithm that do it in O(N) are very complex & u can always do it in O(NlogN) but that is sorting again which is the problem which we are solving. Here is an example of algorithm that evaluates median in O(N) : -

Median of Medians

Vikram Bhat
  • 6,106
  • 3
  • 20
  • 19