4

I have to write a quicksort algorithm that uses the median of the array as the pivot. From my general understanding from what I read in a book, I have to use a select algorithm in which I split the array into n/5 sub arrays, sort each of the sub arrays using insertion sort, find the median, then recursively call the select algorithm to find the median of the medians. I'm not even sure how to start this and I'm pretty confused. the selectMedian algorithm call is supposed to look something like this: SelectMedian(int first, int last, int i) where i is the ith index I want to select (in this case it would be the middle index, so array.length/2). The book I'm reading gives this description of it:

The algorithm in words (if n>1):

1. Divide n elements into groups of 5
2. Find median of each group (use insertion sort for this)
3. Use Select() recursively to find median x of the  n/5
medians
4. Partition the n elements around x.  Let k = rank(x)
5. if (i == k) then return x
if (i < k) then use Select() recursively to find i-th
smallest element in first partition else (i > k) use 
Select() recursively to find (i-k)th smallest element in 
last partition.

can anyone assist me in writing this algorithm? thanks!

rohitt
  • 1,124
  • 6
  • 18
Wonger
  • 285
  • 6
  • 18
  • what's your exact question? what's your current problem? – Thomas Uhrig Oct 28 '11 at 20:20
  • That's not literally how the book describes the algorithm, I hope? – G_H Oct 28 '11 at 20:24
  • Try reading this similar question: http://stackoverflow.com/questions/164163/choosing-a-pivot-for-quicksort – filpa Oct 28 '11 at 20:27
  • I believe the "book" the OP mention is CLRS. Just a thought. – Óscar López Oct 28 '11 at 21:13
  • The exact question is "Write a quick sort algorithm that uses the median as the pivot". This is not the books description for the quicksort algorithm, this is the description for a select algorithm which looks for the "i-th" largest element in the array – Wonger Oct 28 '11 at 23:13
  • @Wonger Why don't you decouple the algorithm for finding the median and quicksort? You find the median using the median algorithm, and use it as pivot for a quicksort. Worst case O(NlogN) quicksort! – bloops Jul 14 '12 at 20:45

3 Answers3

3

Would that really be necessary? Why not use the median of three where you select the pivot based on the median of three values, ie. the first, middle and last values.

Or you could even use a random pivot, which will drastically lower the chances of ending up with QuickSort's worst case time of O(N²), which may also be appropriate for your implementation.

AusCBloke
  • 18,014
  • 6
  • 40
  • 44
  • Using median of three means the time complexity of finding the median will be greater than O(NlogN) - ruining our goal of getting to O(N^2). Using a random pivot means you'll expect O(NlogN), but performance will not be consistent, which is required for some applications. – Kane Oct 28 '11 at 20:45
  • What are you talking about? Our "goal" isn't O(N²), that's *worst case*, and finding the median of three elements is quicker than partitioning the array and finding the median of five for each part. – AusCBloke Oct 28 '11 at 21:02
  • Apologies, I meant ruining out goal of getting to *better than* O(N^2). While it is true that searching 3 items takes less time than searching 5, they are both O(1) operations. However, searching n/3 arrays for medians is O(N^2) and searching n/5 arrays for medians is O(NlogN). You need to do the T analysis to see this - I believe CLRS has an example. As an aside, how do you do the superscript? I couldn't find it in the help. – Kane Oct 28 '11 at 21:06
  • 1
    Oh you're talking about searching n/ **3** smaller arrays. I'm only talking about getting three elements, getting the median of those and using that as the pivot. *Only doing that once,* simple as that. With the "²", it's an actual character I just copied off the net, I don't think the editor will do that for you. – AusCBloke Oct 28 '11 at 21:13
  • @Kane: you could make your performance consistent by estimating a likely upperbound on running time based on n, sorting quickly, and then sleeping until you reach that upperbound. But that's dumb. Consistentency in running time isn't really a good thing to optimize. OTOH, I can totally be convinced that say, 99%th percentile running time is worthwhile. Throwing a determinsitic median find in the middle of a quicksort is not going to help 99th percentile running time though. – Rob Neuhaus Oct 28 '11 at 21:38
  • 1
    because,according to the book, using the median of the array gaurantees an even split which therefor gaurantees O(nlgn), assuming the algorithm to find the median is written correctly – Wonger Oct 28 '11 at 23:18
  • @rrenaud: Oh, I absolutely agree that this is not something you will use in the real world in 99% of applications. In fact, I asked that exact same question when we covered this question in my Algorithms class back in undergrad. (I mean, if you really cared about consistency in your sort, why not use mergesort?). However, this being a problem from someone's Algorithms class and not something is being used in real life (I would bet the prof is using CLRS), I was trying to explain the reasoning behind the problem. It's really more of a theory problem than a real-world problem. – Kane Oct 28 '11 at 23:26
0
QUICKSORT2(A, p, r)
if p<r
    median=floor((p + r) /2) - p + 1
    q=SELECT(A, p, r, median)
    q=PARTITION2(A, p, r, q)
    QUICKSORT2(A, p, q-1)
    QUICKSORT2(A, q+1, r)

PARTITION2(A, p, r, q)
switch between A[r] and A[q]
return PARTITION(A, p, r)
Avi Cohen
  • 3,102
  • 2
  • 25
  • 26
0

I assume you can figure out created n/5 sub-arrays of 5 elements each.

Finding the median of a subarray is fairly easy: you look at each element and find the element which has two smaller elements.

For example, you have 1 4 2 3 5. 1 has no smaller elements. 4 has three smaller elements. 2 has one smaller element. 3 has two smaller elements; this is the one you want.

Now you have found n/5 medians. You want to find the median of the medians, so you run the algorithm again.

Example:

1 7 2 4 9 0 3 8 5 6 1 4 7 2 3

[1 7 2 4 9][0 3 8 5 6][1 4 7 2 3]

findMedian([1 7 2 4 9]) = 4;

findMedian([0 3 8 5 6]) = 5;

findMedian([1 4 7 2 3]) = 3;

[4 5 3]

findMedian([4 5 3]) = 4;

4 is your pivot.

The reason you do this is to try and split your array evenly; if your array is split lopsided, you'll get O(N^2) performance; if your array is split evenly, you get O(NlogN) performance.

Selecting a random pivot means you could get either one - in practice it would balance out to O(NlogN) but a lot of applications want consistent performance, and random quicksort is not consistent from run to run.

The reason we use 5 (instead of 3 or 7) is because we're adding another term of time complexity searching for the median - this term has to be less than O(NlogN) but you want it to be as small as possible. Using 3 gets you O(N^2), using 5 gets you O(NlogN), and 5 is the smallest number for which this is true.

(the algorithm to find the median in linear time was given by Blum, Floyd, Pratt, Rivest and Tarjan in their 1973 paper "Time bounds for selection", and answered a famous open problem)

John Donn
  • 1,718
  • 2
  • 19
  • 45
Kane
  • 4,047
  • 2
  • 24
  • 33