15

The Median of medians approach is very popular in quicksort type partitioning algorithms to yield a fairly good pivot, such that it partitions the array uniformly. Its logic is given in Wikipedia as:

The chosen pivot is both less than and greater than half of the elements in the list of medians, which is around n/10 elements (1/2 * (n/5)) for each half. Each of these elements is a median of 5, making it less than 2 other elements and greater than 2 other elements outside the block. Hence, the pivot is less than 3(n/10) elements outside the block, and greater than another 3(n/10) elements outside the block. Thus the chosen median splits the elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm.

Can somebody explain it a bit lucidly for me. I am finding it difficult to understand the logic.

SexyBeast
  • 7,913
  • 28
  • 108
  • 196

1 Answers1

17

Think of the following set of numbers:

5 2 6 3 1

The median of these numbers is 3. Now if you have a number n, if n > 3, then it is bigger than at least half of the numbers above. If n < 3, then it is smaller than at least half of the numbers above.

So that is the idea. That is, for each set of 5 numbers, you get their median. Now you have n / 5 numbers. This is obvious.

Now if you get the median of those numbers (call it m), it is bigger than half of them and smaller than the other half (by definition of median!). In other words, m is bigger than n / 10 numbers (which themselves were medians of small 5 element groups) and bigger than another n / 10 numbers (which again were medians of small 5 element groups).

In the example above, we saw that if the median is k and you have m > k, then m is also bigger than 2 other numbers (that were themselves smaller than k). This means that for each of those smaller 5 element groups where m was bigger than its median, m is also bigger than two other numbers. This makes it at least 3 numbers (2 numbers + the median itself) in each of those n / 10 small 5 element groups, that are smaller than m. Hence, m is at least bigger than 3n/10 numbers.

Similar logic for the number of elements m is bigger than.

Shahbaz
  • 46,337
  • 19
  • 116
  • 182
  • Just another question, how does this method guarantee that this number will be the median? The median is a number which partitions the array into the upper and lower half. So what does this 30-30-70 figure signify? – SexyBeast Sep 22 '12 at 18:56
  • Well, the median is in the middle, but `m` (in the text above) is not the median of all numbers. It is the median of only 1/5 of the numbers (which are themselves medians of smaller 5 element groups). Try reading the last paragraph with more attention. In the end, where it is concluded that `m` is bigger than at least `3n/10` of the numbers, well that translates to `m` is bigger than at least 30% of the numbers. So in the end, it goes like `m` is bigger than at least 30% and smaller than at least another 30%. There is 40% left which we are not sure of. – Shahbaz Sep 23 '12 at 11:07
  • Then how come it gives a 50-50 partition on average? The 50-50 partition is given by the normal median, right? – SexyBeast Sep 23 '12 at 11:36
  • 11
    It doesn't give `50-50` partition on average. It always gives somewhere between `30-70` and `70-30` (maybe you could call that `50-50` on average?), but that's not the point. For quicksort to give `O(nlogn)` time complexity, it needs to be able to divide the array into partitions proportional to the size of the array. That's what gives the `logn` recursion depth. `30-70` division already gives a `3n/10` and `7n/10` division which is proportional to `n`. So even though it's not a perfect `50-50`, it will still yield logarithmic recursion depth (except it's not `log` in base 2, but base `10/7`) – Shahbaz Sep 23 '12 at 14:50