16

I was going through the lecture videos on algorithms by Robert Sedgewick, and he explains that random shuffling ensures we don't encounter the worst-case quadratic-time scenario in quicksort. But I am unable to understand how.

chiastic-security
Paritosh Pandey

5 Answers

23

It's really an admission that although we often talk about average case complexity, we don't in practice expect every case to turn up with the same probability.

Sorting an already sorted array is the worst case for quicksort when the pivot is taken from a fixed position such as the first or last element: every time you pick a pivot, all the remaining elements land on the same side of it, so you never split into two roughly equal halves. And in practice this already-sorted case will often turn up more than other cases.

Randomly shuffling the data first is a quick way of ensuring that every ordering really does turn up with equal probability, and therefore that this worst case will be as rare as any other particular case.

It's worth noting that there are other strategies that deal well with already sorted data, such as choosing the middle element as the pivot.
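To make the idea concrete, here is a minimal sketch of shuffle-then-sort in Java. It is an illustration, not Sedgewick's course code; the class and method names are made up. It shuffles with Fisher-Yates and then runs a plain quicksort with a fixed last-element pivot, which would otherwise be quadratic on sorted input.

    import java.util.Random;

    public class ShuffledQuicksort {
        private static final Random RNG = new Random();

        // Shuffle first, so an already sorted (or adversarial) input becomes
        // a uniformly random permutation before any pivot is chosen.
        public static void sort(int[] a) {
            for (int i = a.length - 1; i > 0; i--) {      // Fisher-Yates shuffle
                int j = RNG.nextInt(i + 1);
                swap(a, i, j);
            }
            quicksort(a, 0, a.length - 1);
        }

        private static void quicksort(int[] a, int lo, int hi) {
            if (lo >= hi) return;
            int p = partition(a, lo, hi);   // Lomuto partition, last element as pivot
            quicksort(a, lo, p - 1);
            quicksort(a, p + 1, hi);
        }

        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi];
            int i = lo;
            for (int k = lo; k < hi; k++) {
                if (a[k] < pivot) swap(a, i++, k);
            }
            swap(a, i, hi);
            return i;
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

Without the shuffle at the top of sort, feeding this particular implementation an already sorted array would produce exactly the 0 and (n-1) splits described above.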

chiastic-security
6

The assumption is that the worst case -- everything already sorted -- is frequent enough to be worth worrying about, and a shuffle is a black-magic, least-effort, sloppy way to avoid that case without having to admit that by improving it you're moving the problem to another case: whichever input happens to get randomly shuffled into sorted order. Hopefully that bad case is a much rarer situation, and even if it does come up, the randomness means the problem can't easily be reproduced and blamed on this cheat.

The concept of improving a common case at the expense of a rare one is fine. The randomness as an alternative to actually thinking about which cases will be more or less common is somewhat sloppy.

keshlam
2

In the case of randomized quicksort, since the pivot element is chosen randomly, we can expect the split of the input array to be reasonably well balanced on average, as opposed to the 1 and (n-1) split that a non-randomized version produces on an already sorted input. This helps prevent the worst-case behavior of quicksort, which occurs with unbalanced partitioning.

Hence, the expected running time of the randomized version of quicksort is O(n log n) on every input, rather than O(n^2) on the unlucky ones.
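A minimal sketch of that random-pivot variant (illustrative names again, not any particular library's code): instead of shuffling the whole array up front, each recursive call picks its pivot uniformly at random from the current subarray.

    import java.util.Random;

    public class RandomPivotQuicksort {
        private static final Random RNG = new Random();

        public static void sort(int[] a) {
            quicksort(a, 0, a.length - 1);
        }

        private static void quicksort(int[] a, int lo, int hi) {
            if (lo >= hi) return;
            // Choose the pivot uniformly at random from a[lo..hi], so the
            // expected split is balanced no matter how the input is ordered.
            int p = lo + RNG.nextInt(hi - lo + 1);
            swap(a, p, hi);                          // move pivot to the end
            int mid = partition(a, lo, hi);          // then partition as usual
            quicksort(a, lo, mid - 1);
            quicksort(a, mid + 1, hi);
        }

        private static int partition(int[] a, int lo, int hi) {
            int pivot = a[hi];
            int i = lo;
            for (int k = lo; k < hi; k++) {
                if (a[k] < pivot) swap(a, i++, k);
            }
            swap(a, i, hi);
            return i;
        }

        private static void swap(int[] a, int i, int j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

Randomizing the pivot per call and shuffling once up front both give an expected O(n log n) running time for every input; the difference is mostly where the random number generator is invoked.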

Ujjwal
0

What does a random shuffle do to the distribution on the input space? To understand this, let's look at a probability distribution P defined over a set S, where P is not in our control. Let us create a probability distribution P' by applying a random shuffle over S to P: every time we get a sample from P, we map it, uniformly at random, to an element of S. What can we say about the resulting distribution P'?

P'(x) = Σ_{s in S} P(s) · (1/|S|) = 1/|S|, since Σ_{s in S} P(s) = 1.

Thus, P' is just the uniform distribution over S. A random shuffle gives us control over the input probability distribution.

How is this relevant to quicksort? Well, we know the average complexity of quicksort. This is computed with respect to the uniform probability distribution over inputs, and that is the property we want our actual input distribution to have, irrespective of what it really is. To achieve that, we do a random shuffle of our input array, ensuring that the distribution is not adversarial in any way.
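A quick empirical check of that claim (hypothetical class name, not from the answer): start from the same sorted input every time, Fisher-Yates shuffle it, and count which orderings come out. Each of the 3! = 6 permutations should appear roughly equally often, even though the "input distribution" was entirely concentrated on one ordering.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    public class ShuffleDistributionCheck {
        public static void main(String[] args) {
            Random rng = new Random();
            Map<String, Integer> counts = new HashMap<>();
            int trials = 600_000;

            for (int t = 0; t < trials; t++) {
                int[] a = {1, 2, 3};                       // adversarial input: always sorted
                for (int i = a.length - 1; i > 0; i--) {   // Fisher-Yates shuffle
                    int j = rng.nextInt(i + 1);
                    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                }
                counts.merge(Arrays.toString(a), 1, Integer::sum);
            }

            // Each permutation should be counted close to trials / 6 = 100,000 times.
            counts.forEach((perm, c) -> System.out.println(perm + " : " + c));
        }
    }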

Pradhan
-1

Is the video on Coursera? Unfortunately, the shuffle does not prevent O(N^2) performance on data of the form n,n,...,n,1,1,...,1. I tested Quick.java with nn11.awk, which generates such data.

$ for N in 10000 20000 30000 40000; do time ./nn11.awk $N | java Quick; done | awk 'NF>1'

real    0m10.732s
user    0m10.295s
sys     0m0.948s

real    0m48.057s
user    0m44.968s
sys     0m3.193s

real    1m52.109s
user    1m48.158s
sys     0m3.634s

real    3m38.336s
user    3m31.475s
sys     0m6.253s
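For reference, nn11.awk itself is not shown, so the following is only a guess at the input it produces: N/2 copies of N followed by N/2 copies of 1, one value per line (the exact counts are an assumption). The generator below is a hypothetical Java stand-in for the script, not the actual nn11.awk.

    public class NN11 {
        // Prints n, n, ..., n, 1, 1, ..., 1 -- the assumed shape of nn11.awk's output.
        public static void main(String[] args) {
            int n = Integer.parseInt(args[0]);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n / 2; i++) sb.append(n).append('\n');
            for (int i = 0; i < n - n / 2; i++) sb.append(1).append('\n');
            System.out.print(sb);
        }
    }

Mirroring the command above, ./nn11.awk $N | java Quick would become java NN11 $N | java Quick. Inputs dominated by duplicate keys are, in general, a classic stress case for quicksort variants.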