Why is QuickSort bad at sorting almost sorted data? In comparison, why is insertion sort better? Trying to understand Big O notation!
- Also, check out https://stackoverflow.com/q/70402/2184449 – Jere Dec 12 '18 at 00:00
- I find these holds without explanation very annoying and non-constructive. Trying to read the minds of the moderators: there are many properties of an algorithm, such as time complexity and space complexity, in the average or worst case, parametrized or not, asymptotic or finite. When you say "bad" or "better" with no reference to any such property, I know you mean asymptotic time complexity, because you mention big-O notation and it is the property most commonly referred to, but our moderators do not seem to allow beginner-language questions of this type. – piccolbo Dec 12 '18 at 17:30
- Choosing the middle value as the pivot works for sorted or reverse-sorted data, but for nearly sorted data there can still be a worst case. Using a median-of-3 approach (sort the 1st, middle, and last elements, and use the middle one) reduces the chance of hitting the worst case. For 41 or more elements, Visual Studio's std::sort does a median of 9: it uses three instances of median-of-3 to get medians for the thirds of a sub-array, then one more median-of-3 over those 3 medians. This further reduces the chance of the worst case, but increases the running time in the best case due to the overhead. – rcgldr Dec 12 '18 at 18:18
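To make the pivot-selection idea in that comment concrete, here is a rough Python sketch of median-of-3 and "median of 9" selection. The function names, the 41-element cutoff, and the sampling positions are illustrative guesses based on the comment above, not the actual std::sort implementation.

```python
def median_of_3(a, i, j, k):
    # Return whichever of the indices i, j, k points at the median value.
    x, y, z = a[i], a[j], a[k]
    if x <= y <= z or z <= y <= x:
        return j
    if y <= x <= z or z <= x <= y:
        return i
    return k

def choose_pivot(a, lo, hi):
    # Illustrative pivot selection: plain median-of-3 for small sub-arrays,
    # median of 9 (a median-of-3 of three median-of-3 samples) for larger ones.
    n = hi - lo + 1
    mid = lo + n // 2
    if n < 41:
        return median_of_3(a, lo, mid, hi)
    step = n // 8
    m1 = median_of_3(a, lo, lo + step, lo + 2 * step)
    m2 = median_of_3(a, mid - step, mid, mid + step)
    m3 = median_of_3(a, hi - 2 * step, hi - step, hi)
    return median_of_3(a, m1, m2, m3)
```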
2 Answers
Your statement is true for certain variants of Quicksort, depending on the choice of pivot. Quicksort's performance depends on the pivoting operation dividing the data into approximately equally sized chunks, which are then sorted separately. If the pivot is the minimum or maximum of the data, or sits at a very high or low percentile, the pivoting operation divides the data into two parts with most of the data in one of them, and that part still needs to be sorted. If the first element is chosen as the pivot and the data is already sorted, this worst-case scenario occurs on every partition. If instead a random element is chosen as the pivot, the worst-case scenario has a negligible chance of occurring. This does not change the worst-case analysis, but on average (over possible pivots, for a worst-case input) and in practice it results in good performance.
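As a small illustration of the point about unbalanced splits (my own sketch, not part of the answer): on sorted input, partitioning around the first element leaves everything on one side, while a random pivot usually splits the data far more evenly.

```python
import random

def partition_sizes(data, pivot):
    # Sizes of the two sides produced by one partitioning pass.
    smaller = [x for x in data if x < pivot]
    larger = [x for x in data if x > pivot]
    return len(smaller), len(larger)

sorted_data = list(range(1, 11))
print(partition_sizes(sorted_data, sorted_data[0]))              # (0, 9): the worst possible split
print(partition_sizes(sorted_data, random.choice(sorted_data)))  # usually much more balanced
```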

Quicksort's algorithm is as follows (a short code sketch appears just after this list):
- Select a "pivot" value from the elements in the list.
- Reorder the list so that all values are in their correct position relative to the pivot (e.g. if we want to sort the list in ascending order then all values less than the pivot would go before the pivot, and all values greater than the pivot would go after the pivot).
- Quicksort the sections of the list before and after the pivot.
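Here is a minimal Python sketch of those three steps, using the first element as the pivot (the choice examined below). It is meant to mirror the description rather than be an efficient in-place implementation.

```python
def quicksort(values):
    # Lists of length 0 or 1 are already sorted.
    if len(values) <= 1:
        return values
    # Step 1: select a pivot (here, simply the first element).
    pivot = values[0]
    # Step 2: reorder so smaller values come before the pivot and
    # larger (or equal) values come after it.
    smaller = [x for x in values[1:] if x < pivot]
    larger = [x for x in values[1:] if x >= pivot]
    # Step 3: quicksort the sections before and after the pivot.
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 6, 1, 5, 2, 4]))  # [1, 2, 3, 4, 5, 6]
```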
Whether the assertion that it performs poorly with sorted/nearly-sorted lists is even true depends entirely upon how step 1 is performed. What is the pivot? Say I'm trying to sort the following list into ascending order:
1, 2, 3, 4, 5, 6
Well, let's consider step 1. Which do I use as a pivot? If we designed our code under the assumption that the list order is random, we'd probably just use the first element, as any pivot is equally likely to be good when the order is completely random. In this case, however, the two sub-lists that need to be sorted are extremely uneven. Specifically, the first is empty, and the second is all remaining values:
2, 3, 4, 5, 6
When we sort it, we will use 2 as the pivot and find that exactly the same thing happens again. This ultimately means that each value is compared to every other value. If we had selected 3 as the pivot instead, however, our remaining values would be split into 1, 2 and 4, 5, 6. As a result, 1 would be compared to 2, but neither would ever need to be compared to any of the values in 4, 5, 6.
Let's consider how 4, 5, 6 would then be sorted. If 4 were selected as the pivot, it would be compared to 5 and 6, and then 5 would need to be compared to 6 in the next iteration. Conversely, were 5 our pivot, it would be compared to 4 and 6, but 4 and 6 would never be compared to each other.
Note that this problem is the same for cases where the list is in perfectly reversed order as well.
Of course, a solution could be to use a different technique for choosing a pivot.
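For example, one such technique (again just a sketch, not part of the original answer) is to pick the pivot at random, which makes the pathological splits on sorted or reverse-sorted input vanishingly unlikely:

```python
import random

def quicksort_random_pivot(values):
    # Same scheme as the sketch above, but the pivot is chosen at random
    # and equal values are kept together so recursion always shrinks.
    if len(values) <= 1:
        return values
    pivot = random.choice(values)
    smaller = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    larger = [x for x in values if x > pivot]
    return quicksort_random_pivot(smaller) + equal + quicksort_random_pivot(larger)

print(quicksort_random_pivot([1, 2, 3, 4, 5, 6]))  # sorted input is no longer a bad case
```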
In terms of big O notation, insertion sort has a worst-case time of O(n^2), while Quicksort also has a worst-case of O(n^2) but an average (and best) case of O(n log n). Insertion sort is almost never preferable to Quicksort.
Addendum: insertion sort works well on a pre-sorted list because it works by iteratively comparing each element to its neighbour to see whether the two should be swapped. On a pre-sorted list no swaps are needed, so only one comparison per element is required, and the whole sort runs in O(n) time.
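As a quick sanity check of that claim, here is a small Python sketch of insertion sort that counts comparisons (my own illustration, not from the answer): on an already sorted list it does roughly n comparisons, while on a reversed list it does roughly n^2/2.

```python
def insertion_sort(values):
    # Sort in place and return the number of comparisons performed.
    comparisons = 0
    for i in range(1, len(values)):
        j = i
        while j > 0:
            comparisons += 1
            if values[j - 1] <= values[j]:
                break  # already in order: stop scanning early
            values[j - 1], values[j] = values[j], values[j - 1]
            j -= 1
    return comparisons

print(insertion_sort([1, 2, 3, 4, 5, 6]))  # 5 comparisons, roughly n - 1
print(insertion_sort([6, 5, 4, 3, 2, 1]))  # 15 comparisons, roughly n^2 / 2
```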

- Also note that, although Quicksort has exactly the same problem for sorted and reversed lists, insertion sort is not equally good at both: a pre-sorted list is insertion sort's best case, but a reversed list is its worst case. – Charlim Dec 12 '18 at 00:31
- I am slightly misusing big-O notation in this explanation, but I won't edit it since I think the meaning is clear anyway. – Charlim Oct 04 '22 at 21:14