52

The C++11 standard guarantees that std::sort has O(n logn) complexity in the worst case. This is different from the average-case guarantee in C++98/03, where std::sort could be implemented with Quicksort (maybe combined with insertion sort for small n), which has O(n^2) in the worst case (for some specific input, such as sorted input).

Were there any changes in std::sort implementations in different STL libraries? How is C++11's std::sort implemented in different STLs?

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Spock77
  • 3,256
  • 2
  • 30
  • 39
  • 17
    Not a proper answer because I haven't conducted a full survey, but basically everyone was already using Introsort anyway, and that's why the stronger requirement is in practice not a problem for implementers. And even before Introsort, nobody with any sense used Quicksort with the leftmost element as the pivot. That was a simplification for the purposes of demonstrating the algorithm, although of course it does actually work fine for random data too. So in practice the worst case of Quicksort is *not* sorted input. – Steve Jessop Mar 12 '14 at 00:02
  • 2
    Yeah hybrid quicksorts (like introsort) are what pretty much everyone uses. I guess you could also find merge sort hybrids like timsort. – R. Martinho Fernandes Mar 12 '14 at 12:11
  • If you are trying to evaluate sorting methods, check on the number of constructors created during the sort. If the default constructor is inefficient, it will be slow, no matter how fast the sort is. – cup Mar 15 '14 at 07:37
  • possible duplicate of [What algorithms do popular C++ compilers use for std::sort and std::stable\_sort?](http://stackoverflow.com/questions/14547801/what-algorithms-do-popular-c-compilers-use-for-stdsort-and-stdstable-sort) – Stephan Dollberg Mar 15 '14 at 14:10
  • 2
    @bamboon it is not a duplicate because attracts attention to the strengthening requirements to the std::sort in the C++11 standard. And really the question is about: do C++11 std::sort implementations match C++11 standard O(n logn) worst case requirement? ( In mentioned previous answer it is said about O(n^2) worst case complexity of std::sort which is now should not be possible. ) – Spock77 Mar 16 '14 at 00:07
  • 1
    @cup You meant the copy constructor (assuming there is no move constructor and no fast specialization of `std::swap`), I assume? I don’t see why the default constructor should be invoked at all. – Christopher Creutzig Mar 17 '14 at 07:00
  • Sorry - yes, I did mean the copy constructor. – cup Mar 17 '14 at 08:52

2 Answers2

26

The question is, how can STL say std::sort worst case is O(N log(N)), even though it is in essence a QuickSort. STL's sort is IntroSort. IntroSort is in essence a QuickSort, the difference introduced change the worst case complexity.


QuickSort worst case is O(N^2)

What ever partitioning you choose, there exist a sequence that QuickSort will run on O(N^2). The partitioning you choose only decreases the probability of the worst case to occur. (Random Pivot Selection , Median-Of-Three, etc.)

EDIT: Thanks to @maxim1000 s correction. Quicksort with pivot selection algorithm Median of Medians has O(N log(N)) worst case complexity, but due to the overhead it introduces it isn't used in practice. It shows how good selection algorithm, can change the worst-case complexity through pivot selection, theoretically.


What does IntroSort do?

IntroSort limits the branching of QuickSort. This is the most important point, that limit is 2 * (log N). When limit is reached, IntroSort can use any sorting algorithm that has worst case complexity of O(N log(N)).

Branching stops when we have O(log N) subproblems. We can solve every subproblem O(n log n). (Lower case n stands for the subproblem sizes).

Sum of (n log n) is our worst case complexity, now.

For the worst case of QuickSort; assume we have an already sorted array, and we select always the first element in this array as the pivot. In every iteration we get rid of only the first element. If we went this way until the end, it would be O(N^2) obviously. With IntroSort we stop QuickSort, when we reach a depth log(N), then we use HeapSort for the remaining unsorted array.

16 -> 1  /**N**/
   \
    > 15 -> 1 /**N - 1**/
         \
          > 14 -> 1 /**N - 2**/
               \
                > 13 -> 1 /**N - log(N)**/  
                     \
                      > 12 /**(HeapSort Now) (N - log(N)) log (N - log(N))**/

Sum them up;

Until branching stops, N + (N - 1) + ... + (N - log(N)) operations done. Instead of using gauss to sum up, we can simply say N + (N - 1) + ... + (N - log(N)) < N log(N).

The HeapSort Part is (N - log(N)) log(N - log(N)) < N log(N)

Overall complexity < 2 N log(N).

Since the constants can be omitted, the worst case complexity of IntroSort is O(N log(N)).


Added Info: GCC STL implementation source code is here. Sort function is at line 5461.

Correction: *Microsoft .NET* sort Implementation is IntroSort since 2012. Related information is here.

Cahit Gungor
  • 1,477
  • 23
  • 30
  • The question was a little bit different - how std::sort is implemented in various STL implementations now - did they changed or not? How can you prove that STL's sort it introsort? introsort is well described in wikipedia, though – Spock77 Mar 19 '14 at 08:00
  • 3
    GCC STL Source Code: http://gcc.gnu.org/onlinedocs/gcc-4.7.2/libstdc++/api/a01462_source.html. Start from line 5461 for `stl::sort` – Cahit Gungor Mar 19 '14 at 08:28
  • "Quicksort's worst-case performance is always O(n^2)" is incorrect. There is at least one algorithm for pivot selection which deterministically gives a good pivot - http://en.wikipedia.org/wiki/Median_of_medians. It's rarely used in practice due to constants, but it makes order of complexity for Quicksort O(n*log n). – maxim1000 Mar 19 '14 at 09:43
  • @maxim1000 thanks for the correction. I added the related information. – Cahit Gungor Mar 19 '14 at 10:32
  • @CahitGungor How documentation on dotnet is related to STL std::sort in native C++? – Spock77 Mar 19 '14 at 23:09
  • Microsoft STL implementation is not fully compliant with C++11, but in recent versions the conformance level is improved. @Alexey asked `How can you prove that STL's sort it introsort?`. Since Microsoft VS site (http://msdn.microsoft.com/en-us/library/ecdecxh1.aspx) do not include algorithm specifics, .NET site is referenced, to see that both **GCC** and **Microsoft** sorts are IntroSort. – Cahit Gungor Mar 20 '14 at 07:37
  • @CahitGungor I think that it is not correct to reference .NET site at all and this logic "Since Microsoft VS site do not include algorithm specifics, .NET site is referenced" appears seriously flawed. C# is really different language. Moreover, your VS link states: "The average of a sort complexity is O(N log N)," nothing saying about worst case. And this phrase "Microsoft sort implementation is introsort since 2012" is simply wrong :) It was in their STL long ago, at least in VS 2008 and I think since VS 2003 or VS6. – Spock77 Mar 20 '14 at 11:51
  • 1
    What I really wanted in the answer is the analysis of the STL **source code** from the people familiar with different implementations. – Spock77 Mar 20 '14 at 11:53
  • Thanks @AlexeyVoytenko. I corrected the information saying since 2012. C++11 official standard document (http://isocpp.org/files/papers/N3690.pdf page 898) defines **sort** as O(N log(N)) std::sort complexity. Any STL implementation compliant with C++11, should have O(N log(N)) complexity, no restriction about how you implement it. – Cahit Gungor Mar 20 '14 at 13:12
  • @AlexeyVoytenko (msdn.microsoft.com/en-us/library/ecdecxh1.aspx) site referenced is not for C#. – Cahit Gungor Mar 20 '14 at 13:18
  • The C++ standard does not mandate quicksort. The STL on other hand (https://www.sgi.com/tech/stl/sort.html) uses introsort and notes that "Quicksort has O(N log(N)) average complexity, but quadratic worst-case complexity." – Praxeolitic Sep 30 '14 at 05:35
  • Indeed *libc++* uses Python's TimSort, @CahitGungor. – Pedro Lacerda Apr 21 '15 at 04:00
  • Surprisingly gcc [chooses middle element as pivot](https://gcc.gnu.org/onlinedocs/gcc-4.7.2/libstdc++/api/a01462_source.html#l02304)! No randomization is used. – renadeen Nov 13 '17 at 16:56
  • LLVM update https://danlark.org/2022/04/20/changing-stdsort-at-googles-scale-and-beyond/ – Spock77 Apr 26 '22 at 09:43
22

Browsing the online sources for libstdc++ and libc++, one can see that both libraries use the full gamut of the well-known sorting algorithms from an intro-sort main loop:

For std::sort, there is a helper routine for insertion_sort (an O(N^2) algorithm but with a good scaling constant to make it competitive for small sequences), plus some special casing for sub-sequences of 0, 1, 2, and 3 elements.

For std::partial_sort, both libraries use a version of heap_sort (O(N log N) in general), because that method has a nice invariant that it keeps a sorted subsequence (it typically has a larger scaling constant to make it more expensive for full sorting).

For std::nth_element, there is a helper routine for selection_sort (again an O(N^2) algorithm with a good sclaing constant to make it competitive for small sequences). For regular sorting insertion_sort usually dominates selection_sort, but for nth_element the invariant of having the smallest elements perfectly matches the behavior of selection_sort.

TemplateRex
  • 69,038
  • 19
  • 164
  • 304