
It seems quite straightforward to implement quicksort with bidirectional iterators in O(N log N) time and O(log N) space. So, what is the particular reason that std::sort() requires random-access iterators?

I have read the question why do std::sort and partial_sort require random-access iterators?, but it doesn't explain which specific part of a possible std::sort() implementation actually requires a random-access iterator to maintain its time and space complexity.

A possible implementation with O(N log N) time and O(log N) space:

// Hoare-style partition over a bidirectional range: returns the first
// iterator of the suffix whose elements do not satisfy pred.
template <typename BidirIt, typename Pred>
BidirIt partition(BidirIt first, BidirIt last, Pred pred) {
  while (true) {
    // Advance `first` past elements that already satisfy pred.
    while (true) {
      if (first == last) return first;
      if (!pred(*first)) break;
      ++first;
    }
    // Retreat `last` past elements that already fail pred.
    while (true) {
      if (first == --last) return first;
      if (pred(*last)) break;
    }
    std::iter_swap(first, last);  // move the misplaced pair into place
    ++first;
  }
}

template <typename BidirIt, typename Less = std::less<void>>
void sort(BidirIt first, BidirIt last, Less&& less = Less{}) {
  using value_type = typename std::iterator_traits<BidirIt>::value_type;
  using pair = std::pair<BidirIt, BidirIt>;
  std::stack<pair> stk;  // explicit stack instead of recursion
  stk.emplace(first, last);
  while (!stk.empty()) {
    std::tie(first, last) = stk.top();
    stk.pop();
    if (first == last) continue;
    auto prev_last = std::prev(last);
    auto pivot = *prev_last;  // last element as the pivot
    auto mid = ::partition(first, prev_last,
      [=](const value_type& val) {
        return less(val, pivot);
      });
    std::iter_swap(mid, prev_last);  // put the pivot in its final place
    stk.emplace(first, mid);         // left half
    stk.emplace(++mid, last);        // right half (past the pivot)
  }
}
Lingxi
  • Have you actually tried this? – Ivan Aksamentov - Drop Feb 03 '16 at 07:20
  • 5
    Possible duplicate of [why do std::sort and partial\_sort require random-access iterators?](http://stackoverflow.com/questions/24817274/why-do-stdsort-and-partial-sort-require-random-access-iterators) – Ivan Aksamentov - Drop Feb 03 '16 at 07:20
  • @Drop Question updated. I want to know the specifics. Not a general answer. – Lingxi Feb 03 '16 at 07:28
  • 2
    Quote: `To use sequential iterators would come at a O(N) memory cost to store all of those iterators`. Is this specific enough? – Ivan Aksamentov - Drop Feb 03 '16 at 07:29
  • It might be a good question for [Computer Science](http://cs.stackexchange.com/) site. P.S. If you ever come up with improved version of `std::sort`, I think you should share it with the world (no jokes) – Ivan Aksamentov - Drop Feb 03 '16 at 07:31
  • @Drop Updated question to show that O(1) space is possible. – Lingxi Feb 03 '16 at 07:33
  • 2
    But how big is your `std::stack stk;`? – Ivan Aksamentov - Drop Feb 03 '16 at 07:35
  • @Drop Recursive version has an implicit stack also. – Lingxi Feb 03 '16 at 07:36
  • @Drop Could you just show me the line of code in an `std::sort()` implementation that mandates random-access iterator? – Lingxi Feb 03 '16 at 07:38
  • This is where `O(N)` space is hidden. BTW your code is very nice and thanks for sharing. I don't have time for detailed analysis, but my gut feelings say that it's not `O(1)` – Ivan Aksamentov - Drop Feb 03 '16 at 07:38
  • Nope, I can't. Also, which implementation do you prefer? there are zillion of them. – Ivan Aksamentov - Drop Feb 03 '16 at 07:41
  • @Drop If the space for the implicit stack counts, I don't think quicksort could ever be made O(1) space. Any implementation will just do. The standard made it so. There's got to be a reason. And that reason should be found within any implementation. Could you remove the duplicate label, so that I can possibly get the answer I want? – Lingxi Feb 03 '16 at 07:42
  • Well, I retracted my close vote, but I was not alone. Let's see where it will take us. – Ivan Aksamentov - Drop Feb 03 '16 at 07:45
  • Have you actually tried reading any `std::sort` implementation yourself? It is a non-trivial mess (not that the algorithm itself is complicated). And it is not guaranteed to be a quicksort. – Ivan Aksamentov - Drop Feb 03 '16 at 07:48
  • @Drop I've read some. But I don't think that code is mandatory or necessary to keep its time and space complexity. I'm preparing for an interview actually, and if the interviewer asks me this question, I will have no answer :-( – Lingxi Feb 03 '16 at 07:54
  • @Lingxi according to [wikipedia](https://en.wikipedia.org/wiki/Quicksort), the recursion stack is O(log n), when one of the recursions is tail-call optimized (although, I'm not convinced that the standard implementations use recursion). – eerorika Feb 03 '16 at 07:57
  • @user2079303 Yes. Question updated. Thanks :-) – Lingxi Feb 03 '16 at 08:00
  • It seems like the memory would have to be at least O(log n) any way you go, but I'm not sure how C++ space requirements are defined. Will it matter that your memory is in the heap rather than the call stack? – Linus Feb 03 '16 at 08:00
  • @Linus I don't think so. If it really matters, I can just make the implementation recursive. – Lingxi Feb 03 '16 at 08:02
  • As I understand it, random access (moving in constant time) is required when you go to the pivot point. Bidir-iterator will degrade performance here. Also, quicksort requires random shuffling to avoid worst case. – Ivan Aksamentov - Drop Feb 03 '16 at 08:02
  • You should really try this kind of questions on CS site, they love it there. – Ivan Aksamentov - Drop Feb 03 '16 at 08:04
  • I think @rici pretty much nailed it in his answer. But out of curiosity: Have you actually found a usecase, where you'd like to apply a iterator-based sorting function on a range defined by bi-directional iterators ? – MikeMB Feb 03 '16 at 08:14
  • @MikeMB I would prefer quicksort over `std::list::sort()`. – Lingxi Feb 03 '16 at 08:16
  • @Lingxi: `std::sort` isn't quicksort, and `std::list::sort` has the advantage that it can swap around the internal pointers instead of the elements, which is usually preferable. Where that is not the case, it is probably faster to copy the elements into an array, sort it, and copy them back. – MikeMB Feb 03 '16 at 08:37
  • @user2079303: all standard library implementations I looked at do recursion on the shorter of the two branches and iteration on the other. – Dietmar Kühl Feb 03 '16 at 13:56
  • @DietmarKühl thanks for looking it up. now I'm way more convinced. – eerorika Feb 03 '16 at 14:16

2 Answers


There are a few reasons why practical library sort functions need random access iterators.

The most obvious one is the well-known fact that choosing an endpoint of the partition as the pivot reduces quicksort to O(n²) if the data is sorted (or "mostly sorted"), so most real-life quicksorts actually use a more robust pivot-selection strategy. I think the most common one is the Wirth strategy: choose the median of the first, middle, and last elements of the partition, which is robust against sorted vectors. (As Dietmar Kühl points out, just selecting the middle element would work almost as well, but there is practically no extra cost for the median-of-three strategy.) Selecting a random element would also be a good strategy, since it is harder to game, but the requirement for a PRNG might be discouraging. Any strategy for selecting the pivot other than taking an endpoint requires random-access iterators (or a linear scan).
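
To make that concrete, here is a hedged sketch of median-of-three pivot selection (the name and structure are my own, not taken from any particular library). The midpoint computation `first + (last - first) / 2` is exactly the step that is O(1) for random-access iterators but would cost a linear `std::advance` for bidirectional ones:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical median-of-three helper (illustrative, not library code).
// The midpoint computation is the step that needs random access.
template <typename RandomIt>
RandomIt median_of_three(RandomIt first, RandomIt last) {
    RandomIt mid = first + (last - first) / 2;  // O(1) only for random access
    RandomIt back = last - 1;
    // Order the three sampled elements so that *mid holds their median.
    if (*mid < *first) std::iter_swap(mid, first);
    if (*back < *mid) std::iter_swap(back, mid);
    if (*mid < *first) std::iter_swap(mid, first);
    return mid;
}
```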

Second, quicksort is sub-optimal when the partition is small (for some heuristic definition of small). When there are few enough elements, the simplified loop of an insertion sort combined with locality of reference will make that a better solution. (This doesn't affect the complexity of the overall algorithm because the threshold is a fixed size; insertion sort of a maximum of k elements is O(1) for any previously established k. I think you'll typically find values between 10 and 30.) The insertion sort can be done with bidirectional iterators, but figuring out if the partition is smaller than the threshold cannot (again, unless you use an unnecessarily slow loop).

Third, and possibly most importantly, quicksort can degenerate into O(n²) no matter how hard you try. Earlier C++ standards accepted that std::sort might be "O(n log n) on the average", but since the acceptance of DR713 the standard requires std::sort to be O(n log n) without qualification. This cannot be achieved with a pure quicksort, so modern library sort algorithms are actually based on introsort or similar. This algorithm falls back to a different sorting algorithm -- normally heapsort -- if it detects that the partitioning is too biased. The fallback algorithm is very likely to require random access iterators (heapsort and shellsort both do, for example).
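
A minimal introsort skeleton looks roughly like this (a sketch under simplifying assumptions -- last-element pivot, my own naming -- not any library's actual code). Note that the `std::make_heap`/`std::sort_heap` fallback itself requires random-access iterators:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Depth-limited quicksort: when the budget is exhausted (partitioning went
// badly), fall back to heapsort, which keeps the whole sort O(n log n).
template <typename RandomIt>
void introsort_impl(RandomIt first, RandomIt last, int depth_budget) {
    if (last - first < 2) return;
    if (depth_budget == 0) {
        std::make_heap(first, last);  // heapsort fallback
        std::sort_heap(first, last);
        return;
    }
    // Simplistic pivot (last element) to keep the sketch short.
    auto pivot = *(last - 1);
    RandomIt mid = std::partition(first, last - 1,
        [&](const auto& x) { return x < pivot; });
    std::iter_swap(mid, last - 1);  // pivot lands in its final position
    introsort_impl(first, mid, depth_budget - 1);
    introsort_impl(mid + 1, last, depth_budget - 1);
}

template <typename RandomIt>
void introsort(RandomIt first, RandomIt last) {
    auto n = last - first;
    introsort_impl(first, last,
                   n > 1 ? 2 * static_cast<int>(std::log2(n)) : 0);
}
```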

Finally, recursion depth can be reduced to a maximum of log₂ n by using the simple strategy of recursing on the smaller partition and tail-recurring (explicitly looping) on the larger partition. Since recursion is generally faster than explicitly maintaining a stack, and recursion is totally reasonable if the maximum recursion depth is in the low two digits, this little optimization is worthwhile (although not all library implementations use it). Again, that requires being able to compute the size of the partitions.
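
That strategy can be sketched as follows (illustrative code, my own naming): choosing which side to recurse into compares the two partition sizes, i.e. it relies on O(1) iterator subtraction:

```cpp
#include <algorithm>
#include <vector>

// Recurse on the smaller side (bounding the call depth by log2 n), loop on
// the larger side so it consumes no extra stack.
template <typename RandomIt>
void quicksort_small_first(RandomIt first, RandomIt last) {
    while (last - first > 1) {
        auto pivot = *(last - 1);
        RandomIt mid = std::partition(first, last - 1,
            [&](const auto& x) { return x < pivot; });
        std::iter_swap(mid, last - 1);
        // Size comparison below is the part needing random access.
        if (mid - first < last - (mid + 1)) {
            quicksort_small_first(first, mid);  // smaller half: recurse
            first = mid + 1;                    // larger half: loop
        } else {
            quicksort_small_first(mid + 1, last);
            last = mid;
        }
    }
}
```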

There are probably other aspects of practical sorting which require random access iterators; those are just off the top of my head.

rici
  • I guess the median-of-three strategy could be implemented with `std::advance()` without worsening the time complexity, though it does increase the time constant. But it's another matter. – Lingxi Feb 03 '16 at 08:06
  • 1
    @lingxi: practical as opposed to theoretical sorts are not going to count elements one at a time just in order to avoid using a random access iterator which almost certainly exists :) I tried hard to avoid making judgemental statements about academic exercises in standard library implementations, but I'm sure you can guess the thought patterns. – rici Feb 03 '16 at 08:07
  • So, basically, to have a lower time constant is the answer? I really doubt it. – Lingxi Feb 03 '16 at 08:10
  • 2
    @lingxi: sort functions are judged by how fast they are. Believe it or not. Also, since about C++11, std::sort is required to be O(n log n), and not just stochastic O(n log n) precisely because introsort was judged to be requireable. So pure quicksort cannot be used as a conformant implementation of std::sort. I'll add that to the answer in the morning, along with whatever other nits show up overnight :) – rici Feb 03 '16 at 08:10
  • Up voted. I want to keep the question open for a little longer, and hopefully get an absolute answer. – Lingxi Feb 03 '16 at 08:20

The simple answer is that quicksort is slow unless optimized in particular for small ranges. To detect that ranges are small, an efficient way to determine their size is needed.

I have a presentation (here are the slides and the code) where I show the steps used to create a fast implementation of quicksort. It turns out that the sort implementation is actually a hybrid algorithm.

The essential steps in making quicksort fast are the following:

  1. Guard against [mostly] sorted sequences. One of the interesting cases here is actually the special sorted sequence consisting of all identical elements: in real data, equal subsequences are not rare at all. The approach is to monitor when quicksort does too much work and to switch to an algorithm with known complexity, like heapsort or mergesort, to finish sorting the problematic subsequence. This approach goes under the name introsort.
  2. Quicksort is really bad on short sequences. Since quicksort is a divide and conquer algorithm, it actually produces many small sequences. Dealing with small sequences can, e.g., be done using insertion sort. To find out whether a sequence is small, it is necessary to check the size of sequences efficiently. This is where the need for random access comes in.
  3. There are a number of additional optimizations which are necessary to make quicksort really fast, although their impact is overall smaller than that of the approaches above. For example:

    • The partition used needs to take advantage of sentinels to reduce the number of comparisons.
    • Observing whether the partition does any work can lead to an early bail-out by gambling on running an insertionsort which stops when doing too much work.
    • To increase the chances of the previous point, using the midpoint rather than either end of the sequence to be sorted as the pivot has an advantage (this also requires random access, but is a comparatively small reason).

I haven't done the experiment, but implementing these necessary optimizations for bidirectional iterators is probably not really effective: the cost of determining whether a sequence is small (which doesn't need to compute the full size of the sequence but can stop as soon as it is clear that the sequence is not small) probably becomes too high. If these impediments make quicksort run even about 20% slower, using a different sort algorithm is preferable: mergesort, e.g., is roughly in that range and has the advantage that it can also be stable.
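
For illustration, the bounded check described above might look like this (a sketch with my own naming, not code from the presentation): it decides whether a bidirectional range is "small" without computing its full size, stopping as soon as more than `threshold` elements have been seen:

```cpp
#include <cstddef>
#include <list>

// Returns true iff [first, last) contains at most `threshold` elements,
// walking at most `threshold` steps regardless of the range's actual size.
template <typename BidirIt>
bool at_most_n(BidirIt first, BidirIt last, std::ptrdiff_t threshold) {
    while (threshold-- > 0) {
        if (first == last) return true;  // ran out of elements early: small
        ++first;
    }
    return first == last;  // exactly threshold elements, else too many
}
```

Even so, this costs up to `threshold` increments per check, versus a single subtraction for random-access iterators.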

BTW, the fabled choice of the median as the pivot doesn't seem to have any interesting impact: using the middle value rather than the median seems to be roughly as good (but it is, indeed, a better choice than either end).

Dietmar Kühl
  • I believe that pivot choice makes little difference on random data but a huge difference on mostly sorted data, and it is not uncommon for an application to be given mostly sorted data. – rici Feb 03 '16 at 08:35
  • @rici: nope. Choice of the pivot does not matter. You need to guard against hostile orders differently (using intro-sort), in which case choosing a pivot is unnecessary complexity and/or a waste of time. – Dietmar Kühl Feb 03 '16 at 08:45
  • Ah, I misread you. Yes, there is little difference between median-of-3 and middle but the cost of median-of-3 is negligible because the comparisons needed replace those of partition, a point made by Wirth iirc. Anyway, choosing the middle required random access, too. – rici Feb 03 '16 at 15:24