
The Wikipedia article for merge sort.

The Wikipedia article for quick sort.

Both articles have excellent visualizations.

Both have O(n*log(n)) complexity (quicksort in the average case).

So obviously the distribution of the data will affect the speed of the sort. My guess would be that, since a comparison can just as quickly compare any two values no matter their spread, the range of data values does not matter.

More importantly, one should consider the positional distribution (the x direction) with respect to ordering, with magnitude set aside.

A good test case to consider would be if the test data were already sorted to some degree...

  • I can tell you when to use `std::sort`... always :) – avakar Oct 24 '11 at 16:29
  • what does `std::sort` implement.. which algorithm? –  Oct 24 '11 at 16:29
  • Chris, the implementation is not specified by the standard. However, your standard library will likely use a combination of the two algorithms depending on the type and the number of elements in the sequence. – avakar Oct 24 '11 at 16:31
  • @ChrisAaker: GCC implementation of the standard library uses *introsort*, which is a variant of quicksort that will fallback to mergesort if it *feels* it will hit the worst case complexity (`O(N^2)` for quicksort) – David Rodríguez - dribeas Oct 24 '11 at 16:36
  • Users keep talking about quick sort failing or hitting its worst case.. but how does this happen? A simple example or test case? –  Oct 24 '11 at 17:05
  • Watch the [MIT open courseware lecture on this](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/). Vanilla quicksort on an already sorted array is O(n^2). – Chris A. Oct 24 '11 at 17:12
  • @Chris A. - DarthVader claims a reverse-sorted array is the worst case... not an already sorted one... anyone? –  Nov 02 '11 at 00:33
  • [See the Wikipedia entry in the section "Implementation Issues -- choice of pivot"](http://en.wikipedia.org/wiki/Quicksort). I got the information from the MIT open courseware lecture I linked above anyway. Did you look at it? It's in the lecture on quicksort – Chris A. Nov 02 '11 at 17:10

6 Answers


It typically depends on the data structures involved.

Quick sort is typically the fastest, but it doesn't guarantee O(n*log(n)); there are degenerate cases where it becomes O(n^2).

Heap sort is the usual alternative; it guarantees O(n*log(n)) regardless of the initial order, but it has a much higher constant factor. It's usually used when you need a hard upper limit to the time taken. Some more recent algorithms use quick sort, but attempt to recognize when it starts to degenerate, and switch to heap sort then.

Merge sort is used when the data structure doesn't support random access, since it works with pure sequential access (forward iterators, rather than random access iterators). It's used in `std::list<>::sort`, for example. It's also widely used for external sorting, where random access can be very, very expensive compared to sequential access. (When sorting a file which doesn't fit into memory, you might break it into chunks which fit into memory, sort these using quicksort, write each out to a file, then merge-sort the generated files.)
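The iterator-category distinction is visible directly in the standard library: `std::sort` requires random-access iterators, while `std::list` ships a member `sort()` based on merge sort. A minimal sketch (the helper names are just for illustration):

```cpp
#include <algorithm>
#include <list>
#include <vector>

// std::sort needs random-access iterators, so it works on std::vector but
// not on std::list. std::list instead provides its own member sort(),
// which is merge-sort-based precisely because merging only needs
// sequential access.
inline std::vector<int> sort_vector(std::vector<int> v) {
    std::sort(v.begin(), v.end());  // typically introsort under the hood
    return v;
}

inline std::list<int> sort_list(std::list<int> l) {
    l.sort();  // merge sort over the nodes; no random access required
    return l;
}
```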

James Kanze
  • fascinating.... this is an interview question... the company's file system uses primarily sequential access (no rewrites, only appends).. and I can now understand why they are asking this question and how it relates. –  Oct 24 '11 at 17:03
  • You are saying merge_sort is for when the data structure does not support random access... by this you mean that you would have to iterate to get the value you want? –  Oct 24 '11 at 17:04
  • @ChrisAaker I mean that you can't go to an arbitrary location in the data set, at least not cheaply. Merge sort was initially designed to work on mag tape. Even on a file on disk, seeking to random positions in the file is relatively expensive, compared to reading sequentially. And of course, in the standard library, forward and bidirectional iterators do not support addition; you need a random access iterator for that. (And `std::list` has bidirectional iterators, so `std::sort` doesn't work with it.) – James Kanze Oct 24 '11 at 17:41
  • This should be the accepted answer. I'd throw in some considerations about stability (quicksort usually isn't in efficient implementations) but you posted a great overview. +1 – Marco A. Sep 16 '15 at 19:53
  • Mergesorts are also a great choice if you need to sort a linked list. – Cameron Sep 21 '17 at 09:34

Mergesort is quicker when dealing with linked lists. This is because pointers can easily be changed when merging lists. It only requires one pass (O(n)) through the list.
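That one-pass pointer-splicing merge can be sketched on a hand-rolled singly-linked list (the `Node` type and function name are illustrative, not from any library):

```cpp
// A singly-linked node type, hand-rolled for illustration.
struct Node {
    int value;
    Node* next;
};

// Merge two already-sorted lists by relinking nodes: no element is copied
// or swapped, only `next` pointers change, and the whole merge is one
// O(n) pass. This is why the merge step is so cheap on linked lists.
inline Node* merge_sorted(Node* a, Node* b) {
    Node head{0, nullptr};   // dummy head node simplifies the loop
    Node* tail = &head;
    while (a && b) {
        Node** smaller = (a->value <= b->value) ? &a : &b;
        tail->next = *smaller;         // splice the smaller node onto the result
        tail = *smaller;
        *smaller = (*smaller)->next;   // advance that input list
    }
    tail->next = a ? a : b;            // append whatever remains
    return head.next;
}
```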

Quicksort's in-place algorithm requires the movement (swapping) of data. While this can be very efficient for an in-memory dataset, it can be much more expensive if your dataset doesn't fit in memory. The result would be lots of I/O.

These days, there is a lot of parallelization that occurs. Parallelizing Mergesort is simpler than Quicksort (in-place). If not using the in-place algorithm, then the space complexity for quicksort is O(n), which is the same as mergesort.

So, to generalize, quicksort is probably more effective for datasets that fit in memory. For stuff that's larger, it's better to use mergesort.

The other general time to use mergesort over quicksort is if the data is very similar (that is, not close to being uniform). Quicksort relies on choosing a pivot. When all the values are similar, it's more likely that a poor pivot will be chosen, leading to very unbalanced partitions and an O(n^2) runtime. The most straightforward example is if all the values in the list are the same.
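That degenerate behaviour is easy to observe by counting comparisons in a deliberately naive Lomuto-partition quicksort (a sketch, not how library sorts are actually written):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Lomuto-partition quicksort that counts comparisons, to make the
// degenerate case visible: with all-equal keys every element satisfies
// v[j] <= pivot, so each partition is maximally unbalanced and the
// total work is n*(n-1)/2 comparisons. (Real library sorts guard
// against this, e.g. with three-way partitioning.)
inline void quicksort_count(std::vector<int>& v, long lo, long hi, long& cmps) {
    if (lo >= hi) return;
    const int pivot = v[hi];          // naive choice: last element
    long i = lo;
    for (long j = lo; j < hi; ++j) {
        ++cmps;
        if (v[j] <= pivot) std::swap(v[i++], v[j]);
    }
    std::swap(v[i], v[hi]);           // put the pivot in its final place
    quicksort_count(v, lo, i - 1, cmps);
    quicksort_count(v, i + 1, hi, cmps);
}
```

On 64 identical keys this performs exactly 64*63/2 = 2016 comparisons, the quadratic worst case.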

pschang

There is a real-world sorting algorithm -- called Timsort -- that does exploit the idea that data encountered in the wild is often partially sorted.

The algorithm is derived from merge sort and insertion sort, and is used in CPython, Java 7 and Android.

See the Wikipedia article for more details.
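The run-exploiting idea can be sketched as a natural merge sort: find the ascending runs already present, then merge adjacent runs until one remains. This is a heavily simplified cousin of Timsort, which additionally uses a minimum run length filled by insertion sort, galloping mode, and a merge-stack invariant:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Natural merge sort: on partially sorted data there are few runs,
// so little merging is needed; on fully sorted data there is exactly
// one run and the function does no merging at all.
inline void natural_merge_sort(std::vector<int>& v) {
    if (v.size() < 2) return;
    // Record run boundaries: [bounds[k], bounds[k+1]) is one ascending run.
    std::vector<std::size_t> bounds{0};
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] < v[i - 1]) bounds.push_back(i);
    bounds.push_back(v.size());
    // Repeatedly merge adjacent runs (std::inplace_merge keeps it stable).
    while (bounds.size() > 2) {
        std::vector<std::size_t> next{0};
        for (std::size_t k = 0; k + 2 < bounds.size(); k += 2) {
            std::inplace_merge(v.begin() + bounds[k],
                               v.begin() + bounds[k + 1],
                               v.begin() + bounds[k + 2]);
            next.push_back(bounds[k + 2]);
        }
        if (bounds.size() % 2 == 0)   // odd run count: last run was unpaired
            next.push_back(bounds.back());
        bounds = next;
    }
}
```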

NPE
  • +1 for timsort. Although it does use more memory in worst case scenario compared to quick sort (n/2 vs. log n). Why anyone would use merge sort though with timsort available isn't clear to me. – Voo Oct 24 '11 at 16:39
  • I might be able to guess that merge_sort is better for data that is pre-sorted a bit. –  Oct 24 '11 at 17:07

Of the two, use merge sort when you need a stable sort. You can use a modified quicksort (such as introsort) when you don't, since it tends to be faster and it uses less memory.

Plain old Quicksort as described by Hoare is quite sensitive to performance-killing special cases that make it Theta(n^2), so you normally do need a modified version. That's where the data-distribution comes in, since merge sort doesn't have bad cases. Once you start modifying quicksort you can go on with all sorts of different tweaks, and introsort is one of the more effective ones. It detects on the fly whether it's in a killer case, and if so switches to heapsort.

In fact, Hoare's most basic Quicksort fails worst for already-sorted data, and so your "good test cases" with some level of sorting will kill it to some level. That fact is for curiosity only, though, since it only takes a very small tweak to avoid that, nothing like as complicated as going all the way to introsort. So it's simplistic to even bother analyzing the version that's killed by sorted data.
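Both the killer case and the size of the tweak can be seen with a comparison counter. The sketch below uses a Lomuto-style partition with a selectable pivot rather than Hoare's original scheme, so treat it as illustrative only:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Comparison-counting quicksort with a selectable pivot. With the naive
// last-element pivot, already-sorted input costs exactly n*(n-1)/2
// comparisons; moving the pivot choice to the middle (a one-line tweak)
// restores roughly n*log2(n) behaviour on the same input.
inline void qsort_pivot(std::vector<int>& v, long lo, long hi,
                        bool middle_pivot, long& cmps) {
    if (lo >= hi) return;
    if (middle_pivot)
        std::swap(v[(lo + hi) / 2], v[hi]);  // the tweak: pivot from the middle
    const int pivot = v[hi];                 // without the tweak: last element
    long i = lo;
    for (long j = lo; j < hi; ++j) {
        ++cmps;
        if (v[j] < pivot) std::swap(v[i++], v[j]);
    }
    std::swap(v[i], v[hi]);
    qsort_pivot(v, lo, i - 1, middle_pivot, cmps);
    qsort_pivot(v, i + 1, hi, middle_pivot, cmps);
}
```

On 256 already-sorted elements the naive pivot performs 256*255/2 = 32640 comparisons, while the middle pivot stays close to 256*8 ≈ 2048.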

In practice, in C++ you'd generally use std::stable_sort and std::sort rather than worrying too much about the exact algorithm.
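As a quick illustration of what stability buys you (the record data here is made up):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Stability demo: sort (name, score) records by score only. A stable
// sort must keep equal-score records in their original relative order;
// std::stable_sort guarantees this, std::sort does not.
inline std::vector<std::pair<std::string, int>>
by_score(std::vector<std::pair<std::string, int>> v) {
    std::stable_sort(v.begin(), v.end(),
                     [](const auto& a, const auto& b) {
                         return a.second < b.second;  // compare scores only
                     });
    return v;
}
```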

Steve Jessop
  • Do you have any examples of performance killing special cases? –  Oct 24 '11 at 16:42
  • I remember reading some considerations where if the user could provide specifically crafted data the quick sort would exhibit worst case performance, basically leading to a DOS attack. Although I've yet to see any situation where this would be a real problem. @ChrisAaker iirc the original version of quick sort used the first element as pivot, so that's quite simple to create. For the more complex variants (first, last, middle) it's rather impossible for it to happen by chance. – Voo Oct 24 '11 at 16:46
  • What Voo says, I think there are papers that construct killers for at least median-of-three pivot selection, and possibly even for better pivot selections than that. If you care that much about DOS attacks, though, then random pivot selection defeats all crafted input, and the probability of a bad case is so low as to be negligible for any input data big enough that you care whether you're in a bad case or not. That's "negligible" in the sense of, "you're much more likely to suffer from cosmic ray bit-flipping or the spontaneous explosion of the planet". – Steve Jessop Oct 24 '11 at 16:53

While Java 6 and earlier versions use merge sort as the sorting algorithm, C# uses QuickSort.

QuickSort performs better than merge sort even though they are both O(n log n): QuickSort has a smaller constant factor than merge sort.

DarthVader
  • The constant is smaller... i.e. the constant that is removed in the Big O representation? –  Oct 24 '11 at 16:44
  • Except in the rare cases when it doesn't. Merge sort and heap sort are O(n*log(n)), always. Quick sort is O(n*log(n)) best case, but O(n^2) worst case. – James Kanze Oct 24 '11 at 16:44
  • O.K., numerically I understand the difference.. thanks.. but what causes this worst case... in regard to the data "spread"? –  Oct 24 '11 at 17:08
  • The worst case is if the data is sorted in reverse order. This causes an O(n^2) run time, but usually it is not. In most cases, unless the data is sorted in reverse, quick sort performs better than merge sort. – DarthVader Oct 24 '11 at 17:10
  • you meant "unless it is sorted in reverse"... basically, when the data is sorted in reverse order, quick sort falls to its worst case of O(n^2). –  Oct 24 '11 at 17:18
  • yes. Other than that, quick sort performs better. Look at CLRS; it has an in-depth explanation of the performance of quick sort based on the parameters etc. There is actually a formal proof of this. – DarthVader Oct 24 '11 at 17:23
  • @stack.user.0 If you pick the pivot element from the middle, Quicksort is O(n log n) for both sorted and reverse-sorted inputs. – fredoverflow Jan 31 '12 at 20:49

Remember, in practice, unless you have a very large data set and/or are executing the sort many, many times, it probably won't matter at all. That being said, quicksort is generally considered the 'fastest' n*log(n) sorter. See this question already asked: Quick Sort Vs Merge Sort

Peter