441

I'm very curious, why stability is or is not important in sorting algorithms?

Levent Divilioglu
  • 11,198
  • 5
  • 59
  • 106
DarthVader
  • 52,984
  • 76
  • 209
  • 300
  • 3
    For parallelization purposes? eg: merge sort is stable and can be parallelized well and so is quicksort. – DarthVader Oct 05 '09 at 00:46
  • 19
    Classic QuickSort is unstable – Konstantin Spirin Oct 05 '09 at 02:00
  • 21
    stable sort algo - `IBM (Insertion, Bubble, Merge)` – roottraveller Jul 17 '17 at 09:11
  • 2
    A note for those who might misunderstood the concept like me: *The order of equal elements is guaranteed to be preserved.* means: if the elements in stable sort are considered equal, then they would follow the previous order. **It's not** what I used to think: if the elements in previous order are considered equal, then in the coming stable sort, they would follow the previous order. Though you may find the latter understanding also makes sense in many cases. – Rick Jun 27 '18 at 03:50

10 Answers10

546

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.

Background: a "stable" sorting algorithm keeps the items with the same sorting key in order. Suppose we have a list of 5-letter words:

peach
straw
apple
spork

If we sort the list by just the first letter of each word then a stable-sort would produce:

apple
peach
straw
spork

In an unstable sort algorithm, straw or spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).

We could sort the list of words using this algorithm: stable sorting by column 5, then 4, then 3, then 2, then 1. In the end, it will be correctly sorted. Convince yourself of that. (by the way, that algorithm is called radix sort)

Now to answer your question, suppose we have a list of first and last names. We are asked to sort "by last name, then by first". We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by the last name. However, where last names are the same, the first names are sorted.

You can't stack unstable sorts in the same fashion.

Sergio
  • 2,078
  • 3
  • 24
  • 41
Joey Adams
  • 41,996
  • 18
  • 86
  • 115
  • So, what would the sort be called to make the words in correct sorting order of apple peach sport straw? The stable sort gave us apple peach straw spork however st should be after sp (alphabetically correct), so the ultimate correct sort should be apple peach sport straw – user1416486 Jun 24 '12 at 01:43
  • 10
    @user1416486: We're sorting by the first letter only. With that assumption, `straw` and `spork` compare equal. Stable sort will preserve the order of input, whereas unstable sort does not make that guarantee. "Correct" depends on the application. The sort function in most programming languages lets the user supply a custom ordering function. If the user's function treats different items as equal (e.g. same first name, different last name), it helps to know if the original order will be preserved. See [OCaml's array sorting functions](http://is.gd/7gDhtD) for a real-world example. – Joey Adams Jun 24 '12 at 04:40
  • 4
    I do not understand the line _..same sorting key_ ? What do you mean by key here ? Please explain the statement _..same sorting key_ – saplingPro Nov 05 '12 at 12:54
  • 6
    @saplingPro: by "sorting key", I mean the thing you are sorting items by. So when sorting by first letter, then for each item, its "sorting key" is its first letter. – Joey Adams Oct 18 '13 at 15:30
  • @JoeyAdams I just added one paragraph to beginning of your answer . hope it is ok with you. :) – Kick Buttowski Feb 20 '15 at 21:52
  • 1
    @JoeyAdams Can you please add the information in your comment into your answer. I was about to down vote this as `spork` does come before `straw` unless you are only sorting by the first letter. To me that is not a natural way to sort strings and should be made clear. – NathanOliver Aug 22 '16 at 12:08
  • 1
    @NathanOliver: I'm not sure how to word it; mind giving it a shot? I do say "stable-sorting by the first letter", but I guess it's easy to skim over that. – Joey Adams Aug 23 '16 at 01:12
  • @JoeyAdams I gave it a tweek. I did skip right over where you said sort by the first letter. I think this might draw a little more attention to it. If you do not like it just roll it back. It is good either way, I just need to slow down on my reading. Thanks. – NathanOliver Aug 23 '16 at 11:38
  • 25
    **Example -** Say you have a list with each item having information about destination of the flight and departure time. You first sort the list based on time. We then sort it based on destination. If the second sort is __stable__ we now have all flights bound to same destination together and in increasing order of departure time. If it wasn't stable, they wouldn't be in increasing order of time. – roottraveller Jul 17 '17 at 09:16
  • Selection Sort is Unstable Algorithm too. Just to mention if someone needs it – Chinmoy Aug 20 '20 at 16:45
  • Only the in-place one. And obviously, all sorts can become unstable if you wrongly use "or equal to" operator. – Ferazhu Mar 28 '21 at 10:40
  • So one of my questions is if the situation demands can't we write a custom implementation for comparison while doing the sorting, like : if the last names are same then we can compare the first name to decide if one should come before another. **I am struggling to figure out real life use case for stable algorithms** – Arijeet Bhakat Apr 04 '22 at 06:08
  • @ArijeetBhakat The best example I can think of is: you are generating a report of people grouped by company, and your input list is already sorted by name. You'd like the people within the groups to remain sorted by name. If your programming language's "group by" operator is stable, you can simply group by company. If it's not, you have to sort by name as an additional step. – Joey Adams Apr 06 '22 at 23:53
96

A stable sorting algorithm is the one that sorts the identical elements in their same order as they appear in the input, whilst unstable sorting may not satisfy the case. - I thank my algorithm lecturer Didem Gozupek to have provided insight into algorithms.

I again needed to edit the question due to some feedback that some people don't get the logic of the presentation. It illustrates sorting w.r.t. first elements. On the other hand, you can either consider the illustration consisting of key-value pairs.

Stable Sorting Algorithms:

  • Insertion Sort
  • Merge Sort
  • Bubble Sort
  • Tim Sort
  • Counting Sort
  • Block Sort
  • Quadsort
  • Library Sort
  • Cocktail shaker Sort
  • Gnome Sort
  • Odd–even Sort

Unstable Sorting Algorithms:

  • Heap sort
  • Selection sort
  • Shell sort
  • Quick sort
  • Introsort (subject to Quicksort)
  • Tree sort
  • Cycle sort
  • Smoothsort
  • Tournament sort(subject to Heapsort)

enter image description here

InSync
  • 4,851
  • 4
  • 8
  • 30
  • 3
    Your values are not equal. You compare 9,7 and 9,8 but according to stability check you need the same values like both 9,7 or both 9,8. And than same values should be ordered in same in stable algorithms. – erhun May 21 '17 at 08:47
  • 1
    No, to check the stability your values should be same. I mean assume that you use two 9,7 and name it at node A and node B. If every sort operation order is like A,B (instead of they are equal) understand that sorting algorithm is stable(like merge sort). If A, B order is change when sort them multiple times(1. sort A,B then B,A again A,B etc.), understand that sorting algorithm is unstable(like quick sort) @snr – erhun May 23 '17 at 11:04
  • @snr [9, 6] is not present in Input Array. I think you meant [9, 8] in last array strip. – Uthman Nov 04 '17 at 21:15
  • 9
    @erhun I believe he is sorting only by the first number (the one before the comma) and using the second number just as a reference for you to see that the first 9 is different than the second 9. – Bitcoin Cash - ADA enthusiast Aug 23 '18 at 03:16
  • @erhun What defines that the elements are the same? That is exactly the ordering criterion that is used! It can be whoever you want.My criteria says that all numbers divisible by 10 are equal, be it 20 or 500 – ESCM Apr 29 '21 at 22:06
32

Sorting stability means that records with the same key retain their relative order before and after the sort.

So stability matters if, and only if, the problem you're solving requires retention of that relative order.

If you don't need stability, you can use a fast, memory-sipping algorithm from a library, like heapsort or quicksort, and forget about it.

If you need stability, it's more complicated. Stable algorithms have higher big-O CPU and/or memory usage than unstable algorithms. So when you have a large data set, you have to pick between beating up the CPU or the memory. If you're constrained on both CPU and memory, you have a problem. A good compromise stable algorithm is a binary tree sort; the Wikipedia article has a pathetically easy C++ implementation based on the STL.

You can make an unstable algorithm into a stable one by adding the original record number as the last-place key for each record.

Bob Murphy
  • 5,814
  • 2
  • 32
  • 35
  • 1
    Stable algorithms like Merge Sort have the same O(NlogN) complexity as Quicksort; the constant multiplier on the effort is bigger, though. – Jonathan Leffler Oct 05 '09 at 02:14
  • 1
    Yes, and memory usage on Merge Sort is O(N), while on Quicksort it's O(log N). The reason I mentioned Quicksort is that qsort() is a C standard library routine, so it's reaily available. – Bob Murphy Oct 05 '09 at 04:40
  • 1
    Best overall answer IMHO. the multi-key technique mentioned in others is interesting but overrated; it's simple to apply, but tends to be far slower than obvious alternatives (just use one sort with a multi-key compare; or sort by the first key then identify and sort any sublists with duplicates). The fact that stable sort produces a predictable result can be important in some apps. In particular if you have two input lists A,B which are identical except list B has an extra entry, the outputs for a stable sort will be identical except that B has that same extra entry. And +1 for last pgph. – greggo Feb 10 '13 at 01:43
  • In the last sentence, I don't understand what you mean with "last-place key for each record" - could you please explain? Very good informative comment overall :) – augenss Mar 26 '21 at 20:49
  • 2
    @augenss If two records both have key "foo", then before doing the sort, change them to something like "foo_00001" and "foo_00002". That will preserve the two keys' original order when you do the sort. Then when you're done with the sort, change both keys back to "foo". – Bob Murphy Mar 27 '21 at 00:19
24

It depends on what you do.

Imagine you've got some people records with a first and a last name field. First you sort the list by first name. If you then sort the list with a stable algorithm by last name, you'll have a list sorted by first name AND last name.

Levent Divilioglu
  • 11,198
  • 5
  • 59
  • 106
svens
  • 11,438
  • 6
  • 36
  • 55
17

There's a few reasons why stability can be important. One is that, if two records don't need to be swapped by swapping them you can cause a memory update, a page is marked dirty, and needs to be re-written to disk (or another slow medium).

Clinton Pierce
  • 12,859
  • 15
  • 62
  • 90
4

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input unsorted array. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.

However, any given sorting algo which is not stable can be modified to be stable. There can be sorting algo specific ways to make it stable, but in general, any comparison based sorting algorithm which is not stable by nature can be modified to be stable by changing the key comparison operation so that the comparison of two keys considers position as a factor for objects with equal keys.

References: http://www.math.uic.edu/~leon/cs-mcs401-s08/handouts/stability.pdf http://en.wikipedia.org/wiki/Sorting_algorithm#Stability

roottraveller
  • 7,942
  • 7
  • 60
  • 65
4

I know there are many answers for this, but to me, this answer, by Robert Harvey, summarized it much more clearly:

A stable sort is one which preserves the original order of the input set, where the [unstable] algorithm does not distinguish between two or more items.

Source

John R Perry
  • 3,916
  • 2
  • 38
  • 62
2

Some more examples of the reason for wanting stable sorts. Databases are a common example. Take the case of a transaction data base than includes last|first name, date|time of purchase, item number, price. Say the data base is normally sorted by date|time. Then a query is made to make a sorted copy of the data base by last|first name, since a stable sort preserves the original order, even though the inquiry compare only involves last|first name, the transactions for each last|first name will be in data|time order.

A similar example is classic Excel, which limited sorts to 3 columns at a time. To sort 6 columns, a sort is done with the least significant 3 columns, followed by a sort with the most significant 3 columns.

A classic example of a stable radix sort is a card sorter, used to sort by a field of base 10 numeric columns. The cards are sorted from least significant digit to most significant digit. On each pass, a deck of cards is read and separated into 10 different bins according to the digit in that column. Then the 10 bins of cards are put back into the input hopper in order ("0" cards first, "9" cards last). Then another pass is done by the next column, until all columns are sorted. Actual card sorters have more than 10 bins since there are 12 zones on a card, a column can be blank, and there is a mis-read bin. To sort letters, 2 passes per column are needed, 1st pass for digit, 2nd pass for the 12 11 zone.

Later (1937) there were card collating (merging) machines that could merge two decks of cards by comparing fields. The input was two already sorted decks of cards, a master deck and an update deck. The collator merged the two decks into a a new mater bin and an archive bin, which was optionally used for master duplicates so that the new master bin would only have update cards in case of duplicates. This was probably the basis for the idea behind the original (bottom up) merge sort.

rcgldr
  • 27,407
  • 3
  • 36
  • 61
1

If you assume what you are sorting are just numbers and only their values identify/distinguish them (e.g. elements with same value are identicle), then the stability-issue of sorting is meaningless.

However, objects with same priority in sorting may be distinct, and sometime their relative order is meaningful information. In this case, unstable sort generates problems.

For example, you have a list of data which contains the time cost [T] of all players to clean a maze with Level [L] in a game. Suppose we need to rank the players by how fast they clean the maze. However, an additional rule applies: players who clean the maze with higher-level always have a higher rank, no matter how long the time cost is.

Of course you might try to map the paired value [T,L] to a real number [R] with some algorithm which follows the rules and then rank all players with [R] value.

However, if stable sorting is feasible, then you may simply sort the entire list by [T] (Faster players first) and then by [L]. In this case, the relative order of players (by time cost) will not be changed after you grouped them by level of maze they cleaned.

PS: of course the approach to sort twice is not the best solution to the particular problem but to explain the question of poster it should be enough.

M Ciel
  • 85
  • 7
0

Stable sort will always return same solution (permutation) on same input.

For instance [2,1,2] will be sorted using stable sort as permutation [2,1,3] (first is index 2, then index 1 then index 3 in sorted output) That mean that output is always shuffled same way. Other non stable, but still correct permutation is [2,3,1].

Quick sort is not stable sort and permutation differences among same elements depends on algorithm for picking pivot. Some implementations pick up at random and that can make quick sort yielding different permutations on same input using same algorithm.

Stable sort algorithm is necessary deterministic.

Luka Rahne
  • 10,336
  • 3
  • 34
  • 56
  • 2
    That's not what stability means. See http://en.wikipedia.org/wiki/Sorting_algorithm#Stability – Luís Oliveira Nov 22 '13 at 15:00
  • I should correct last sentence than non stable sort can output different solution even among same implementation, where any stable sort outputs same solution. – Luka Rahne Nov 22 '13 at 16:38
  • 1
    Why -1 ? Can someone point out please what is wrong here? This is not what stably sort is, but what property stable sort has. – Luka Rahne Oct 20 '16 at 15:13
  • Whether the sort is deterministic or not does not determine whether it is stable. I can write a non-stable deterministic sort algorithm by defining a different tie-breaking behavior (by subsorting non-key parts, for example). Stable sort specifically implies that the pre-sorted relative order of the elements are preserved when ties are sorted. example of an output of a stable sort: `sort([(5,3),(1,5),(3,3),(1,3)], x) => [(1,5),(1,3),(3,3),(5,3)]`. I can make a deterministic sort that always (deterministically) outputs: `[(1,3),(1,5),(3,3),(5,3)]` but this is not a stable sort. – cowbert Oct 16 '17 at 15:46
  • @cowbert It is more statement, about nice property that every stable sort has. That is no matter witch stable sort algorithm or implementation is used, every time there will be same result. It is harder to maintain such property among different non stable sort implementations. – Luka Rahne Oct 16 '17 at 18:57