How does OpenJDK internally sort the datatypes and why ? It would be great if the specific algorithms can be mentioned
-
16You should always check the Java API first before asking such a question. API review is a necessary skill that needs to be cultivated and practiced. Then if you don't understand the entry, in your post, tell us what confuses you. – Hovercraft Full Of Eels Sep 01 '12 at 14:43
-
possible duplicate of [Java Sorting Algorithm](http://stackoverflow.com/questions/4249794/java-sorting-algorithm) – Paul Tomblin Sep 01 '12 at 14:45
-
2It's also worth noting that while the Oracle/Sun implementation referenced uses those algorithms, they're not required by the specification, and may differ in other implementations - http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html – Marc Bollinger Sep 01 '12 at 14:45
-
1Exactly, as specified in the API. – Hovercraft Full Of Eels Sep 01 '12 at 14:52
-
No Paul, this question is not the same as that one ... – Stephen C Sep 01 '12 at 15:24
-
The *best* answer would be [in this question](https://stackoverflow.com/questions/4018332/is-java-7-using-tim-sort-for-the-method-arrays-sort). I've cast a reopen to get this closed as a duplicate of that one – Antti Haapala -- Слава Україні Nov 02 '19 at 20:49
-
This question is absurdly broad in scope, and voting to reopen it seems ridiculous to me. See [What topics can I ask about here?](https://stackoverflow.com/help/on-topic) – skomisa Apr 01 '20 at 16:08
3 Answers
Beginning with version 7, Oracle's Java implementation is using Timsort for object arrays bigger than 10 elements, and Insertion sort for arrays with less than that number of elements. The same considerations apply for both Arrays.sort()
and Collections.sort()
. In older versions of Java, Merge sort was used instead of Timsort.
Other implementations of the language (other than Oracle's) might use a different sorting algorithm, as this is not mandated by the specification. Quoting Collections
' documentation:
The documentation for the polymorphic algorithms contained in this class generally includes a brief description of the implementation. Such descriptions should be regarded as implementation notes, rather than parts of the specification. Implementors should feel free to substitute other algorithms, so long as the specification itself is adhered to. (For example, the algorithm used by sort does not have to be a mergesort, but it does have to be stable.)
For sorting numeric primitives, JDK 7 uses "dual pivot quicksort".

- 62,887
- 36
- 269
- 388

- 232,561
- 37
- 312
- 386
Collections.sort()
uses a modified mergesort. Arrays.sort()
uses a variation of quicksort for the primitives and mergesort for Object
sorting.
For Java 7, read the comment from @SebastianPaaskeTørholm below

- 36,440
- 11
- 68
- 94
-
2Note, in Java 7, [Timsort](http://en.wikipedia.org/wiki/Timsort) is used instead. ([source](http://hg.openjdk.java.net/jdk7/tl/jdk/rev/bfd7abda8f79)) – Sebastian Paaske Tørholm Sep 01 '12 at 14:45
-
-
@Jeffrey: It's a different algorithm from the one used pre-J7, however. – Sebastian Paaske Tørholm Sep 01 '12 at 15:41
-
@Baz Collections.sort() internally calls Arrays.sort(); so just wondering that they should be using same algorthm? – rai.skumar Nov 27 '12 at 10:26
OK attempting to come up with the canonical list. Basically the contract is that Collections.sort
has to be a "stable" sort (i.e. equal elements won't be re-arranged), where Arrays.sort
(for native type arrays) can re-arrange them, since they're identical, so it has more freedom to use different (i.e. faster) algorithms. Rationale for wanting stable contract is given here. Also it is presumed that comparing objects (vs. natives) is "much more expensive" (it typically is) so a side-goal for Collections.sort
is to minimize number of comparisons, and be stable.
For all versions, Collections.sort
initially makes a copy of the list (to an array), modifies that, then copies the sorted elements back into the initial list, to avoid O(n^2) complexity for sorting linked lists. Guess they thought the extra copy wouldn't be too expensive since it's just copying references, not actual values (?).
In JDK 6:
Arrays of native types: tuned quicksort
* The sorting algorithm is a tuned quicksort, adapted from Jon
* L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function",
* Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November
* 1993). This algorithm offers n*log(n) performance on many data sets
* that cause other quicksorts to degrade to quadratic performance.
It was deemed that quadratic "worst case" O(n^2) behavior was not a problem for this modified quicksort.
Quicksort itself was chosen for performance.
List of Objects: modified mergesort
* The sorting algorithm is a modified mergesort (in which the merge is
* omitted if the highest element in the low sublist is less than the
* lowest element in the high sublist). This algorithm offers guaranteed
* n log(n) performance.
"It is a reasonably fast stable sort that guarantees O(n log n) performance and requires O(n) extra space."
It also defaults to an insertion sort for small arrays.
JDK 7:
Arrays of native types: dual-pivot quicksort
* ...The sorting algorithm is a Dual-Pivot Quicksort
* by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm
* offers O(n log(n)) performance on many data sets that cause other
* quicksorts to degrade to quadratic performance, and is typically
* faster than traditional (one-pivot) Quicksort implementations.
"The new algorithm reduces the average number of swaps by 20%."
There are also certain thresholds where if size is "below x" it will just do a counting sort, insertion sort, or quicksort instead of the "dual pivot quicksort." (depending on what type of primitive is being sorted) https://stackoverflow.com/a/41129231/32453
List of Objects: Timsort a kind of hybrid merge/insertion sort.
"It is a stable, adaptive, iterative mergesort that requires far fewer than n log(n) comparisons when running on partially sorted arrays, while offering performance comparable to a traditional mergesort when run on random arrays. Like all proper mergesorts timsort is stable and runs in O(n log n) time (worst case). In the worst case, timsort requires temporary storage space for n/2 object references; in the best case, it requires only a small constant amount of space. Contrast this with the current implementation, which always requires extra space for n object references, and beats n log n only on nearly sorted lists."
"On highly ordered data, this code can run up to 25 times as fast as the current implementation."
"1) Guaranteed O(n*log(n)) or less comparisons with a low constant. 2) Exactly n-1 comparisons for presorted (or revsorted) data. 3) Stable sort."
You can revert back to using LegacyMergeSort with an env. setting.
JDK 8:
Arrays of native types: dual-pivot quicksort, with some small modifications over jdk 7 (what?).
List of Objects: Timsort (same)
Parallel sort: ???
JDK 9:
Arrays of native types: dual-pivot quicksort, with at least some small modifications so if data is "mostly ordered" it will just do a modified merge sort on it.
List of Objects: Timsort (same)
Parallel sort: ???
JDK 10:
Arrays of native types: dual-pivot quicksort, some modifications have been proposed.
List of Objects: Timsort (same)
Parallel sort: ???
This is a community wiki feel free to update it and/or elaborate.

- 62,887
- 36
- 269
- 388