21

In my application, I need to sort large arrays (between 100,000 and 1,000,000) of random numbers.

I've been using the built in array.sort(comparisonFunction) where comparisonFunction looks like this:

function comparisonFunction(a,b) {
    return a-b;
}

This works just fine, but I've read (e.g., Native JavaScript sort performing slower than implemented mergesort and quicksort) that there are faster options, especially if your requirements meet certain conditions:

  1. I only need to sort numbers (e.g., not objects, or alphanumeric data)
  2. The data is random (no chance that it's already ordered)
  3. The sort doesn't need to be stable

So - what is the fastest (or close enough) sort algorithm available under those circumstances?

And, is there a canonical (or at least a relatively ideal) JavaScript implementation?

[UPDATE]

So, a quick clarification - in the linked question, the OP required a stable sort. Since I don't - I'm wondering if that would change the answer (i.e., perhaps there's a faster sort option available if you know in advance that your data will not be pre-sorted, and you don't need a stable sort).

Perhaps the answer is "no", but that's why I'm asking.

[UPDATE #2]

Here's an implementation of quicksort that, unless I've made a mistake - beats the native sort function handily:

function comparisonFunction(a, b) {
  return a - b;
}

function quickSort(arr, leftPos, rightPos, arrLength) {
  let initialLeftPos = leftPos;
  let initialRightPos = rightPos;
  let direction = true;
  let pivot = rightPos;
  while ((leftPos - rightPos) < 0) {
    if (direction) {
      if (arr[pivot] < arr[leftPos]) {
        quickSort.swap(arr, pivot, leftPos);
        pivot = leftPos;
        rightPos--;
        direction = !direction;
      } else
        leftPos++;
    } else {
      if (arr[pivot] <= arr[rightPos]) {
        rightPos--;
      } else {
        quickSort.swap(arr, pivot, rightPos);
        leftPos++;
        pivot = rightPos;
        direction = !direction;
      }
    }
  }
  if (pivot - 1 > initialLeftPos) {
    quickSort(arr, initialLeftPos, pivot - 1, arrLength);
  }
  if (pivot + 1 < initialRightPos) {
    quickSort(arr, pivot + 1, initialRightPos, arrLength);
  }
}
quickSort.swap = (arr, el1, el2) => {
  let swapedElem = arr[el1];
  arr[el1] = arr[el2];
  arr[el2] = swapedElem;
}

var
  i,
  arr1, arr2,
  length;

length = 1000000;


arr1 = [];
arr2 = [];
for (i = 0; i < length; i++) {
  arr1.push(Math.random());
  arr2.push(Math.random());
}

console.time("nativeSort");
arr1.sort(comparisonFunction);
console.timeEnd("nativeSort");


console.time("quickSort");
quickSort(arr2, 0, length - 1, length);
console.timeEnd("quickSort");
E_net4
  • 27,810
  • 13
  • 101
  • 139
mattstuehler
  • 9,040
  • 18
  • 78
  • 108
  • 7
    *"...I've read that there are much faster options..."* Where? – T.J. Crowder Nov 21 '16 at 13:50
  • 2
    Note that the native JavaScript `.sort()` is not required to be stable. I would be surprised if a JavaScript sort were faster than the native sort for an array of any significant size, though with certain input constraints a radix sort might be worth a try. – Pointy Nov 21 '16 at 13:51
  • 1
    @Liam: Except that the above clearly says it *doesn't* need a stable sort. (Mind you, if you're sorting numbers, stable vs. unstable is a distinction without a difference...) – T.J. Crowder Nov 21 '16 at 13:51
  • I mean really, this is a pretty silly question. I just figured a dupe was a better close reason than too broad, etc. – Liam Nov 21 '16 at 13:53
  • 1
    @T.J.Crowder Here's a simple Fiddle I put together that sorts 1,000,000 randomly generated numbers using both array.sort() and a quicksort implementation - https://jsfiddle.net/5r4mjtx4/. Unless I'm mistaken - the quicksort is much faster - about twice as fast with 1m numbers, and 4x faster with 10m numbers. – mattstuehler Nov 21 '16 at 14:19
  • 1
    @T.J.Crowder In response to your question: See this SO post: http://stackoverflow.com/questions/38732480/native-javascript-sort-performing-slower-than-implemented-mergesort-and-quicksor – mattstuehler Nov 21 '16 at 14:22
  • @mattstuehler: Color me surprised! :-) georg's answer demonstrates ones as well. – T.J. Crowder Nov 21 '16 at 14:38
  • I've done a little bit of research on the question - found a quicksort implementation that seemed to beat the native sort function. Also found that node-timsort function, although the source code (~1,000 lines) seems like it might be overkill. So I figured I'd ask the experts on this site if there was a consensus ***best*** algorithm, given these common requirements. Also - while the quicksort implementation I included seems to work well - I was hoping that the experts could weigh in on the ***best*** implementation - seems like it'd be nice to have that here on SO. – mattstuehler Nov 21 '16 at 14:45
  • @Liam - I'm not sure this is a duplicate. The question you linked to is different - that person required a stable sort. I'm asking if there's a faster algorithm available in the specific case when *I don't need a stable sort*, and I don't have to worry about the edge-case where my data might be sorted already. – mattstuehler Nov 21 '16 at 14:53
  • 1
    There's an explanation for why the native sort is slow and some faster alternative sort functions here [native javascrtipt sort alternatives](http://stackoverflow.com/questions/38732480/native-javascript-sort-performing-slower-than-implemented-mergesort-and-quicksor/38733191#38733191) – rcgldr Nov 21 '16 at 17:52
  • @rcgldr - I'm sure you're right. I'm not knocking the native sort function. In my limited research - the common sort algorithms have strengths and weaknesses; some work better given certain assumptions, and others work better in other circumstances. So, the native sort has probably been designed to work best under the widest variety of conditions. That's why I thought my question was valid - I'm looking for the optimal algorithm given my specific circumstances - random numeric data, where stability is not required. – mattstuehler Nov 21 '16 at 17:57
  • 1
    @mattstuehler - In this case, the native sort chose to avoid worst case quicksort performance with a relatively slow function to choose pivot via a function named GetThirdIndex() (go to the link in my prior comment for a detailed explanation), which slows down the core algorithm in all cases. It would have been much faster to use [introsort](http://en.wikipedia.org/wiki/Introsort). – rcgldr Nov 21 '16 at 18:02
  • @rcgldr - Introsort is a new one to me! I'm not qualified to evaluate this implementation (https://github.com/arnorhs/js-introsort), but I tried it, and it's faster than the native sort ant timsort implementation (linked below), and comparable to quicksort. But, it seems better than quick sort in that, when the data has many duplicate values, quick sort will cause problems with too many levels of recursion. – mattstuehler Nov 21 '16 at 18:30
  • 2
    @mattstuehler - I can post a C++ example of quicksort that ends up running faster as the number of duplicates increase. After doing a Hoare like partition, there are two 2 line loops used to exclude all middle values == pivot value. Almost no impact in the average case, and O(n) running time if all values are the same. You could easily modify an existing javascript quicksort or introsort to follow the same logic. – rcgldr Nov 21 '16 at 18:36
  • @rcgldr - thank you! I'd love to see that C++ example. If you'd like to post it below as an answer, I'd gratefully mark it as the accepted answer. – mattstuehler Nov 21 '16 at 18:40
  • 2
    Your quicksort implementation is flawed. You will get call stack overflow (maximum call stack exceeded error) if you have too many repeating items in the input array such as filling the 1M size input array with random integers among 1-100. I would recommend you to switch to while loops instead of recursive calls. Or.. i would say implement tail call optimized recursive functions alas as of today the TCO is a joke in JS engines. It doesn't work at all. – Redu Feb 17 '17 at 10:37
  • @Redu - I'd noticed that problem too, but I'm not sure the best way to fix it. (For my present project, I'm adding a tiny bit of random "noise" to my elements to make sure that all values are more unique, but that's a silly hack.) I'm hoping an algorithm expert will come along and improve the function! – mattstuehler Feb 17 '17 at 14:05
  • If you want the "best sort" you need to mind your sorting requirements and your implementation. Search google for the following sorts and learn them: insertion sort, selection sort, quick sort, merge sort, heap sort, intro sort, shell sort, counting sort, radix sort, bucket sort, bubble sort, and once you've seen all of those there's a small chance you'll understand the utterly magnificent `bogosort`. . [Bonus Video](https://www.youtube.com/watch?v=kPRA0W1kECg) – Jacksonkr Sep 21 '17 at 15:26
  • 1
    @mattstuehler As others have mentioned, your quicksort has a much worse "worst case" sort time than the native implementation. Try feeding your function a sorted array. For 5,000 integers I got 11ms on native, and 77ms with yours. The best quicksort implementations actually randomize arrays before using them to avoid this. – Robert Caldwell Apr 09 '18 at 22:50
  • An answer on [How to sort an array of integers correctly](//stackoverflow.com/a/53355543) suggests a TypedArray of `Uint32Array` as being 5x faster than standard `sort()` with a comparison function. Other types can be used depending on your data. – Peter Cordes Nov 20 '19 at 05:14

3 Answers3

15

There are sort implementations that consistently beat the stock .sort (V8 at least), node-timsort being one of them. Example:

var SIZE = 1 << 20;

var a = [], b = [];

for(var i = 0; i < SIZE; i++) {
    var r = (Math.random() * 10000) >>> 0;
    a.push(r);
    b.push(r);
}

console.log(navigator.userAgent);

console.time("timsort");
timsort.sort(a, (x, y) => x - y);
console.timeEnd("timsort");

console.time("Array#sort");
b.sort((x, y) => x - y);
console.timeEnd("Array#sort");
<script src="https://rawgithub.com/mziccard/node-timsort/master/build/timsort.js"></script>

Here are some timings from different browsers I have around (Chakra anyone?):

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.113 Safari/537.36
timsort: 256.120ms
Array#sort: 341.595ms

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14
timsort: 189.795ms
Array#sort: 245.725ms

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:51.0) Gecko/20100101 Firefox/51.0
timsort: 402.230ms
Array#sort: 187.900ms

So, the FF engine is very different from Chrome/Safari.

georg
  • 211,518
  • 52
  • 313
  • 390
  • I'm using iceweasel and I get 227.418ms for timsort and 172.212ms for Array#Sort – acontell Nov 21 '16 at 14:43
  • 3
    @acontell: yep, strongly depends on the engine, see the update. – georg Nov 21 '16 at 17:26
  • 1
    I put together this Fiddle (https://jsfiddle.net/uqq54ho8/2/), which compares the native sort with that timsort and the quick sort I posted above. For 5,000,000 random numbers, the results were: *timsort: 1050ms native: 2536ms quick: 754ms*. [Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36] – mattstuehler Nov 21 '16 at 18:09
  • Chrome's internal sort is actually very buggy and slow, this was reported years ago without a fix. This from the company that prides itself on making candidates regurgitate sort algorithms during interviews. – Dirigible Sep 17 '18 at 04:05
7

The fastest way to sort an array of numbers is to convert it to a TypedArray then call its sort function. The sort functions on typed arrays are numeric by default and much faster than Array.sort() with a comparison function.

const a = Array.from({length: 1<<20}, Math.random)
const b = Array.from(a)

function comparisonFunction(a,b){ return a-b }

console.time("nativeSort")
a.sort(comparisonFunction)
console.timeEnd("nativeSort")

console.time("typedSort")
let typedArray = Float32Array.from(b)
typedArray.sort()
console.timeEnd("typedSort")
rotu
  • 103
  • 1
  • 4
6

No need to mark this as an answer, since it's not javascript, and doesn't have introsort's depth check to switch to heapsort.

Example C++ quicksort. It uses median of 3 to choose pivot value, Hoare partition scheme, then excludes middle values == pivot (which may or may not improve time, as values == pivot can end up anywhere during partition step), and only uses recursion on the smaller partition, looping back on the larger partition to limit stack complexity to O(log2(n)) worst case. The worst case time complexity is still O(n^2), but this would require median of 3 to repeatedly choose small or large values, an unusual pattern. Sorted, or reverse sorted arrays are not an issue. If all values are the same, then time complexity is O(n). Adding a depth check to switch to heapsort (making this an introsort) would limit time complexity to O(n log(n)), but with a higher constant factor depending on how much heapsort path is used.

void QuickSort(uint32_t a[], size_t lo, size_t hi) {
    while(lo < hi){
        size_t i = lo, j = (lo+hi)/2, k = hi;
        uint32_t p;
        if (a[k] < a[i])            // median of 3
            std::swap(a[k], a[i]);
        if (a[j] < a[i])
            std::swap(a[j], a[i]);
        if (a[k] < a[j])
            std::swap(a[k], a[j]);
        p = a[j];
        i--;                        // Hoare partition
        k++;
        while (1) {
            while (a[++i] < p);
            while (a[--k] > p);
            if (i >= k)
                break;
            std::swap(a[i], a[k]);
        }
        i = k++;
        while(i > lo && a[i] == p)  // exclude middle values == pivot
            i--;
        while(k < hi && a[k] == p)
            k++;
        // recurse on smaller part, loop on larger part
        if((i - lo) <= (hi - k)){
            QuickSort(a, lo, i);
            lo = k;
        } else {
            QuickSort(a, k, hi);
            hi = i;
        }
    }
}

If space isn't an issue, then the merge sorts here may be better:

Native JavaScript sort performing slower than implemented mergesort and quicksort

If just sorting numbers, and again assuming space isn't an issue, radix sort would be fastest.

rcgldr
  • 27,407
  • 3
  • 36
  • 61