6

I have a JS application that needs to do a complicated sort of a large array and then display it. Using the built in array.sort(cb) method can take up to 1 second with my data. This is long enough for my UI to get janky.

Because the UI is only tall enough to show a subset of the sorted array on the screen with the rest below the scroll or paginated, I had an idea. What if I made an algorithm that went through the large array and quickly did a sort in such a way that the top N items were perfectly sorted, but the remaining items in the array were imperfectly sorted. Each time I ran my algorithm it would sort a little more of the array from the top down.

So I could break up my processing into chunks and have a smooth UI. For the first few seconds the array would not be perfectly sorted, but the imperfections would be below the scroll so they wouldn't be noticed.

My naive solution would be to write my own "Selection Sort" with the ability to break after N matches and resume later, but "Selection Sort" is a pretty terrible algorithm. The faster algorithms (from my understanding) have to go to completion to guarantee that the top N items are stable.

Does anyone know of an existing solution for this? Am I crazy? Any suggestions?

UPDATE

Taking the idea suggested by @moreON, I wrote a custom QuickSort that bails out once it has the required precision. The native sort took 1sec for this data. The regular QuickSort took around 250ms, which is already surprisingly better. The QuickSort that bails out after the first 100 items are sorted took a brisk 10ms, which is much much better. I can then take an additional 250ms to finish the sort but this doesn't matter so much because the user is already looking at the data. This reduces the user's experienced delay from 1sec to 10ms, which is pretty great.

//Init 1 million random integers into array
var arr1 = [];
var arr2 = [];
for(var i=0;i<1800000;i++) {
   var num = Math.floor(Math.random() * 1000000);
   arr1.push(num);
   arr2.push(num);
}
console.log(arr1);

//native sort
console.time("native sort");
arr1.sort(function(a,b) { return a-b; });
console.timeEnd("native sort"); //1sec
console.log(arr1);

//quicksort sort    Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
function swap(arr, a, b) {
   var temp = arr[a];
   arr[a] = arr[b];
   arr[b] = temp;
}
function cmp(a,b) {
   return (a<b);
}
function partition(items, left, right) {
   var pivot = items[Math.floor((right + left) / 2)];
   var i = left;
   var j = right;

   while (i <= j) {
      while (cmp(items[i],pivot)) i++;
      while (cmp(pivot,items[j])) j--;
      if (i <= j) {
         swap(items, i, j);
         i++;
         j--;
      }
   }
   return i;
}
function quickSort(items, left, right, max) {    
   if(max && left-1 > max) return items; //bail out early if we have enough
   if (items.length > 1) {
      var index = partition(items, left, right);
      if (left < index - 1) quickSort(items, left, index - 1, max);
      if (index < right) quickSort(items, index, right, max);
   }
   return items;
}

//sort first 100
console.time("partial Quicksort");
arr2 = quickSort(arr2,0,arr2.length-1,100);
console.timeEnd("partial Quicksort"); //10ms
console.log(arr2);

//sort remainder
console.time("finishing Quicksort");
arr2 = quickSort(arr2,100,arr2.length-1); //250ms
console.timeEnd("finishing Quicksort");    
console.log(arr2);
Jake
  • 1,135
  • 1
  • 12
  • 26
  • 1
    please add you attempt. – Nina Scholz Jun 09 '16 at 06:40
  • If your dataset is not perfectly sorted, isn't there a chance that the very first item should not be first in the array you're showing to the user? – Bryce Jun 09 '16 at 06:42
  • sorry, but I think you might be a little crazy, and I would like to stand corrected, but I don't think you can guarantee a partial sort without visiting all the items. At most you could guarantee that **of** the items you have sorted, they are perfectly sorted. You cannot speak to the remaining items until they are visited at least once. "Selection sort" will work, but its an exponential algorithm, crazy slow. I am pretty sure `array.sort` is a quick sort, you probably can't get much better for most data distributions. – Jim Jun 09 '16 at 06:43
  • Using your own sort algorithm is *probably* much faster than Array.sort: http://stackoverflow.com/questions/8082425/fastest-way-to-sort-32bit-signed-integer-arrays-in-javascript - and you can easily resume and pause sorting – le_m Jun 09 '16 at 06:48
  • 4
    Implement quicksort, with auxillary data to maintain state to pause and resume. Recurse* into the lower side of the array first. Stop when you've sorted the top k elements you wanted sorted. ----(*you will likely end up not implementing with recursion due to the need to manage to pause/resume, but the idea works) – moreON Jun 09 '16 at 07:12
  • How big is `N`, how big is `array` and how many seconds do we get to "retrieve" the next `N`? – גלעד ברקן Jun 09 '16 at 10:59
  • 3
    Sort it in a WebWorker – Ruan Mendes Jun 09 '16 at 11:56
  • How large is the array? Are you sure it's the sorting that causes the performance issues? Can you check you are not rendering all the items in the array? – Martin Gottweis Jun 09 '16 at 12:10
  • I updated my question with a possible solution based off the suggestion by @moreON – Jake Jun 09 '16 at 19:32
  • Did you try out the radix-sort implementation linked in my comment above? Nearly twice as fast your given scenario. – le_m Jun 09 '16 at 19:46

4 Answers4

2

If you were to heapify array, which I believe can be done in O(n) time (https://en.wikipedia.org/wiki/Binary_heap#Building_a_heap), you could extract each N items, in order, in O(N log n) time (n getting smaller as you extract).

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
1

Here is a cleaned up version of my solution that sorts a large array in batches so the JS thread doesn't stutter. In my example here, it takes a 1 second array.sort(cb) and turns it into five separate 100ms operations. You'll want to pick the pageSize intelligently based on your data. More pages will make the final sort take longer, fewer pages will make the batches take longer.

var BatchedQuickSort = {
  swap: function(arr, a, b) {
    var temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
  },
  partition: function(items, left, right, cmp) {
    var pivot = items[Math.floor((right + left) / 2)];
    var i = left;
    var j = right;

    while (i <= j) {
      while (cmp(items[i],pivot)<0) i++;
      while (cmp(pivot,items[j])<0) j--;
      if (i <= j) {
        this.swap(items, i, j);
        i++;
        j--;
      }
    }
    return i;
  },
  sort: function(items, cmp, max, left, right) { //Ref: https://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/
    if (items.length > 1) {
      left = typeof left != "number" ? 0 : left;
      right = typeof right != "number" ? items.length - 1 : right;
      var index = this.partition(items, left, right, cmp);
      if (left < index - 1) this.sort(items, cmp, max, left, index - 1);
      if (index < right && (!max || index<=max)) this.sort(items, cmp, max, index, right);
    }
    return items;
  }
}

//Example Usage
var arr = [];
for(var i=0;i<2000000;i++) arr.push(Math.floor(Math.random() * 1000000));
function myCompare(a,b) { return a-b; }

var pageSize = Math.floor(arr.length/5);
var page = 1;
var timer = window.setInterval(function() {
  arr = BatchedQuickSort.sort(arr, myCompare, pageSize*page,pageSize*(page-1));
  if(page*pageSize>=arr.length) {
    clearInterval(timer);
    console.log("Done",arr);
  }
  page++;
},1);
Jake
  • 1,135
  • 1
  • 12
  • 26
0

I think your question boils down to:

How to find top N elements in large array

which is kindof answered here: Find top N elements in an Array

This can be solved by traversing the list once and just pick the top N elements. Θ(n).

Check it out here: https://jsfiddle.net/jeeh4a8p/1/

function get_top_10(list) {
    var top10 = new Array(10).fill(0)
    for(i in list) {
        var smallest_in_top10 = Math.min.apply( Math, top10 )
        if(list[i] > smallest_in_top10) {
            top10.splice(top10.indexOf(smallest_in_top10),1)
            top10.push(list[i])
        }
    }
    return top10
}

console.log(get_top_10([1,2,3,4,5,6,7,8,9,10,11,12]))

var random_list = [];
for (var i = 0; i < 100; i++) {
    random_list.push(Math.round(Math.random() * 999999))
}

console.log(get_top_10(random_list))

function sortNumber(a,b) {
    return a - b;
}
Community
  • 1
  • 1
Martin Gottweis
  • 2,721
  • 13
  • 27
  • Your current code seems `O(n * N)` since for each element in `array`, you are traversing `N`. For 100,000 / 100 that would be 100,000 * 100 iterations to get the first top 100. (My suggestion would be 100,000 + 16 * 100 iterations.) – גלעד ברקן Jun 09 '16 at 13:32
  • You are right, I just wanted to show some code that worked. If I keep the ordered top10 list in a tree it's O(n * log(N)). Also, the heap thing looks more like O(n * log(n)). – Martin Gottweis Jun 09 '16 at 13:48
  • Heapifying is O(n) time, then extracting one element is O(log n) - so `top N` would be O(n) to build the heap + O(log n * N). – גלעד ברקן Jun 09 '16 at 14:04
  • But you can do it in O(n) using the Quick select algorithm. – Jim Mischel Jun 09 '16 at 15:15
-1

First of all, have some perspective on the performance improvement expectations. Efficient sorting algorithms are O(N * log2(N)). For N=1,000,000 items, N * log2(N) ~ N * 20. I doubt you have that many items that you're trying to render in a webpage.

If you only need to render the first 25 rows, Selection Sort will take N * 25 to order them, so it'll actually perform worse, assuming comparable constant overhead.

If you do want to experiment with this further, one algorithm I can think of is this: maintain a binary tree of PAGE_SIZE smallest items. Keep updating it with a single pass over the data, removing the largest items when smaller ones are found. Ignoring rebalancing, it'll take you N * log2(PAGE_SIZE) to populate the tree and render your first page of results.

ykaganovich
  • 14,736
  • 8
  • 59
  • 96
  • You were right. I tried Selection Sort first and it was much slower than native sort even with a very early bail out. I then tried QuickSort and it seems to work which you can see in my updated question. – Jake Jun 09 '16 at 19:35