3

I'm building a tool to join multiple streams by the nearest timestamp. The streams may not be in sync so I store the last n (probably >=500) items in a fixed size circular buffer. I'd like to use a sortedIndex (not search) to find where an item would be place in the buffer. I need this index to find the stream item just before and just after the time stamp.

The handling of edge cases of around the corner issues isn't important, I don't care if you return an index outside the array or 0 for the largest value. I was playing around with implementing this last night and couldn't figure out a working implementation.

Function contract is below and is based on _.sortedIndex (implementation)

/**
 * Binary search for finding the closest computed (by iterator) value
 * in some sorted circular array
 *
 * @param {Array} array a circular array-like
 * @param {Object} value to search for an index
 * @param {Function} iterator to compute the compare value of an item
 */
function sortedIndex(array, value, iterator) {
    var low = 0,
        high = array.length;
    while (low != high && iterator(array[low]) > iterator(array[high])) {
        // The binary search I failed to implement
    }
    return low;
}

Some test cases (again feel free to handle the around a corner cases differently): http://jsbin.com/yusilaba/1/edit

function identity(x) {
    return x;
}

function property(prop) {
    return function(x) {
        return x[prop];
    };
}

test('sortedIndex should work on simple case', function(t) {
    var array = [1, 2, 3, 4];

    equal(sortedIndex(array, 2, identity), 1, 'equal case sorts towards left');
    equal(sortedIndex(array, 2.5, identity), 2);
    equal(sortedIndex(array, 10, identity), 0);
    equal(sortedIndex(array, -10, identity), 3);

    array = [{a: 1}, {a: 2}, {a: 3}, {a: 4}];
    equal(sortedIndex(array, {a: 2}, property('a')), 1);
    equal(sortedIndex(array, {a: 2.5}, property('a')), 2);
    equal(sortedIndex(array, {a: 10}, property('a')), 0);
    equal(sortedIndex(array, {a: -10}, property('a')), 3);
});

test('sortedIndex should work on circular collections', function() {
    var array = [2, 3, 4, 1, 1.5];

    equal(sortedIndex(array, 2, identity), 0, 'equal case sorts towards left');
    equal(sortedIndex(array, 2.5, identity), 1);
    equal(sortedIndex(array, 10, identity), 3);
    equal(sortedIndex(array, -10, identity), 2);
    equal(sortedIndex(array, 5, identity), 4);
    equal(sortedIndex(array, 3.5, identity), 3);

    array = [{a: 2}, {a: 3}, {a: 4}, {a: 1}, {a: 1.5}];
    equal(sortedIndex(array, {a: 2}, property('a')), 0, 'equal case sorts towards left');
    equal(sortedIndex(array, {a: 2.5}, property('a')), 1);
    equal(sortedIndex(array, {a: 10}, property('a')), 3);
    equal(sortedIndex(array, {a: -10}, property('a')), 2);
});

Edit --- Here's my completed version https://github.com/trevnorris/cbuffer/pull/14

sortedIndex : function(value, comparitor, context) {
    var low = this.start,
        high = this.size - 1;

     // Tricky part is finding if its before or after the pivot
     // we can get this info by checking if the target is less than
     // the last item. After that it's just a typical binary search.
     if (low && comparitor.call(context, value, this.data[high]) > 0) {
        low = 0, high = this.end;
     }

     while (low < high) {
       var mid = (low + high) >>> 1;
       if (comparitor.call(context, value, this.data[mid]) > 0) low = mid + 1;
       else high = mid;
     }
     // http://stackoverflow.com/a/18618273/1517919
     return (((low - this.start) % this.size) + this.size) % this.size;
}
megawac
  • 10,953
  • 5
  • 40
  • 61
  • 1
    Quick comment, I think you have a mistake in the test cases; property function, `x` is undefined because the inner function's argument is incorrectly named `prop`. – EWit Aug 03 '14 at 15:53
  • one more in function `sortedIndex` you are taking parameter as `iterable` and calling as `iterator`. – Mritunjay Aug 03 '14 at 15:59
  • Thanks wrote that on a train ride without testing :) – megawac Aug 03 '14 at 16:20

2 Answers2

3

Here's what I came up with (it passes your test cases). Basically, it does normal binary search when the array is sorted. When it is circular (ex: [2,3,4,1]), it finds the pivot (which is the index of where circle start, so in that example index 3, which corresponds to the 4 in the array, would be the pivot), then binary searches the part of the array the pivot is in.

 function findPivot(arr, low, high, iterable){
    // base cases
    if (high < low)  return -1;
    if (high == low) return low;

    var mid = Math.floor((low + high)/2);
    if (mid < high && iterable(arr[mid]) > iterable(arr[mid + 1]))
      return mid;
    if (mid > low && iterable(arr[mid]) < iterable(arr[mid - 1]))
      return (mid-1);
    if (iterable(arr[low]) >= iterable(arr[mid]))
      return findPivot(arr, low, mid-1, iterable);
    else
      return findPivot(arr, mid + 1, high, iterable);
 }

 function binarySearch(arr, low, high, val, iterable)
 {
    if (high < low)
        return low;
    var mid = Math.floor((low + high)/2);
    if (iterable(val) == iterable(arr[mid]))
      return mid;
    if (iterable(val) > iterable(arr[mid]))
      return binarySearch(arr, (mid + 1), high, val, iterable);
    else
      return binarySearch(arr, low, (mid -1), val, iterable);
 }


 function sortedIndex(array, value, iterable) {
     var arr_size = array.length;
    var pivot = findPivot(array, 0, arr_size-1, iterable);

    if (pivot == -1) {
      if(iterable(array[arr_size-1]) < iterable(value)){
        return 0;
      } else if(iterable(array[0]) > iterable(value)){
        return arr_size-1;
      }
      return binarySearch(array, 0, arr_size-1, value, iterable);
    }

   if(iterable(array[pivot]) < iterable(value)){
     return pivot+1;
   } else if(iterable(array[pivot+1]) > iterable(value)){
      return pivot;
   }

    if (iterable(array[pivot]) == iterable(value))
      return pivot;
    if (iterable(array[0]) <= iterable(value))
      return binarySearch(array, 0, pivot-1, value, iterable);
    else
      return binarySearch(array, pivot+1, arr_size-1, value, iterable);
    }

Here's the test cases: http://jsbin.com/ratufewa/1/edit

Hopefully this is in the right direction.

Iterative Solution http://jsbin.com/ratufewa/3/edit:

 function findPivot(arr, low, high, iterable)
 {
   while(true){
      // base cases
      if (high < low)  return -1;
      if (high == low) return low;

      var mid = (low + high) >>> 1;
      if (mid < high && iterable(arr[mid]) > iterable(arr[mid + 1]))
        return mid;
      if (mid > low && iterable(arr[mid]) < iterable(arr[mid - 1]))
        return (mid-1);
      if (iterable(arr[low]) >= iterable(arr[mid]))
        high = mid-1;
      else
        low = mid + 1;
   }
 }

 function binarySearch(arr, low, high, val, iterable)
 {
   while(true){
      if (high < low)
     return low;
      var mid = (low + high) >>> 1;
      if (iterable(val) == iterable(arr[mid]))
        return mid;
      if (iterable(val) > iterable(arr[mid]))
        low = mid + 1;
      else
        high = mid -1;
   }
 }


 function sortedIndex(array, value, iterable) {
    var arr_size = array.length;
    var pivot = findPivot(array, 0, arr_size-1, iterable);

    if (pivot == -1) {
      if(iterable(array[arr_size-1]) < iterable(value)){
        return 0;
      } else if(iterable(array[0]) > iterable(value)){
        return arr_size-1;
      }
      return binarySearch(array, 0, arr_size-1, value, iterable);
    }

   if(iterable(array[pivot]) < iterable(value)){
     return pivot+1;
   } else if(iterable(array[pivot+1]) > iterable(value)){
      return pivot;
   }

    if (iterable(array[pivot]) == iterable(value))
      return pivot;
    if (iterable(array[0]) <= iterable(value))
      return binarySearch(array, 0, pivot-1, value, iterable);
    else
      return binarySearch(array, pivot+1, arr_size-1, value, iterable);
 }
piergiaj
  • 629
  • 3
  • 9
  • What is "the pivot" of a circular array? – Bergi Aug 03 '14 at 17:45
  • The pivot is the index where it starts (kinda like in quicksort), for example in [3,4,5,1,2,3] pivot would be 2 (corresponding to the 5) – piergiaj Aug 03 '14 at 17:55
  • 1
    I see what you mean, however 5 is where it end (1 is where it starts) – Bergi Aug 03 '14 at 18:27
  • Looking good, if you can figure out an iterative `binarySearch` (like underscores) I'll award you the bounty. Also `Math.floor(x / 2)` can be written `x >>> 1` :) – megawac Aug 04 '14 at 23:36
  • updated to add an iterative binarySearch and findPivot – piergiaj Aug 04 '14 at 23:52
  • Thanks, this has given me a lot of direction -- see my edit. Still testing and playing around with the algorithm – megawac Aug 06 '14 at 19:22
1

Usually, in a circular buffer you do store the start and end of the actually occupied places. With that information, it should be rather trivial as we can distinguish three cases:

function sortedIndex(buffer, item, getValue) {
    if (buffer.start < buffer.end)
        // do standard binary search between start and end indices
    else if (getValue(buffer[0]) <= getValue(item))
        // do standard binary search between 0 and end index
    else // getValue(buffer[0] > getValue(item)
        // do standard binary search between start and buffer.length
}
Bergi
  • 630,263
  • 148
  • 957
  • 1,375