5

Given N sorted arrays of integers (no duplicates), I'd like to calculate the first limit integers in their intersection.

For example, given the following arrays:

[2, 5, 7, 8, 10, 12, 13, 15, 20, 24]
[3, 4, 5, 6, 9, 10, 11, 17, 20]
[1, 2, 3, 5, 6, 10, 12, 20, 23, 29]

the intersection is [5, 10, 20], so if limit = 2, the result should be [5, 10].

The given arrays should not be mutated.

My attempt is below. Playground here.

Is there a more efficient (faster) way to achieve this?

Would appreciate a jsperf comparison.


function intersection(sortedArrays, limit) {
  var arraysCount = sortedArrays.length;
  var indices = sortedArrays.map(function(array) { return 0; });
  var values, maxValue, valuesAreSame, reachedEnd, i, result = [];

  while (true) {
    reachedEnd = indices.some(function(index, i) {
      return index === sortedArrays[i].length;
    });

    if (reachedEnd) {
      return result;
    }

    values = sortedArrays.map(function(array, i) { return array[indices[i]]; });
    valuesAreSame = values.every(function(value, i) { return value === values[0]; });

    if (valuesAreSame) {
      result[result.length] = values[0];

      if (result.length === limit) {
        return result;
      }

      for (i = 0; i < arraysCount; i++) {
        indices[i]++;
      }
    } else {
      maxValue = Math.max.apply(null, values);

      for (i = 0; i < arraysCount; i++) {
        if (values[i] < maxValue) {
          indices[i]++;
        }
      }  
    }  
  }
}

console.log(intersection([[0, 3, 8, 11], [1, 3, 11, 15]], 1));
// => [3]
Misha Moroshko
  • 166,356
  • 226
  • 505
  • 746
  • 1
    Yes, it's called an N-way merge (or k-way merge) and simply stop merging after size reaches the LIMIT. [Here is a link](http://stackoverflow.com/questions/5055909/algorithm-for-n-way-merge) to an answer with a rather good overview as to how to implement it. Although, it uses files instead of arrays, the principle is the same. – Nuclearman May 10 '15 at 08:20

3 Answers3

3

The first challenge is to make the function correct. Once it's correct, we can worry about the speed.

There are a few things which could trip-up a function like this:

  • NaN
  • Bad limits
  • Repeated numbers
  • Only 1 input array (or none at all)

Your original function can handle repeated numbers, such as [[9,9,9,9],[9,9,9]], but gets stuck in an infinite loop if any value is NaN, and handles a limit of 0 as if there were no limit at all.

Here's my (Mk3) attempt:

function intersection( arrs, limit ) {
    var result = [], posns = [];
    var j, v, next, n = arrs.length, count = 1;
    if( !n || limit <= 0 ) {
        return result; // nothing to do
    }
    if( n === 1 ) {
        // special case needed because main loop cannot handle this
        for( j = 0; j < arrs[0].length && result.length < limit; ++ j ) {
            v = arrs[0][j];
            if( v === v ) {
                result.push( v );
            }
        }
        return result;
    }
    for( j = 0; j < n; ++ j ) {
        if( !arrs[j].length ) {
            return result; // no intersection
        }
        posns[j] = 0;
    }
    next = arrs[n-1][0];
    ++ posns[n-1];
    while( true ) {
        for( j = 0; j < n; ++ j ) {
            do {
                if( posns[j] >= arrs[j].length ) {
                    return result; // ran out of values
                }
                v = arrs[j][posns[j]++];
            } while( v < next || v !== v );

            if( v !== next ) {
                count = 1;
                next = v;
            } else if( (++ count) >= n ) {
                result.push( next );
                if( result.length >= limit ) {
                    return result; // limit reached
                }
                if( posns[j] >= arrs[j].length ) {
                    return result; // ran out of values
                }
                next = arrs[j][posns[j]++];
                count = 1;
            }
        }
    }
}

(fiddle: http://jsfiddle.net/kn2wz2sc/4/)

This works in much the same way as your original method, but with several optimisations. It always knows which number it is looking for next, and will quickly iterate through each array until it finds a number which is at least that big. If the number is too big, it will update the number it is looking for.

In Mk2 I took some inspiration from Casey's method of counting matches as it goes instead of checking from 0-n each time, which allows it to short-cut some comparisons (and since Casey is now using indices, both methods have become very similar). In Mk3 I've made some more micro-optimisations, incrementing the indexes eagerly so that it doesn't need an inner loop.

This is safe against all the criteria I listed above (it ignores NaN since NaN!=NaN and therefore will never be in the intersection), isn't limited to numbers, and will exit quickly once any limit is reached.


A jsperf shows that Mk3 is the fastest method so far: http://jsperf.com/sorted-intersect/5 (and it's still safe against duplicates and NaN).

Dave
  • 44,275
  • 12
  • 65
  • 105
  • Thanks for your answer! I forgot to mention that arrays do not have duplicates (question updated). Not sure if this can speed things up even more? – Misha Moroshko May 10 '15 at 09:17
  • My method mutates the arrays, so it's actually just running the test on empty arrays after the first time. :) – Casey Chu May 10 '15 at 09:25
  • @MishaMoroshko this method is inherently duplicate-safe, so there's no obvious way to speed it up using that information. There are some tiny changes which are possible if you know the list will never have NaN, but they're so minor that I'd suggest keeping the method in its most robust form. The latest (Mk3) version is significantly faster than the other methods anyway, even with all its safety. – Dave May 10 '15 at 11:33
  • @Dave Can I ask what `if( v === v ) {` does in `intersection3c()`? Isn't it always `true`? Also `v !== v` here: `while( v < next || v !== v );` – Misha Moroshko May 10 '15 at 12:02
  • Those are both checks to cater for NaN values. NaN!=NaN, so it filters them out. The first is so that the behaviour with a single input is consistent, the second protects you from infinite loops. In earlier versions I also had constructs like if(!(v<=next)) for the same reason. When dealing with orderings, it's a good idea to consider how it will fail when dealing with NaN (especially if an infinite loop is a possible outcome!) – Dave May 10 '15 at 13:40
2

Here's another algorithm, where the idea is that we count how many times we see each number. Once we see it arrs.length times, we know that it's in the intersection. If it's missing from even one list, it's not in the intersection, and we can skip to the next number in that list. It turns out to be a lot faster!

This method mutates the array, but is easier to read.

function intersection(arrs, limit) {
    var intersections = [];

    // Keep track of how many times we've seen the largest element seen so far.
    var largest = -Infinity;
    var count = 0;

    while (intersections.length < limit) {
        for (var i = 0; i < arrs.length; i++) {

            // Drop elements less than `largest`.
            while (arrs[i].length && arrs[i][0] < largest)
                arrs[i].shift();

            // Ignore repeated elements (not needed if you don't have repeated elements).
            while (arrs[i].length >= 2 && arrs[i][0] == largest && arrs[i][1] == largest)
                arrs[i].shift();

            // If we ran out of elements, we're done.
            if (!arrs[i].length)
                return intersections;

            // Look at the next element.
            var next = arrs[i].shift();
            if (next == largest)
                count++;
            else {
                count = 1;
                largest = next;
            }

            // Once we see it enough times, we can be sure it's in the intersection!
            if (count == arrs.length)
                intersections.push(largest);
        }
    }

    return intersections;
}

This method doesn't, but it's harder to read.

function intersection(arrs, limit) {
    var intersections = [];
    var indices = [];
    for (var i = 0; i < arrs.length; i++)
        indices[i] = 0;

    // Keep track of how many times we've seen the largest element seen so far.
    var largest = -Infinity;
    var count = 0;

    while (intersections.length < limit) {
        for (var i = 0; i < arrs.length; i++) {

            // Skip past elements less than `largest`.
            while (indices[i] < arrs[i].length && arrs[i][indices[i]] < largest)
                indices[i]++;

            // If we ran out of elements, we're done.
            if (indices[i] >= arrs[i].length)
                return intersections;

            // Look at the next element.
            var next = arrs[i][indices[i]++];
            if (next == largest)
                count++;
            else {
                count = 1;
                largest = next;
            }

            // Once we see it enough times, we can be sure it's in the intersection!
            if (count == arrs.length)
                intersections.push(largest);
        }
    }

    return intersections;
}
Casey Chu
  • 25,069
  • 10
  • 40
  • 59
  • This has impressive speed, but it does not handle NaN consistently: ([[0, 3, 4, 0/0, 11], [1, 3, 11, 15]], 8) will produce "3", whereas ([[0, 3, 0/0, 11], [1, 3, 11, 15]], 8) will produce "3,11" – Dave May 10 '15 at 09:24
  • @Casey Nice, thanks! I forgot to mention that the arrays should not be mutated, so if you copy them, it should be considered part of the algorithm time. Do you mind to update jsperf accordingly? – Misha Moroshko May 10 '15 at 09:24
  • Updated to include a version that doesn't mutate the elements. As for the NaN issue, they're known to be sorted integer arrays so they shouldn't have NaNs. – Casey Chu May 10 '15 at 09:48
1

Faster (but by a long shot not as fast as the other answers):

function intersectMultiple(sortedArrays, limit) {
    var set = {}, result = [],
        a = sortedArrays.length,
        l = Math.max.apply(null, sortedArrays.map(function (a) {
            return a.length;
        })), i, j, c = 0, val;

    for (i = 0; i < l && c < limit; i++) {
        for (j = 0; j < a && c < limit; j++) {
            val = sortedArrays[j][i];
            if (!set.hasOwnProperty(val)) set[val] = 0;
            if (++set[val] === a) result[c++] = val;
        }
    };
    return result;
}

and

var s = [
    [2, 5, 7, 8, 10, 12, 13, 15, 20, 24],
    [3, 4, 5, 6, 9, 10, 11, 17, 20],
    [1, 2, 3, 5, 6, 10, 12, 20, 23, 29]
];

intersectMultiple(s, 2);

// [5, 10]

http://jsperf.com/intersect-multiple

Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • The limits you are applying are too strict: [2, 5, 7, 8, 10, 12, 13, 15, 20, 24,29],[3, 4, 5, 6, 9, 10, 11, 17, 20,29],[1, 2, 3, 5, 6, 10, 12, 20, 23, 29] does not return 29 – Dave May 10 '15 at 08:47