I am creating a function that finds a value inside multiple arrays, but I'm having some performance issues since the arrays can be really long. The best performance I have found so far is with something like this:

function isValInArray(arr, value) {
    var bool = false;
    var myArr = arr;
    var myVal = value;
    var i = 0;
    var j = 0;

    for (i = 0; i < myArr.length; i++) {
        for (j = 0; j < myArr[i].length; j++){
            if (myArr[i][j] === myVal ) {
               bool = true;
            }
        }
    }
    return bool;
}

I have tried some different approaches but the performance of the previous function has been the best one so far.

Any ideas on how to make it a little faster?

Mogsdad
c C
  • depending on the type of value, you might be able to cheat. ex: an array of arrays of numbers can be searched with toString() and "".indexOf(), much faster than nested iteration and discrete comparison. – dandavis Dec 31 '15 at 02:43
  • Thanks, can you elaborate a little more about the toString()? If I had only numbers and convert those to strings, would that be faster? – c C Dec 31 '15 at 03:21
  • @dandavis If the stringification is done on each search, this is much slower than plain old loops (100x by my calculation). Even if the stringification is done in advance, it's still 5x slower. –  Dec 31 '15 at 04:48
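The toString() trick from the comments can be sketched like this (my own illustration, not dandavis's exact code; it assumes the values are numbers, which never contain commas):

```javascript
// A nested array of numbers stringifies to a flat comma-separated list,
// so a single indexOf() call replaces the nested loops. Padding both the
// haystack and the needle with commas prevents partial-number matches
// (e.g. finding 1 inside 12). Assumes values contain no commas.
function isValInArrayStr(arr, value) {
  return ("," + arr.toString() + ",").indexOf("," + value + ",") !== -1;
}

isValInArrayStr([[1, 2], [3, 12]], 1);  // true
isValInArrayStr([[1, 2], [3, 12]], 4);  // false
```

As the comment above notes, whether this beats plain loops depends heavily on whether the stringification can be done once up front.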

2 Answers

Since you're only concerned about whether a value is IN the arrays, you can flatten them first, then use native functions to perform the search just once.

function isValInArray(arr, value) {
  var myArr = [].concat.apply([], arr);  // from stackoverflow.com/a/10865042/1677912
  return ( myArr.indexOf(value) >= 0 );
}
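On newer engines, the same flatten-then-search idea can be written with built-ins that postdate this answer; it assumes your environment supports Array.prototype.flat (ES2019) and Array.prototype.includes (ES2016):

```javascript
// Same idea with later built-ins: flatten one level, then search once.
// Requires Array.prototype.flat (ES2019) and includes (ES2016).
function isValInArray(arr, value) {
  return arr.flat().includes(value);
}

isValInArray([[1, 2], [3, 4]], 3);  // true
isValInArray([[1, 2], [3, 4]], 9);  // false
```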

I revised the jsPerf test proposed by @torazaburo; see revision 2. The original test was fatally biased: it searched a small array for only a single value, and that value was the second-to-last item in the last row (the second item a decrementing loop would find, hence the bias). To make the scenario more realistic, the revision makes these changes:

  • The sample data is larger; a configurable 10 x 100 square array. (OP indicated the arrays can be really long.)
  • Three data points are searched in each test; the first, last and middle. (This could be further expanded, but this selection covers best, worst, and average for loop-based solutions.)
  • Values are Strings instead of Numbers. This somewhat neutralizes differences between the search algorithms, because string comparisons are slower than numeric comparisons, but since the OP did not indicate the nature of the values, it is the more realistic assumption.

This test shows that all the suggested solutions perform similarly, which reinforces @torazaburo's opinion that it is questionable whether these kinds of micro-optimizations are worth worrying about at all.

I recommend using the solution that is easiest for you to maintain. If performance is an actual problem for you, test your options with data that realistically reflects your situation.
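As a rough starting point for such testing, here is a minimal timing sketch (my own illustration; tools like jsPerf or benchmark.js handle warm-up and statistics properly, so prefer those for real comparisons). The data shape and probe values below are placeholders for your own:

```javascript
// Runs repeated searches and returns elapsed milliseconds.
// searchFn, data, and values are whatever candidates you want to compare.
function timeSearch(searchFn, data, values, iterations) {
  var start = Date.now();
  for (var k = 0; k < iterations; k++) {
    for (var v = 0; v < values.length; v++) {
      searchFn(data, values[v]);
    }
  }
  return Date.now() - start;
}

// Sample data shaped like the revised test: 10 rows of 100 string values.
var data = [];
for (var r = 0; r < 10; r++) {
  var row = [];
  for (var c = 0; c < 100; c++) row.push("v" + (r * 100 + c));
  data.push(row);
}
var probes = ["v0", "v500", "v999"];  // first, middle, last item

var ms = timeSearch(function (a, v) {
  for (var i = 0; i < a.length; i++)
    if (a[i].indexOf(v) !== -1) return true;
  return false;
}, data, probes, 1000);
console.log("nested indexOf: " + ms + " ms");
```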

In case you're wondering, this solution appears as concat in the tests.

(screenshot of jsPerf test results)

Mogsdad
  • This turns out to be about twenty times slower than the plain old nested loop with early return. –  Dec 31 '15 at 04:23
  • @tor Not in a non-contrived test. See updated answer. – Mogsdad Dec 31 '15 at 14:34
  • This still has the same asymptotic cost. Maybe using a single call of the native `indexOf` can reduce the constant factor on some implementations. But that shouldn't be comparable to the gain provided by ES6 sets. – Oriol Dec 31 '15 at 15:59

You can return true immediately when you find a match:

function isValInArray(arr, value) {
  for (var i = 0; i < arr.length; ++i)
    for (var j = 0; j < arr[i].length; ++j)
      if (arr[i][j] === value)
        return true;
  return false;
}

In ECMAScript 5 it can be rewritten in a more semantic (but probably slower) way:

function isValInArray(arr, value) {
  return arr.some(function(sub) {
    return sub.indexOf(value) > -1;
  });
}

However, asymptotically, both still have the same average and worst-case cost as the code in your question, because searching an array is linear. If the array contains n subarrays, each with m items, the cost is O(n·m).

If you want to speed it up, you can use ECMAScript 6 Sets. Their lookups are required to be sublinear on average; for example, if the implementation uses a hash table, lookups are constant-time, and the total cost drops to O(n).

Then the function would be one of the following:

function isValInArray(arrSets, value) {
  return arrSets.some(set => set.has(value));
}
function isValInArray(arrSets, value) {
  for (var i = 0; i < arrSets.length; ++i)
    if (arrSets[i].has(value)) return true;
  return false;
}

Example:

var arrSets = [new Set([1,2,3]), new Set([3,5,6])];
isValInArray(arrSets, 0); // false
isValInArray(arrSets, 1); // true

In case you must use arrays because you want to keep indices, you can do a conversion before the search. But the conversion itself costs O(n·m), so it is only worthwhile if you can reuse the sets across many searches.

var arrSets = arr.map(sub => new Set(sub));

But in that case, you don't need to keep the sets separate. Similarly to what @Mogsdad proposed, you can insert the elements of all the arrays into a single Set, which also costs O(n·m) to build. The advantage is that each search is constant-time. Example:

var arr = [[1,2,3], [3,5,6]],
    set = new Set([].concat.apply([], arr));
set.has(0); // false
set.has(1); // true
Oriol
  • Thanks! Your second code is the fastest one I have tried out so far! – c C Dec 31 '15 at 03:19
  • `some` is likely to be meaningfully slower than the plain old loop. –  Dec 31 '15 at 04:02
  • Would upvote except for the `some` solution, which after measurement turns out to be 5-10x slower. –  Dec 31 '15 at 04:24
  • @torazaburo Yes, `some` requires calling a function at each iteration, which can be a bit expensive. But it's still the same asymptotic cost, so the improvement in semantics can be worth it. As I said at the end, the real gain in speed would be using ES6 sets. – Oriol Dec 31 '15 at 15:50
  • @Oriol, can you please provide an example with an ES6 set? The `some` approach has still been the fastest so far with the arrays I'm working with. – c C Dec 31 '15 at 20:47
  • @cC I have included a few examples. – Oriol Dec 31 '15 at 23:58