
I've been reading conflicting answers about the time complexity of Sets vs. Arrays in modern JavaScript engines.

I completed the Codility demo task, which is a simple assignment to solve the following: given an array A of N integers, return the smallest positive integer (greater than 0) that does not occur in A.

For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5.

My first solution was:

const solution = arr => {
    for (let int = 1;;int++) {
        if (!arr.includes(int)) {
            return int;
        }
    }
}

Now, the weird thing is that Codility says this solution has a time complexity of O(n**2) (they prefer a solution of complexity O(n)). As far as I know, Array.prototype.includes is a linear search (https://tc39.es/ecma262/#sec-array.prototype.includes), meaning it should have an O(n) time complexity.

If I enter a different solution, using a Set, I get the full score:

const solution = arr => {
  const set = new Set(arr);
  let i = 1;

  while (set.has(i)) {
    i++;
  }

  return i;
}

Codility says this apparently has a time complexity of O(N) or O(N * log(N)).

Is this correct? Is Array.prototype.includes in fact O(n**2) instead of O(n)?

Lastly, I'm a bit confused as to why Set.has() is preferred, because in my console performance tests Array.includes() consistently outperforms the approach of first creating a Set and then looking values up in it, as can be seen in the following snippet.

const rand = (size) => [...Array(size)].map(() => Math.floor(Math.random() * size));

const small = rand(100);
const medium = rand(5000);
const large = rand(100000);

const solution1 = arr => {
  console.time('Array.includes');
  for (let int = 1;;int++) {
    if (!arr.includes(int)) {
      console.timeEnd('Array.includes');
      return int;
    }
  }
}

const solution2 = arr => {
  console.time('Set.has');
  const set = new Set(arr);
  let i = 1;
  while (set.has(i)) {
    i++;
  }
  console.timeEnd('Set.has');
  return i;
}

console.log('Testing small array:');
solution1(small);
solution2(small);
console.log('Testing medium array:');
solution1(medium);
solution2(medium);
console.log('Testing large array:');
solution1(large);
solution2(large);

If a Set lookup has better time complexity (if that's true) and is preferred by Codility, why do my performance tests favor the Array.prototype.includes solution?

– rails_has_elegance

2 Answers


I know this is an old question, but I was double-checking the data. I too assumed Set.has would be O(1) or O(log N), but in my first test it appeared to be O(N). The specs for these functions hint as much, but are quite hard to decipher: https://tc39.es/ecma262/#sec-array.prototype.includes https://tc39.es/ecma262/#sec-set.prototype.has Elsewhere, though, the spec also requires that Set access be sublinear on average, and I believe modern implementations are.

Empirically, Set.has demonstrated linear performance when I ran it in some code playgrounds, but in real environments like Node and Chrome there were no surprises. I'm not sure what the playground was running on the back end, but perhaps a Set polyfill was used. So be careful!
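
If you suspect the environment, one quick sanity check (just a heuristic I'd add here, not part of my original tests) is to see whether Set.prototype.has stringifies as native code; a polyfill usually shows its JavaScript source instead:

// Rough heuristic, not foolproof: native built-ins usually stringify with "[native code]",
// while a polyfilled Set.prototype.has would show its JavaScript source instead.
const looksNative = fn => /\[native code\]/.test(Function.prototype.toString.call(fn));

console.log('Set.prototype.has looks native:', looksNative(Set.prototype.has));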

Here are my test cases, trimmed down to remove the randomness:

const makeArray = (size) => [...Array(size)].map(() => size);

const small =  makeArray(1000000);
const medium = makeArray(10000000);
const large =  makeArray(100000000);

const solution1 = arr => {
  console.time('Array.includes');
  // Every element equals `size`, so searching for `size - 1` always misses
  // and forces a full scan of the array (the worst case for a linear search).
  arr.includes(arr.length - 1);
  console.timeEnd('Array.includes');
}

const solution2 = arr => {
  // The Set is built outside the timer so that only the lookup itself is measured.
  const set = new Set(arr);
  console.time('Set.has');
  set.has(arr.length - 1);
  console.timeEnd('Set.has');
}


console.log('** Testing small array:');
solution1(small);
solution2(small);
console.log('** Testing medium array:');
solution1(medium);
solution2(medium);
console.log('** Testing large array:');
solution1(large);
solution2(large);

In Chrome, though:

** Testing small array:
Array.includes: 1.371826171875 ms
Set.has: 0.005859375 ms
** Testing medium array:
Array.includes: 14.32568359375 ms
Set.has: 0.009765625 ms
** Testing large array:
Array.includes: 115.695068359375 ms
Set.has: 0.0048828125 ms

In Node 16.5:

Testing small array:
Array.includes: 1.223ms
Set.has: 0.01ms
Testing medium array:
Array.includes: 11.41ms
Set.has: 0.054ms
Testing large array:
Array.includes: 127.297ms
Set.has: 0.047ms

So, yeah, Arrays are definitely linear, and Sets are much faster.

– ndp

A comparison like that is not entirely fair, because in the function where you use the Set, the array first needs to be converted to a Set, which takes some time.

Have a look at the results below when this conversion is left out of the measurement. I have updated the solution2 function to receive a Set and changed the while loop to a for loop, for the sake of direct comparison.

You may notice that for a small array the Set might be slower. This is expected, because the difference in time complexity only really comes into effect for a sufficiently large n.

Also note that Array.includes is indeed O(n), but because it is called inside a loop that in the worst case runs up to n times, the solution performs up to n scans of up to n elements each, giving a time complexity of O(n^2).

const rand = (size) => [...Array(size)].map(() => Math.floor(Math.random() * size));

const small = rand(100);
const medium = rand(5000);
const large = rand(100000);

const solution1 = arr => {
  console.time('Array.includes');
  for (let int = 1;;int++) {
    if (!arr.includes(int)) {
      console.timeEnd('Array.includes');
      return int;
    }
  }
}

const solution2 = set => {
  console.time('Set.has');
  for (let i = 1;;i++) {
    if (!set.has(i)) {
      console.timeEnd('Set.has');
      return i;
    }
  }
}

console.log('Testing small array:');
solution1(small);
solution2(new Set(small));
console.log('Testing medium array:');
solution1(medium);
solution2(new Set(medium));
console.log('Testing large array:');
solution1(large);
solution2(new Set(large));
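
If you do want to see the cost that was excluded, here is a minimal sketch along the same lines (the variable names are just illustrative) that times only the Array-to-Set conversion and a single lookup:

const rand = (size) => [...Array(size)].map(() => Math.floor(Math.random() * size));
const data = rand(100000);

// Building the Set walks the whole array once, so the conversion alone is O(n).
console.time('new Set(arr)');
const lookup = new Set(data);
console.timeEnd('new Set(arr)');

// Each individual lookup afterwards is expected to take (amortised) constant time.
console.time('single Set.has');
lookup.has(-1);
console.timeEnd('single Set.has');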
– Sebastian
    > "Array.includes is indeed O(n)" According to @yury-tarabanko he seems to think its Linear given his interpretation of the spec https://stackoverflow.com/a/48761894/4642530. Maybe Im missing something but I left a comment there, my interpretation of the spec is that it is also O(n). (maybe under the hood ecma is using a hashmap somehow?) – garrettmac Jan 19 '22 at 19:20
  • I see that your comment was already addressed in the question you linked above. The spec indeed indicates a time complexity of O(n), which is also called linear. There might have been confusion with O(1); however, that is called constant time complexity. – Sebastian Jan 20 '22 at 21:34