Why is the first function call is executed two times faster than all other sequential calls?

Question

I have a custom JS iterator implementation and code for measuring performance of the latter implementation:

const ITERATION_END = Symbol('ITERATION_END');

const arrayIterator = (array) => {
  let index = 0;

  return {
    hasValue: true,
    next() {
      if (index >= array.length) {
        this.hasValue = false;

        return ITERATION_END;
      }

      return array[index++];
    },
  };
};

const customIterator = (valueGetter) => {
  return {
    hasValue: true,
    next() {
      const nextValue = valueGetter();

      if (nextValue === ITERATION_END) {
        this.hasValue = false;

        return ITERATION_END;
      }

      return nextValue;
    },
  };
};

const map = (iterator, selector) => customIterator(() => {
  const value = iterator.next();

  return value === ITERATION_END ? value : selector(value);
});

const filter = (iterator, predicate) => customIterator(() => {
  if (!iterator.hasValue) {
    return ITERATION_END;
  }

  let currentValue = iterator.next();

  while (iterator.hasValue && currentValue !== ITERATION_END && !predicate(currentValue)) {
    currentValue = iterator.next();
  }

  return currentValue;
});

const toArray = (iterator) => {
  const array = [];

  while (iterator.hasValue) {
    const value = iterator.next();

    if (value !== ITERATION_END) {
      array.push(value);
    }
  }

  return array;
};

const test = (fn, iterations) => {
  const times = [];

  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    times.push(performance.now() - start);
  }

  console.log(times);
  console.log(times.reduce((sum, x) => sum + x, 0) / times.length);
}

const createData = () => Array.from({ length: 9000000 }, (_, i) => i + 1);

const testIterator = (data) => () => toArray(map(filter(arrayIterator(data), x => x % 2 === 0), x => x * 2))

test(testIterator(createData()), 10);

The output of the test function is very weird and unexpected - the first test run is constantly executed two times faster than all the other runs. One of the results, where the array contains all execution times and the number is the mean (I ran it on Node):

[
  147.9088459983468,
  396.3472499996424,
  374.82447600364685,
  367.74555300176144,
  363.6300039961934,
  362.44370299577713,
  363.8418449983001,
  390.86111199855804,
  360.23125199973583,
  358.4788999930024
]
348.6312940984964

Similar results can be observed using Deno runtime, however I could not reproduce this behaviour on other JS engines. What can be the reason behind it on the V8?

Environment: Node v13.8.0, V8 v7.9.317.25-node.28, Deno v1.3.3, V8 v8.6.334

Well, you are creating multiple 9000000 element arrays and then local variables on every iteration of those large arrays so memory management/garbage collection is going to get involved pretty significantly at some point. Often times a 3x difference in performance like this in V8 is caused when you fall out of some optimization path, but here it could also just be garbage collection. — jfriend00, Sep 07 '20 at 19:21
There is a single 9000000 elements array, which is created before test runs. On the line test(testIterator(createData()), 10); Or am I missing something? — laleksiunas, Sep 07 '20 at 19:29
Well `toArray()` creates another array and you're calling multiple functions for every element of the array that each create some objects and you're doing all of that 10 times. There are a LOT of objects being created here. I find the concept of this code very confusing to follow, but iterating a very large array 10 times where each step of the iteration calls functions with local variables results in a lot of objects. This is something a GC language isn't great at. — jfriend00, Sep 07 '20 at 19:32
Curiously, if I force GC between each iteration of your loop in `test()` (outside of the timed part), I find that the first 2 iterations are fast (instead of just the first 1) and then the rest are even slower. I don't know what that means, but it points to perhaps GC is relevant. — jfriend00, Sep 07 '20 at 19:34
When I replace toArray with something that just forces iteration without creating an array, the deviation becomes even larger reaching 5-6 times difference on Node. Regarding multiple objects creation, my main idea was to replace the native iterators that create {value, done} object on each iteration which is really ineffective as you mention. However, my version is not making many objects allocations, namely a single for each iterator. — laleksiunas, Sep 07 '20 at 19:40
The game with these optimizing interpreters is that one moment you're on a heavily optimized code path and things are fast and then next moment you aren't and you have no idea why. If this is what is going on here, it would typically take someone from the right part of the V8 team to actually explain what is happening. It seems less likely to me in this case because you're just running the same code 10 times and not modifying the array you're iterating. But, stranger things have happened in V8. — jfriend00, Sep 07 '20 at 19:46
FYI, creating a large array by using `.push()` one element at a time has some advantages in that it preserves the opportunity for array optimization by V8, but some disadvantages in that it's going to create a lot of memory use as it has to continually grow the block of memory the array is in and copy it over to the new block. But, if you've stubbed that out of `toArray()`, then that isn't your main issue. — jfriend00, Sep 07 '20 at 19:48

score 4 · Accepted Answer · answered Sep 07 '20 at 21:14

(V8 developer here.) In short: it's inlining, or lack thereof, as decided by engine heuristics.

For an optimizing compiler, inlining a called function can have significant benefits (e.g.: avoids the call overhead, sometimes makes constant folding possible, or elimination of duplicate computations, sometimes even creates new opportunities for additional inlining), but comes at a cost: it makes the compilation itself slower, and it increases the risk of having to throw away the optimized code ("deoptimize") later due to some assumption that turns out not to hold. Inlining nothing would waste performance, inlining everything would waste performance, inlining exactly the right functions would require being able to predict the future behavior of the program, which is obviously impossible. So compilers use heuristics.

V8's optimizing compiler currently has a heuristic to inline functions only if it was always the same function that was called at a particular place. In this case, that's the case for the first iterations. Subsequent iterations then create new closures as callbacks, which from V8's point of view are new functions, so they don't get inlined. (V8 actually knows some advanced tricks that allow it to de-duplicate function instances coming from the same source in some cases and inline them anyway; but in this case those are not applicable [I'm not sure why]).

So in the first iteration, everything (including x => x % 2 === 0 and x => x * 2) gets inlined into toArray. From the second iteration onwards, that's no longer the case, and instead the generated code performs actual function calls.

That's probably fine; I would guess that in most real applications, the difference is barely measurable. (Reduced test cases tend to make such differences stand out more; but changing the design of a larger app based on observations made on a small test is often not the most impactful way to spend your time, and at worst can make things worse.)

Also, hand-optimizing code for engines/compilers is a difficult balance. I would generally recommend not to do that (because engines improve over time, and it really is their job to make your code fast); on the other hand, there clearly is more efficient code and less efficient code, and for maximum overall efficiency, everyone involved needs to do their part, i.e. you might as well make the engine's job simpler when you can.

If you do want to fine-tune performance of this, you can do so by separating code and data, thereby making sure that always the same functions get called. For example like this modified version of your code:

const ITERATION_END = Symbol('ITERATION_END');

class ArrayIterator {
  constructor(array) {
    this.array = array;
    this.index = 0;
  }
  next() {
    if (this.index >= this.array.length) return ITERATION_END;
    return this.array[this.index++];
  }
}
function arrayIterator(array) {
  return new ArrayIterator(array);
}

class MapIterator {
  constructor(source, modifier) {
    this.source = source;
    this.modifier = modifier;
  }
  next() {
    const value = this.source.next();
    return value === ITERATION_END ? value : this.modifier(value);
  }
}
function map(iterator, selector) {
  return new MapIterator(iterator, selector);
}

class FilterIterator {
  constructor(source, predicate) {
    this.source = source;
    this.predicate = predicate;
  }
  next() {
    let value = this.source.next();
    while (value !== ITERATION_END && !this.predicate(value)) {
      value = this.source.next();
    }
    return value;
  }
}
function filter(iterator, predicate) {
  return new FilterIterator(iterator, predicate);
}

function toArray(iterator) {
  const array = [];
  let value;
  while ((value = iterator.next()) !== ITERATION_END) {
    array.push(value);
  }
  return array;
}

function test(fn, iterations) {
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    fn();
    console.log(performance.now() - start);
  }
}

function createData() {
  return Array.from({ length: 9000000 }, (_, i) => i + 1);
};

function even(x) { return x % 2 === 0; }
function double(x) { return x * 2; }
function testIterator(data) {
  return function main() {
    return toArray(map(filter(arrayIterator(data), even), double));
  };
}

test(testIterator(createData()), 10);

Observe how there are no more dynamically created functions on the hot path, and the "public interface" (i.e. the way arrayIterator, map, filter, and toArray compose) is exactly the same as before, only under-the-hood details have changed. A benefit of giving all functions names is that you get more useful profiling output ;-)

Astute readers will notice that this modification only shifts the issue away: if you have several places in your code that call map and filter with different modifiers/predicates, then the inlineability issue will come up again. As I said above: microbenchmarks tend to be misleading, as real apps typically have different behavior...

(FWIW, this is pretty much the same effect as at Why is the execution time of this function call changing? .)

So you're saying that the callbacks are inlined in the first iteration, but then when the interpreter realizes the inlined version can't be used again it switches to regular function calls and that's responsible for the performance difference? — jfriend00, Sep 08 '20 at 03:29
Yes. Do note that it's very hard to predict for humans what will or won't get inlined -- that's true for all compilers. The most applicable takeaway from this is probably to disregard the first one or two iterations of any artificial tests. — jmrk, Sep 08 '20 at 15:58

score 3 · Answer 2 · answered Sep 08 '20 at 20:07

Just to add to this investigation, I compared the OP's original code with the predicate and selector functions declared as separate functions as suggested by jmrk to two other implementations. So, this code has three implementations:

OP's code with predicate and selector functions declared separately as named functions (not inline).
Using standard array.map() and .filter() (which you would think would be slower because of the extra creation of intermediate arrays)
Using a custom iteration that does both filtering and mapping in one iteration

The OP's attempt at saving time and making things faster is actually the slowest (on average). The custom iteration is the fastest.

I guess the lesson here is that it's not necessarily intuitive how you make things faster with the optimizing compiler so if you're tuning performance, you have to measure against the "typical" way of doing things (which may benefit from the most optimizations).

Also, note that in the method #3, the first two iterations are the slowest and then it gets faster - the opposite effect from the original code. Go figure.

The results are here:

[
  99.90320014953613,
  253.79690098762512,
  271.3091011047363,
  247.94990015029907,
  247.457200050354,
  261.9487009048462,
  252.95090007781982,
  250.8520998954773,
  270.42809987068176,
  249.340900182724
]
240.59370033740998
[
  222.14270091056824,
  220.48679995536804,
  224.24630093574524,
  237.07260012626648,
  218.47070002555847,
  218.1493010520935,
  221.50559997558594,
  223.3587999343872,
  231.1618001461029,
  243.55419993400574
]
226.01488029956818
[
  147.81360006332397,
  144.57479882240295,
  73.13350009918213,
  79.41700005531311,
  77.38950109481812,
  78.40880012512207,
  112.31539988517761,
  80.87990117073059,
  76.7899010181427,
  79.79679894447327
]
95.05192012786866

The code is here:

const { performance } = require('perf_hooks');

const ITERATION_END = Symbol('ITERATION_END');

const arrayIterator = (array) => {
  let index = 0;

  return {
    hasValue: true,
    next() {
      if (index >= array.length) {
        this.hasValue = false;

        return ITERATION_END;
      }

      return array[index++];
    },
  };
};

const customIterator = (valueGetter) => {
  return {
    hasValue: true,
    next() {
      const nextValue = valueGetter();

      if (nextValue === ITERATION_END) {
        this.hasValue = false;

        return ITERATION_END;
      }

      return nextValue;
    },
  };
};

const map = (iterator, selector) => customIterator(() => {
  const value = iterator.next();

  return value === ITERATION_END ? value : selector(value);
});

const filter = (iterator, predicate) => customIterator(() => {
  if (!iterator.hasValue) {
    return ITERATION_END;
  }

  let currentValue = iterator.next();

  while (iterator.hasValue && currentValue !== ITERATION_END && !predicate(currentValue)) {
    currentValue = iterator.next();
  }

  return currentValue;
});

const toArray = (iterator) => {
  const array = [];

  while (iterator.hasValue) {
    const value = iterator.next();

    if (value !== ITERATION_END) {
      array.push(value);
    }
  }

  return array;
};

const test = (fn, iterations) => {
  const times = [];
  let result;

  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    result = fn();
    times.push(performance.now() - start);
  }

  console.log(times);
  console.log(times.reduce((sum, x) => sum + x, 0) / times.length);
  return result;
}

const createData = () => Array.from({ length: 9000000 }, (_, i) => i + 1);
const cache = createData();

const comp1 = x => x % 2 === 0;
const comp2 = x => x * 2;

const testIterator = (data) => () => toArray(map(filter(arrayIterator(data), comp1), comp2))

// regular array filter and map
const testIterator2 = (data) => () => data.filter(comp1).map(comp2);

// combine filter and map in same operation
const testIterator3 = (data) => () => {
    let result = [];
    for (let value of data) {
        if (comp1(value)) {
            result.push(comp2(value));
        }
    }
    return result;
}

const a = test(testIterator(cache), 10);
const b = test(testIterator2(cache), 10);
const c = test(testIterator3(cache), 10);

function compareArrays(a1, a2) {
    if (a1.length !== a2.length) return false;
    for (let [i, val] of a1.entries()) {
        if (a2[i] !== val) return false;
    }
    return true;
}

console.log(a.length);
console.log(compareArrays(a, b));
console.log(compareArrays(a, c));

Why is the first function call is executed two times faster than all other sequential calls?

2 Answers2