
Preface

This issue appears to affect only Chrome/V8, and may not be reproducible in Firefox or other browsers. In summary, the execution time of a function that invokes a callback increases by an order of magnitude or more once that function is called with a different callback anywhere else.

Simplified Proof-of-Concept

Calling test(callback) arbitrarily many times works as expected, but once you call test(differentCallback), the execution time of the test function increases dramatically no matter what callback is provided (i.e., another call to test(callback) would suffer as well).

This example was updated to use arguments so as to not be optimized to an empty loop. Callback arguments a and b are summed and added to total, which is logged.

function test(callback) {
    let start = performance.now(),
        total = 0;

    // add callback result to total
    for (let i = 0; i < 1e6; i++)
        total += callback(i, i + 1);

    console.log(`took ${(performance.now() - start).toFixed(2)}ms | total: ${total}`);
}

let callback1 = (a, b) => a + b,
    callback2 = (a, b) => a + b;

console.log('FIRST CALLBACK: FASTER');
for (let i = 1; i < 10; i++)
    test(callback1);

console.log('\nNEW CALLBACK: SLOWER');
for (let i = 1; i < 10; i++)
    test(callback2);

Original post

I am developing a StateMachine class (source) for a library I'm writing and the logic works as expected, but in profiling it, I've run into an issue. I noticed that when I ran the profiling snippet (in global scope), it would only take about 8ms to finish, but if I ran it a second time, it would take up to 50ms and eventually balloon as high as 400ms. Typically, running the same named function over and over will cause its execution time to drop as the V8 engine optimizes it, but the opposite seems to be happening here.

I've been able to get rid of the problem by wrapping it in a closure, but then I noticed another weird side effect: Calling a different function that relies on the StateMachine class would break the performance for all code depending on the class.

The class is pretty simple: you give it an initial state in the constructor or via init, and you can update the state with the update method, to which you pass a callback that accepts this.state as an argument (and usually modifies it). transition is a method that repeatedly updates the state until the transitionCondition callback is no longer met.
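For reference, a minimal sketch of the class as described (the real source is linked in the snippet below; these internals are assumed for illustration, not copied from it):

```javascript
// Hypothetical sketch of the StateMachine API described above.
class StateMachine {
  constructor(state) {
    if (state !== undefined) this.init(state);
  }

  // set the initial state
  init(state) {
    this.state = state;
    return this;
  }

  // apply a single callback to the current state
  update(stateTransition) {
    stateTransition(this.state);
    return this;
  }

  // repeatedly apply stateTransition while transitionCondition holds
  transition(stateTransition, transitionCondition) {
    while (transitionCondition(this.state)) stateTransition(this.state);
    return this;
  }
}

const sm = new StateMachine({ test: 0 });
sm.transition(state => state.test++, state => state.test < 1e6);
console.log(sm.state.test); // 1000000
```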

Two test functions are provided: red and blue, which are identical, and each will generate a StateMachine with an initial state of { test: 0 } and use the transition method to update the state while state.test < 1e6. The end state is { test: 1000000 }.

You can trigger the profile by clicking the red or blue button, which will run StateMachine.transition 50 times and log the average time the call took to complete. If you click the red or blue button repeatedly, you will see that it clocks in at less than 10ms without issue - but, once you click the other button and call the other version of the same function, everything breaks, and the execution time for both functions will increase by about an order of magnitude.

// two identical functions, red() and blue()

function red() {
  let start = performance.now(),
      stateMachine = new StateMachine({
        test: 0
      });

  stateMachine.transition(
    state => state.test++, 
    state => state.test < 1e6
  );

  if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
  else return performance.now() - start;
}

function blue() {
  let start = performance.now(),
      stateMachine = new StateMachine({
        test: 0
      });

  stateMachine.transition(
    state => state.test++, 
    state => state.test < 1e6
  );

  if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
  else return performance.now() - start;
}

// display execution time
const display = (time) => document.getElementById('results').textContent = `Avg: ${time.toFixed(2)}ms`;

// handy dandy Array.avg()
Array.prototype.avg = function() {
  return this.reduce((a, b) => a + b, 0) / this.length;
}

// bindings
document.getElementById('red').addEventListener('click', () => {
  const times = [];
  for (var i = 0; i < 50; i++)
    times.push(red());
    
  display(times.avg());
});

document.getElementById('blue').addEventListener('click', () => {
  const times = [];
  for (var i = 0; i < 50; i++)
    times.push(blue());
    
  display(times.avg());
});
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine@bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>

<h2 id="results">Waiting...</h2>
<button id="red">Red Pill</button>
<button id="blue">Blue Pill</button>

<style>
body{box-sizing:border-box;padding:0 4rem;text-align:center}button,h2,p{width:100%;margin:auto;text-align:center;font-family:-apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,Helvetica,Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol"}button{font-size:1rem;padding:.5rem;width:180px;margin:1rem 0;border-radius:20px;outline:none;}#red{background:rgba(255,0,0,.24)}#blue{background:rgba(0,0,255,.24)}
</style>

Updates

Bug report filed (triaged by the V8 team as a "feature request"; awaiting update). See @jmrk's answers below for more details.

Ultimately, this behavior is unexpected and, IMO, qualifies as a nontrivial bug. The impact for me is significant - on Intel i7-4770 (8) @ 3.900GHz, my execution times in the example above go from an average of 2ms to 45ms (a 20x increase).

As for nontriviality, consider that any subsequent calls to StateMachine.transition after the first one will be unnecessarily slow, regardless of scope or location in the code. The fact that SpiderMonkey does not slow down subsequent calls to transition signals to me that there is room for improvement for this specific optimization logic in V8.

See below, where subsequent calls to StateMachine.transition are slowed:

// same source, several times

// 1
(function() {
  let start = performance.now(),
    stateMachine = new StateMachine({
      test: 0
    });

  stateMachine.transition(state => state.test++, state => state.test < 1e6);

  if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
  console.log(`took ${performance.now() - start}ms`);
})();


// 2 
(function() {
  let start = performance.now(),
    stateMachine = new StateMachine({
      test: 0
    });

  stateMachine.transition(state => state.test++, state => state.test < 1e6);

  if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
  console.log(`took ${performance.now() - start}ms`);
})();

// 3
(function() {
  let start = performance.now(),
    stateMachine = new StateMachine({
      test: 0
    });

  stateMachine.transition(state => state.test++, state => state.test < 1e6);

  if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
  console.log(`took ${performance.now() - start}ms`);
})();
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine@bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>

This performance decrease can be avoided by wrapping the code in a named closure, where presumably the optimizer knows the callbacks will not change:

var test = (function() {
    let start = performance.now(),
        stateMachine = new StateMachine({
            test: 0
        });
  
    stateMachine.transition(state => state.test++, state => state.test < 1e6);
  
    if (stateMachine.state.test !== 1e6) throw 'ASSERT ERROR!';
    console.log(`took ${performance.now() - start}ms`);
});

test();
test();
test();
<script src="https://cdn.jsdelivr.net/gh/TeleworkInc/state-machine@bd486a339dca1b3ad3157df20e832ec23c6eb00b/StateMachine.js"></script>

Platform Information

$ uname -a
Linux workspaces 5.4.0-39-generic #43-Ubuntu SMP Fri Jun 19 10:28:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ google-chrome --version
Google Chrome 83.0.4103.116
Lewis
    This is a very well-asked question, and your demo shows exactly what you mean. I'm really sorry I don't have any insight to offer. Have you tried it in a non-V8 environment? – Scott Sauyet Jul 02 '20 at 21:20
  • @ScottSauyet Hey thanks, I appreciate you. I have not tried outside V8, only debugged in Chrome - let me fire up Firefox and see if this demo works as expected. – Lewis Jul 02 '20 at 21:24
  • 2
    @ScottSauyet Oh, look at that! I'm not seeing this in FF, so it must be a V8 bug - would you recommend I close this or leave it up just in case it was compounded by a mistake I made or something? – Lewis Jul 02 '20 at 21:26
  • 4
    Really seems like a V8 thing. SpiderMonkey is rather consistent, with an average around 5ms. You might want to check if that also happens in Node.js, just to check if the problem is from V8. – D. Pardal Jul 02 '20 at 21:26
  • It looks like this bug is also non-existent in Chrome running on Linux (at least on Ubuntu). I consistently get between 5-8ms in Chrome on Linux – slebetman Jul 02 '20 at 21:47
  • @slebetman Interesting, I'm on Ubuntu myself - updated the OP with platform information. – Lewis Jul 02 '20 at 21:49
  • I consistently get between 16-45 ms on Windows Chrome Version 83.0.4103.116 (Official Build) (64-bit). Note that there are some caveats to the accuracy of `performance`. See https://developer.mozilla.org/en-US/docs/Web/API/Performance/now. – Trentium Jul 02 '20 at 21:59
  • 3
    @JonTrent If you get 16ms one run, and 46ms the next time, that is hardly consistency (at 3X variance). If your execution times balloon like that (and then plateau) after clicking a second button, then you have reproduced the issue without knowing. In any case, please do not pin the blame on `performance.now` unless you can demonstrate that this is the case. – Lewis Jul 02 '20 at 22:05
  • @christian my tests were floating in the range specified, not consistently getting worse as you've indicated in your question. I am simply pointing out that there can be some variance in accuracy in what on the surface appears to be a very accurate way of timing performance. – Trentium Jul 02 '20 at 22:11
  • @JonTrent I apologize if I seem rude, but the question invites people to provide a workaround or offer insight. Mentioning `performance.now` caveats does not add anything to the discussion, and assumes I do not know a 100ms lag when I see one. I increased the sample size to 50, to account for noise - please try again, you'll definitely notice the issue if it's present. – Lewis Jul 02 '20 at 22:33
  • @christian no worries. Just trying to provide data points. So copying all your code to a local HTML file, and running in Chrome, what I see is that whichever button I press first runs in ~2ms, and thereafter pressing the second button I see ~20ms+/-, and then going back to the first button I oddly then see ~20ms+/-. Experimenting, if I copy `transition` and make a `transitionRed` and `transitionBlue`, then regardless of which order I then press the buttons, the results are all ~2ms. Something about the shared use of the `transition` method appears to introduce performance issues... – Trentium Jul 03 '20 at 00:39
  • 1
    And one other data point... if I create in global space a single StateMachine object, and reuse this in `red()` and `blue()` (setting of course `.test=0` before running `.transition()`), the performance is less than 0.12ms consistently, regardless of the order and number of times Red Pill or Blue Pill is pressed... – Trentium Jul 03 '20 at 00:47
  • 4
    @OP Don't close it yet. Wait for some resident V8 expert (like @jmrk) to hop on and talk. It might be a bug, or it might be some kind of optimization where the behavior is expected. (It'd be nice if we could ping known experts for special questions) – user120242 Jul 03 '20 at 01:00
  • One more data point. I can get this behavior consistently on Chrome and Firefox on Linux: If I press the red or blue button I consistently get a time reading eg. 5ms on Chrome and 3ms on Firefox. If I then press the other color I consistently get an increase in time eg. around 70ms in Chrome and around 9ms in Firefox. In both cases pressing a different color increases processing time though it is worse in Chrome it is still a 3X increase in Firefox – slebetman Jul 03 '20 at 01:06
  • BTW if anyone here noticed my use of "resident expert", and found it weird that I used a term that is commonly used for sarcasm (or less often just to refer to someone who knows more than those in his environment but is not an "expert" in the true sense of the term), do know that it was unintentional. I need to go back to Native English Speakers 101 :( – user120242 Jul 03 '20 at 01:37
  • @user120242 You know, I think a forum like this might be the *only* place the phrase "resident expert" would come across as **completely sincere**, and FWIW I absolutely did not think you were being sarcastic! jmrk's response was extremely insightful, he is clearly an actual expert lol, I knew what you meant! <3 – Lewis Jul 03 '20 at 01:38
  • 1
    This was a fascinating Q + A. Kudos to the OP and the resident expert (I didn't hear any irony in this phrase either!) for asking well and answering well. I learned a lot! – Scott Sauyet Jul 03 '20 at 21:42

2 Answers


V8 developer here. It's not a bug, it's just an optimization that V8 doesn't do. It's interesting to see that Firefox seems to do it...

FWIW, I don't see "ballooning to 400ms"; instead (similar to Jon Trent's comment) I see about 2.5ms at first, and then around 11ms.

Here's the explanation:

When you click only one button, then transition only ever sees one callback. (Strictly speaking it's a new instance of the arrow function every time, but since they all stem from the same function in the source, they're "deduped" for type feedback tracking purposes. Also, strictly speaking it's one callback each for stateTransition and transitionCondition, but that just duplicates the situation; either one alone would reproduce it.) When transition gets optimized, the optimizing compiler decides to inline the called function, because having seen only one function there in the past, it can make a high-confidence guess that it's also always going to be that one function in the future. Since the function does extremely little work, avoiding the overhead of calling it provides a huge performance boost.

Once the second button is clicked, transition sees a second function. It must get deoptimized the first time this happens; since it's still hot it'll get reoptimized soon after, but this time the optimizer decides not to inline, because it's seen more than one function before, and inlining can be very expensive. The result is that from this point onwards, you'll see the time it takes to actually perform these calls. (The fact that both functions have identical source doesn't matter; checking that wouldn't be worth it because outside of toy examples that would almost never be the case.)
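The "identical source doesn't matter" point can be illustrated directly. The inlining decision itself is internal to V8 and not observable from plain JavaScript, but the function identities involved are (a sketch; names are illustrative):

```javascript
// A call site that has only ever seen one callback is monomorphic and a
// candidate for inlining; once a second distinct function flows through it,
// it becomes polymorphic. Identical source text does not help: separately
// defined arrow functions are separate function objects.
function run(cb) { return cb(2, 3); }

const add = (a, b) => a + b;
const alsoAdd = (a, b) => a + b; // same source text, but a distinct closure

run(add);     // call site has seen only `add`
run(alsoAdd); // call site has now seen two distinct functions

console.log(add === alsoAdd); // false
```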

There's a workaround, but it's something of a hack, and I don't recommend putting hacks into user code to account for engine behavior. V8 does support "polymorphic inlining", but (currently) only if it can deduce the call target from some object's type. So if you construct "config" objects that have the right functions installed as methods on their prototype, you can get V8 to inline them. Like so:

class StateMachine {
  ...
  transition(config, maxCalls = Infinity) {
    let i = 0;
    while (
      config.condition &&
      config.condition(this.state) &&
      i++ < maxCalls
    ) config.transition(this.state);

    return this;
  }
  ...
}

class RedConfig {
  transition(state) { return state.test++ }
  condition(state) { return state.test < 1e6 }
}
class BlueConfig {
  transition(state) { return state.test++ }
  condition(state) { return state.test < 1e6 }
}

function red() {
  ...
  stateMachine.transition(new RedConfig());
  ...
}
function blue() {
  ...
  stateMachine.transition(new BlueConfig());
  ...
}
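To make the elided sketch above concrete, here is a self-contained version with assumed StateMachine internals filled in (illustrative only; the omitted parts of the original class are not shown in the answer):

```javascript
// Runnable sketch of the config-object workaround: each call site passes its
// own config class, so the callee methods can be found via the config
// object's prototype rather than via a polymorphic function-valued argument.
class StateMachine {
  constructor(state) { this.state = state; } // assumed internals
  transition(config, maxCalls = Infinity) {
    let i = 0;
    while (
      config.condition &&
      config.condition(this.state) &&
      i++ < maxCalls
    ) config.transition(this.state);
    return this;
  }
}

class RedConfig {
  transition(state) { return state.test++ }
  condition(state) { return state.test < 1e6 }
}
class BlueConfig {
  transition(state) { return state.test++ }
  condition(state) { return state.test < 1e6 }
}

const redMachine = new StateMachine({ test: 0 }).transition(new RedConfig());
const blueMachine = new StateMachine({ test: 0 }).transition(new BlueConfig());
console.log(redMachine.state.test, blueMachine.state.test); // 1000000 1000000
```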

It might be worth filing a bug (crbug.com/v8/new) to ask if the compiler team thinks that this is worth improving. Theoretically it should be possible to inline several functions that are called directly, and branch between the inlined paths based on the value of the function variable that's being called. However I'm not sure there are many cases where the impact is as pronounced as in this simple benchmark, and I know that recently the trend has been towards inlining less rather than more, because on average that tends to be the better tradeoff (there are drawbacks to inlining, and whether it's worth it is necessarily always a guess, because the engine would have to predict the future in order to be sure).

In conclusion, coding with many callbacks is a very flexible and often elegant technique, but it tends to come at an efficiency cost. (There are other varieties of inefficiency: e.g. a call with an inline arrow function like transition(state => state.something) allocates a new function object each time it's executed; that just so happens not to matter much in the example at hand.) Sometimes engines might be able to optimize away the overhead, and sometimes not.

jmrk
  • 2
    Thank you very much for your extremely helpful answer. I am going to try to reproduce the situation where a few extra calls would push execution time up to 400ms (which I definitely managed to produce multiple times independently), and then I will mark this as the answer. =) – Lewis Jul 03 '20 at 01:24
  • 3
    I have not been able to reproduce the 400ms execution time on a subsequent function call, *but* I did update the OP with some additional examples and why I think this is a bug (and a nontrivial one at that). Thank you so much for your answer! – Lewis Jul 03 '20 at 03:33
  • Updated the OP with a simplified proof-of-concept and stuck it at the top. The V8 team seems to concur with you about this not being a bug, and flagged it as a feature request (*literally* INABIAF), so I guess everyone would do well to remember that you should only call a function which accesses a callback frequently JUST the one time in V8 environments or else you'll take a huge performance hit. Totally by design, nothing to see here! – Lewis Jul 03 '20 at 20:16
  • I think your new summary is misleading. It's not about repeated calls. I'd summarize as: *if you have several calls to a function that each pass a different callback, then those callbacks won't get inlined*. If you drop the "first call" line in your simplified repro, you'll see that you can run the loop as often as you want and there'll be no slowdown. The fact that your different arrow functions have the same source doesn't matter -- if they're defined separately then they count as separate functions. – jmrk Jul 03 '20 at 22:13
  • Oh, also, there's a crucial difference between saying "this behavior is a feature, not a bug" and "your report is a request for a new feature". The latter (which is what happened here) simply means: the software isn't *broken*; so what you're asking isn't to *fix* it, but to *improve* it. – jmrk Jul 03 '20 at 22:17
  • You'll have to forgive my frustration, I was hoping to point out this issue and learn something about how to repair it, I did not expect to get into the weeds about whether or not software which produces wildly unpredictable behavior should be characterized as *broken* or just *in need of improvement.* The general consensus seems to be, "yes, there is an issue here, but it is not actually a bug despite its documented impact on performance." – Lewis Jul 03 '20 at 22:29
  • Also @jmrk I updated the post - I more fully understand what you were saying now, but the fact that this is not limited to anonymous functions, arrow functions, nor even *frequently calling* the callback *raises* the impact of this issue, not lessens it. The POC is much clearer now I think. – Lewis Jul 03 '20 at 23:06

Since this is getting so much interest (and updates to the question), I thought I'd provide some additional detail.

The new simplified test case is great: it's very simple, and very clearly shows a problem.

function test(callback) {
  let start = performance.now();
  for (let i = 0; i < 1e6; i++) callback();
  console.log(`${callback.name} took ${(performance.now() - start).toFixed(2)}ms`);
}

var exampleA = (a,b) => 10**10;
var exampleB = (a,b) => 10**10;

// one callback -> fast
for (let i = 0; i < 10; i++) test(exampleA);

// introduce a second callback -> much slower forever
for (let i = 0; i < 10; i++) test(exampleB);
for (let i = 0; i < 10; i++) test(exampleA);

On my machine, I'm seeing times go as low as 0.23 ms for exampleA alone, and then they go up to 7.3ms when exampleB comes along, and stay there. Wow, a 30x slowdown! Clearly that's a bug in V8? Why wouldn't the team jump on fixing this?

Well, the situation is more complicated than it seems at first.

Firstly, the "slow" case is the normal situation. That's what you should expect to see in most code. It's still pretty fast! You can do a million function calls (plus a million exponentiations, plus a million loop iterations) in just 7 milliseconds! That's only 7 nanoseconds per iteration+call+exponentiation+return!

Actually, that analysis was a bit simplified. In reality, an operation on two constants like 10**10 will be constant-folded at compile time, so once exampleA and exampleB get optimized, the optimized code for them will return 1e10 immediately, without doing any multiplications. On the flip side, the code here contains a small oversight that causes the engine to have to do more work: exampleA and exampleB take two parameters (a, b), but they're called without any arguments simply as callback(). Bridging this difference between expected and actual number of parameters is fast, but on a test like this that doesn't do much else, it amounts to about 40% of the total time spent. So a more accurate statement would be: it takes about 4 nanoseconds to do a loop iteration plus a function call plus a materialization of a number constant plus a function return, or 7 ns if the engine additionally has to adapt the arguments count of the call.
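A small sketch of the arity point (timings vary by machine and engine version; this only shows the two call shapes being compared):

```javascript
// exampleA declares two parameters but the test loop calls it with none,
// so the engine must adapt the argument count on every call. Passing
// exactly the declared parameters avoids that work.
const example = (a, b) => 10 ** 10;

// mismatched arity: a and b are undefined inside the callee
for (let i = 0; i < 1e6; i++) example();

// exact arity: no arguments adaptation needed
for (let i = 0; i < 1e6; i++) example(i, i + 1);
```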

So what about the initial results for just exampleA, how can that case be so much faster? Well, that's the lucky situation that hits various optimizations in V8 and can take several shortcuts -- in fact it can take so many shortcuts that it ends up being a misleading microbenchmark: the results it produces don't reflect real situations, and can easily cause an observer to draw incorrect conclusions. The general effect that "always the same callback" is (typically) faster than "several different callbacks" is certainly real, but this test significantly distorts the magnitude of the difference. At first, V8 sees that it's always the same function that's getting called, so the optimizing compiler decides to inline the function instead of calling it. That avoids the adaptation of arguments right off the bat. After inlining, the compiler can also see that the result of the exponentiation is never used, so it drops that entirely. The end result is that this test tests an empty loop! See for yourself:

function test_empty(no_callback) {
  let start = performance.now();
  for (let i = 0; i < 1e6; i++) {}
  console.log(`empty loop took ${(performance.now() - start).toFixed(2)}ms`);
}

That gives me the same 0.23ms as calling exampleA. So contrary to what we thought, we didn't measure the time it takes to call and execute exampleA, in reality we measured no calls at all, and no 10**10 exponentiations either. (If you like more direct proof, you can run the original test in d8 or node with --print-opt-code and see the disassembly of the optimized code that V8 generates internally.)

All that lets us conclude a few things:

(1) This is not a case of "OMG there's this horrible slowdown that you must be aware of and avoid in your code". The default performance you get when you don't worry about this is great. Sometimes when the stars align you might see even more impressive optimizations, but… to put it lightly: just because you only get presents on a few occasions per year, doesn't mean that all the other non-gift-bearing days are some horrible bug that must be avoided.

(2) The smaller your test case, the bigger the observed difference between default speed and lucky fast case. If your callbacks are doing actual work that the compiler can't just eliminate, then the difference will be smaller than seen here. If your callbacks are doing more work than a single operation, then the fraction of overall time that's spent on the call itself will be smaller, so replacing the call with inlining will make less of a difference than it does here. If your functions are called with the parameters they need, that will avoid the needless penalization seen here. So while this microbenchmark manages to create the misleading impression that there's a shockingly large 30x difference, in most real applications it will be between maybe 4x in extreme cases and "not even measurable at all" for many other cases.

(3) Function calls do have a cost. It's great that (for many languages, including JavaScript) we have optimizing compilers that can sometimes avoid them via inlining. If you have a case where you really, really care about every last bit of performance, and your compiler happens to not inline what you think it should be inlining (for whatever reason: because it can't, or because it has internal heuristics that decide not to), then it can give significant benefits to redesign your code a bit -- e.g. you could inline by hand, or otherwise restructure your control flow to avoid millions of calls to tiny functions in your hottest loops. (Don't blindly overdo it though: having too few too big functions isn't great for optimization either. Usually it's best to not worry about this. Organize your code into chunks that make sense, let the engine take care of the rest. I'm only saying that sometimes, when you observe specific problems, you can help the engine do its job better.) If you do need to rely on performance-sensitive function calls, then an easy tuning you can do is to make sure that you're calling your functions with exactly as many arguments as they expect -- which is probably often what you would do anyway. Of course optional arguments have their uses as well; like in so many other cases the extra flexibility comes with a (small) performance cost, which is often negligible, but can be taken into consideration when you feel that you have to.
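As a sketch of the "inline by hand" suggestion (the names here are illustrative, not from the original post):

```javascript
// Generic version: one callback invocation per iteration of the hot loop.
function sumWithCallback(n, callback) {
  let total = 0;
  for (let i = 0; i < n; i++) total += callback(i, i + 1);
  return total;
}

// Hand-inlined version: the callback's body is written directly into the
// loop, so no per-iteration call happens regardless of engine heuristics.
function sumInlined(n) {
  let total = 0;
  for (let i = 0; i < n; i++) total += i + (i + 1);
  return total;
}

console.log(sumWithCallback(1e6, (a, b) => a + b) === sumInlined(1e6)); // true
```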

(4) Observing such performance differences can understandably be surprising and sometimes even frustrating. Unfortunately, the nature of optimizations is such that they can't always be applied: they rely on making simplifying assumptions and not covering every case, otherwise they wouldn't be fast any more. We work very hard to give you reliable, predictable performance, with as many fast cases and as few slow cases as possible, and no steep cliffs between them. But we cannot escape the reality that we can't possibly "just make everything fast". (Which of course isn't to say that there's nothing left to do: every additional year of engineering work brings additional performance gains.) If we wanted to avoid all cases where more-or-less similar code exhibits noticeably different performance, then the only way to accomplish that would be to not do any optimizations at all, and instead leave everything at baseline ("slow") implementations -- and I don't think that would make anyone happy.

EDIT to add: It seems there are major differences between different CPUs here, which probably explains why previous commenters have reported so wildly differing results. On hardware I can get my hands on, I'm seeing:

  • i7 6600U: 3.3 ms for inlined case, 28 ms for calling
  • i7 3635QM: 2.8 ms for inlined case, 10 ms for calling
  • i7 3635QM, up-to-date microcode: 2.8 ms for inlined case, 26 ms for calling
  • Ryzen 3900X: 2.5 ms for inlined case, 5 ms for calling

This is all with Chrome 83/84 on Linux; it's very much possible that running on Windows or Mac would yield different results (because CPU/microcode/kernel/sandbox are closely interacting with each other). If you find these hardware differences shocking, read up on "spectre".

jmrk
  • Thank you very much for this second (and much more explicit) answer. My only question is why you focused on the fact that my new repro function got optimized to an empty loop - in the `StateMachine` example, that was clearly not the case. The callback was doing things, it was modifying `this.state` as an argument, and I don't understand how the "optimized to an empty loop" extends to that example, BUT your explanation for the simplified POC was very detailed and is appreciated. – Lewis Jul 04 '20 at 18:09
  • And I do apologize for all the questions and comments, it is just hard for me to reconcile all of this with the `StateMachine` pattern, which works exactly as I would expect in non-V8 environments (all of the repros fail in Firefox, for example). You *did* offer a workaround, but also advised against using it. – Lewis Jul 04 '20 at 18:21
  • I updated the simple POC to actually take arguments so that is not optimized to an empty loop. I appreciate that this was mentioned, but it seemed to have no bearing on the actual issue at hand (as evidenced by the very first example, `StateMachine.transition`, which used its callback arguments and operated on them). – Lewis Jul 04 '20 at 18:45
  • 1
    This answer focuses on your new repro; my original answer (unchanged) focuses on your original `StateMachine` example. That's why one mentions the empty loop and the other doesn't. -- As you can see, by further changing the example to produce unused huge numbers, you've made the non-inlined call much slower (7 -> 40ms), because it does even more work now, while the inlined case still gets optimized to an empty loop. So the test's observed differences in performance are even more exaggerated and hence even more misleading now. – jmrk Jul 04 '20 at 20:59
  • I'm very sorry if I misled you with reproducible measurements. I updated the repro function to demonstrate that this issue persists even when the loop is not optimized away, and I welcome you to highlight the difference between *observed differences in performance* and *actual differences in performance* as it pertains to this issue. – Lewis Jul 05 '20 at 00:19
  • You've been misleading _yourself_ (unintentionally, I know!). With the latest repro, I see 2.5ms vs. 5ms, so a 2x difference. That checks out, the difference is caused by calling vs. inlining, you may well see something like that in real code. The previous version of the repro gave a 30x difference caused by microbenchmarking artifacts -- that was misleading, because it was _not_ representative of non-microbenchmark scenarios. – jmrk Jul 05 '20 at 10:26
  • Of course you can say that within one test case, "observed" and "actual" (differences in) performance are the same. But when you have one simplified artificial test that you intend to be representative of a more important real scenario, or of a general pattern, then the performance observed on the small test may well be very different from the actual performance you would get in other cases. And if you (or someone else) thinks "well, I see 30x here, so surely I'll see that elsewhere too", then that's what I call having gotten a "misleading" test result. – jmrk Jul 05 '20 at 10:31
  • It's 2ms vs 25ms for me, hardware specs are in the OP (i7 3.9GHz). It's 6ms vs 60ms for the `StateMachine` example, which I would remind you was *real code* and not a "microbenchmark." I did not set out to find this issue, I set out to write a state machine which would call a transition callback while a condition was met, and I found that I cannot make more than one of those calls without seeing a dramatic performance hit, a performance hit that is not seen in Firefox. – Lewis Jul 05 '20 at 19:59
  • I understand and appreciate your explanation of inlining heuristics, but I can't help but feel like *you* are being misleading in representing this issue as something that would never impact code that was not deliberately written to exploit this, when my original post was a real-world example from a real-world class I was writing, a very simple state machine paradigm with 28 SLOC (excluding comments). I am looking into how this might affect React's `setState` method as well, since it follows a similar form and executes frequently in V8, will update this thread if I find anything. – Lewis Jul 05 '20 at 20:02
  • I know that your `StateMachine` was a real app, and I've never said otherwise. I've also never said that this doesn't affect real code, on the contrary. -- I've put a lot of effort into explaining things here; since you seem to feel that I'm antagonizing you, I will now retreat from this conversation. Good luck with your further investigations, and I hope I was at least a little bit helpful. – jmrk Jul 05 '20 at 22:07
  • Okay, thank you. You *have* been very helpful and many people have learned from this post and subsequent discussion, your comments and the effort that you put into them (especially this one) are definitely appreciated. – Lewis Jul 05 '20 at 22:30
  • @Christian You might want to accept the other answer instead – Bergi Jul 15 '20 at 20:58
  • 1
    @jmrk "*The end result is that this test tests an empty loop*" - I'm waiting for a version of v8 that (accidentally - I know you would not search for empty loops specificallly as they're uncommon) would detect the empty loop and optimise it away completely, so that benchmarks will measure 0.00ms and everyone who's getting fooled by their microbenchmark will notice something is amiss. – Bergi Jul 15 '20 at 21:01