
EDIT: I changed the accepted answer to my own after a lot of follow-up research showed that there isn't a simple answer to my question. See below!

So, as a follow-up to my last question, I'm trying to get a better handle on JavaScript best practices for optimizing performance. For the following example, I'm testing in Chrome 28.0.1500.70 using the in-browser profiler.

I've got some math functions encapsulated in an object that are getting called a few hundred thousand times per second, and I'm trying to shave a bit off the execution time.

I've already done some optimization by making local copies of the parent object's locals in the called functions themselves, and got a decent (~16%) performance boost. However, when I did the same for calling another function from the parent object, I got a huge (~100%) performance increase.

The original setup was calcNeighbors calling fellow parent object function cirInd via this.cirInd.

Making a local var copy of cirInd and calling that instead gave a huge performance gain: calcNeighbors took less than half the execution time it did before.

However, making cirInd an inline function in calcNeighbors caused a return to the same slower performance as calling it from the parent object.

I'm really perplexed by this. I suppose that it could be a quirk in Chrome's profiler (cirInd doesn't show up at all in the second case) but there is definitely a noticeable performance gain in the application when I use case 2.

Can someone explain why case 2 is so much faster than case 1, and more importantly, why case 3 seems to give no performance gain?

The functions in question are here:

calling from parent object:

  window.bgVars = {
     <snip>
     "cirInd": function(index, mod){
        //returns modulus, array-wrapping value to implement circular array
        if(index<0){index+=mod;}
        return index%mod;
     },
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        var neighbors = grid[this.cirInd(rep-foo-1, mod)] + grid[this.cirInd(rep-foo, mod)] +
                        grid[this.cirInd(rep-foo+1, mod)] + grid[this.cirInd(rep-1, mod)] +
                        grid[this.cirInd(rep+1, mod)] + grid[this.cirInd(rep+foo-1, mod)] +
                        grid[this.cirInd(rep+foo, mod)] + grid[this.cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }

calling via local variable:

  window.bgVars = {
     <snip>
     "cirInd": function(index, mod){
        //returns modulus, array-wrapping value to implement circular array
        if(index<0){index+=mod;}
        return index%mod;
     },
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        var cirInd = this.cirInd;
        var neighbors = grid[cirInd(rep-foo-1, mod)] + grid[cirInd(rep-foo, mod)] +
                        grid[cirInd(rep-foo+1, mod)] + grid[cirInd(rep-1, mod)] +
                        grid[cirInd(rep+1, mod)] + grid[cirInd(rep+foo-1, mod)] +
                        grid[cirInd(rep+foo, mod)] + grid[cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }

calling inline:

  window.bgVars = {
     <snip>
     "calcNeighbors": function(rep){
        var foo = this.xBlocks;
        var grid = this.cGrid;
        var mod = grid.length;
        function cirInd(index, mod){
          //returns modulus, array-wrapping value to implement circular array
          if(index<0){index+=mod;}
          return index%mod;
        }
        var neighbors = grid[cirInd(rep-foo-1, mod)] + grid[cirInd(rep-foo, mod)] +
                        grid[cirInd(rep-foo+1, mod)] + grid[cirInd(rep-1, mod)] +
                        grid[cirInd(rep+1, mod)] + grid[cirInd(rep+foo-1, mod)] +
                        grid[cirInd(rep+foo, mod)] + grid[cirInd(rep+foo+1, mod)];
        return neighbors;
     },
     <snip>
  }
DanHeidel
  • "why case 3 seems to not give any performance gain" --- why should moving a function to the same scope make some performance gain? – zerkms Jul 11 '13 at 02:41
  • Not 100% relevant but still useful: https://developers.google.com/v8/design#prop_access – zerkms Jul 11 '13 at 02:44
  • #3 generates a new function cirInd() each call, while #2 recycles the same one each call. fewer activation creations = faster runtime and less trash to clean up. – dandavis Jul 11 '13 at 02:44

3 Answers


perhaps seeing #2 and #3 in a simplified view will help illustrate the object creation side-effects.

i believe this should make it obvious:

var alls1 = [];
var alls2 = [];

function inner1(){}
function outer1(){
     if(alls1.indexOf(inner1)===-1){ alls1.push(inner1); }
}


function outer2(){
   function inner2(){}
   if(alls2.indexOf(inner2)===-1){ alls2.push(inner2); }
}

for(var i=0;i<10;i++){
   outer1();
   outer2();
}

alert([ alls1.length, alls2.length  ]); // shows: 1, 10

functions are objects, and making new objects is never free.
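applied to the OP's case #3, the fix is to hoist the helper out of the hot function so only one function object is ever created. a sketch (the two-neighbor body is illustrative, not the OP's full eight-term sum):

```javascript
// hoisted once at load time: one function object, reused by every call
function cirInd(index, mod) {
  // wraps negative indices into [0, mod) for a circular array
  if (index < 0) { index += mod; }
  return index % mod;
}

function calcNeighborsHoisted(rep, grid, foo) {
  var mod = grid.length;
  // direct reference to the shared function; no per-call allocation
  return grid[cirInd(rep - foo, mod)] + grid[cirInd(rep + foo, mod)];
}
```

unlike the inline version, repeated calls to calcNeighborsHoisted never allocate a new cirInd, so there are no extra activation objects to create or collect.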

EDIT: expanding on #1 vs #2

again, a simplified example will help illustrate:

function y(a,b){return a+b;}
var out={y:y};
var ob={
   y:y, 
   x1: function(a){ return this.y(i,a);},
   x2: function(a){ return y(i,a);},
   x3: function(a){ return out.y(i,a);}
}

var mx=999999, times=[], d2,d3,d1=+new Date;
for(var i=0;i<mx;i++){ ob.x1(-i) }
times.push( (d2=+new Date)-d1 );

for(var i=0;i<mx;i++){ ob.x2(-i) }
times.push( (d3=+new Date)-d2 );

for(var i=0;i<mx;i++){ ob.x3(-i) }
times.push( (+new Date)-d3 );

alert(times); // my chrome's typical: [ 1000, 1149, 1151 ]

understand that there is more noise in a simple example, and closure is a big chunk of the overhead in all 3, but the diffs between them are what's important.

in this demo you won't see the huge gain observed in your dynamic system, but you do see how close y and out.y profile compared to this.y, all else being equal.

the main point is that it's not the extra dot resolution per se that slows things down, as some have alluded to, it's specifically the "this" keyword in V8 that matters, otherwise out.y() would profile closer to this.y()...

firefox is a different story.

tracing allows this.whatever to be predicted, so all three profile within a bad dice roll of each other, on the same comp as chrome: [2548, 2532, 2545]...

dandavis
  • Derp, you're right, i don't know why I didn't spot that in #3. I'm still a bit confused why #1 takes such a huge performance hit though. – DanHeidel Jul 11 '13 at 03:41
  • @DanHeidel: i would offer that the cirInd being invoked in #2 has no internal this binding, whereas it's the whole object in #1. the whole path token would normally be shortcut'd by the JIT. But, using "this." means that the holder object has to be re-assessed at execution time. all else being equal, using (almost) any single identifier to the left of the dot besides "this" should profile about the same a lexical named function ref. – dandavis Jul 11 '13 at 03:51
  • Interesting. So if I'm following you correctly, a reference to bgVars.cirInd rather than this.cirInd would allow the JIT to avoid the runtime lookup hit? – DanHeidel Jul 11 '13 at 03:58
  • i am not a low-level expert, but my understanding is this x.y() would profile more like y() than this.y(), if y were the same on all three. i'll edit the answer to elaborate. – dandavis Jul 11 '13 at 04:11
  • So, I tried the explicit reference vs using this and in Chrome, it gave no performance increase - it looked indistinguishable from #1. Even stranger, I set up a jsPerf to test all 4 cases and got *completely* different results. In that case, #1-3 are almost identical and #4 is over 10x slower. http://jsperf.com/different-nested-js-function-calls – DanHeidel Jul 11 '13 at 04:39
  • well, the first three are actually consistent with the answer's explanation. the inline (#3) is slower as it's more work and trash. #1 is ~12% faster than #2 again, just like the above. #4 is a good example of the kind of performance hit travis hit on, finding a property in an object in slow, and bgVars.cirInd can't be shortcut'd because, wait for it, bgVars===this... – dandavis Jul 11 '13 at 04:53
  • How come `x1` is *faster* than `x2` and `x3`? – user123444555621 Jul 11 '13 at 05:03
  • @Pumbaa80: i typed that wrong (slower), it's fixed... V8 memorizes x1's y and that y memorizes the static, frozen "this". x2 and x3 are both additional name lookups. – dandavis Jul 11 '13 at 05:10
  • @Pumbaa80: another way of looking at it is that x1 is closer to a pure function (http://en.wikipedia.org/wiki/Pure_function), so it's easier to optimize than the impure #2 and #3, which depend on outside factors. since 'this' is basically an argument in JS, x1 would be pure if we didn't close the "i". generally, the closer to pure, the faster in V8. – dandavis Jul 11 '13 at 05:22
  • @dandavis I don't understand. You're arguing that the `this` keyword is slowing V8 down in the original example, but it's speeding things up in your example? – user123444555621 Jul 11 '13 at 05:23
  • @Pumbaa80: no... let's be precise because we are both smart and likely misunderstood; this stuff can be confusing. re the OP #1 vs OP #2, #2 is faster because it uses a local, which is even better/purer than "this". in my demo, i never compare a local to a closed global. OP#3 is slower for non-this related reasons. does that make sense? – dandavis Jul 11 '13 at 05:32
  • I'm totally confused. You were saying *it's specifically the "this" keyword in V8 that matters*, and now you're talking about closed globals, which have never been in question. – user123444555621 Jul 11 '13 at 05:47
  • @Pumbaa80: my point is not that "this" is slower, just different. in other words, this.x() is not like something.x() and just x(). if that's good or bad perf-wise depends on the context being applied. if "this" cuts down closure, it's faster, if it adds to it, it's slower. since "this" is locked-in to the literal object's method, it doesn't have to be resolved every time the method executes, which is why it's faster in my 2nd demo. - the OP's main question was about 1+2 vs 3 anyways, and your stealin' my thunder ;) – dandavis Jul 11 '13 at 06:06

The reason there is extra time involved in number 1 should be obvious. You access the entire object scope, and then have to find a property.

Numbers 2 and 3 are both direct references to a function, so there is no seeking.
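A minimal sketch of that difference (the identifiers here are illustrative, not from the original code):

```javascript
var obj = {
  f: function (n) { return n + 1; },
  viaThis: function (n) {
    return this.f(n);   // resolves the property on `this` at every call
  },
  viaLocal: function (n) {
    var f = this.f;     // one lookup, then a direct reference
    return f(n);        // no property seek on the call itself
  }
};

obj.viaThis(1);   // both return 2; they differ only in how f is resolved
obj.viaLocal(1);
```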

A very good resource for testing these types of situations is jsPerf, and I would highly recommend recreating the scenario there and running the test to see the exact differences and whether or not they are significant to you.

Travis J
  • `O(n)`? I'm sure it's a hash table not a list http://stackoverflow.com/a/6602088/251311 – zerkms Jul 11 '13 at 02:44
  • @zerkms - So where does the time for accessing a property come from if it is a hash? I removed the time complexity assumption from my answer. – Travis J Jul 11 '13 at 02:46
  • see the comment update. Even for a hash-alike structure it's still more expensive than `O(1)` – zerkms Jul 11 '13 at 02:47
  • @zerkms - So for the hash, the calculation creates a key, and there is an iteration of the set of properties checking to see if the current key was the key generated for the hash. Once found, the value at the key's index is returned. Does that sound right to you? If so, how is that not O(n) where n is the number of keys in the hash? Worst case, every key is checked. – Travis J Jul 11 '13 at 02:58
  • for the string keys the lookup cost of a proper hash table implementation is close to `O(1)` (which is the ideal). `O(n)` is the worst case when all your keys fell into the same bucket (not possible for V8 implementation). Some useful reading: http://lemire.me/blog/archives/2009/08/18/do-hash-tables-work-in-constant-time/ – zerkms Jul 11 '13 at 03:25
  • I normally use jsPerf but since this is part of a Canvas app and I've had weird results doing Canvas testing there, I got lazy and have just been doing it in-browser. I totally get that case 1 incurs an overhead to search the parent object but making several variable references in the same function only gave modest performance increase. Shouldn't the lookup penalty be reasonably similar for variables and functions in the larger scope? The latter should just be finding and passing a function pointer... – DanHeidel Jul 11 '13 at 03:37
  • @DanHeidel: the lookup is an *additional* overhead. – zerkms Jul 11 '13 at 04:27
  • @zerkms: I'm aware that doing a lookup is going to incur overhead, my question is why doing a lookup on a function is so much worse than for a variable. The original function did lookups on 3 external variables and the function. Eliminating the 3 variable refernces only gave about 16% speedup while eliminating the function reference gave over 100%. That seems to indicate that there's an additional overhead of calling the non-local function beyond (and much greater than) the lookup cost. – DanHeidel Jul 11 '13 at 04:34
  • Also, I just set up a jsPerf for this problem and got entirely different results - #1-3 were essentially the same with the local var giving a small performance boost. That is more long the lines of what I would expect. The 4th case in the jsPerf is from a suggestion in the answer above using an explicit bgVars.cirInd reference rather than this.cirInd with the thought that is would allow the JIT to optimize the lookup away. That did not work at all. So, I'm doubly confused now. is the Chrome profiler just giving me incorrect results about #2 being twice as fast? – DanHeidel Jul 11 '13 at 04:43
  • @DanHeidel: when I said "lookup" I meant a key lookup in an object's "hash". Not sure what "variable lookup" you're talking about - as soon as object is available in the scope - access to it is instant as soon as it's a reference (names don't exist on runtime and interpreter works with references/values) – zerkms Jul 11 '13 at 04:43
  • @DanHeidel: #4 is presumably slow because it's the only closure one (http://jsperf.com/different-nested-js-function-calls/2) – zerkms Jul 11 '13 at 04:50

OK, I've been researching this issue for a while now and TL;DR - it's complicated.

Turns out that many performance questions really depend on the platform, browser and even minor browser revision number. And not by a little, either. There are many examples on jsPerf that show things such as 'for vs while' or 'typed arrays vs standard arrays' wildly swinging back and forth in terms of favorable execution speed with different browser releases. Presumably this is due to JIT optimization trade-offs.

Short answer to the general performance questions - just test everything in jsPerf. None of the suggestions I got in this thread were helpful in all cases. The JIT makes things complicated. This is particularly important if you have a background like mine and are used to C programs having certain rule-of-thumb coding patterns that tend to speed things up. Don't assume anything - just test it.
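If you want a quick in-page sanity check before reaching for jsPerf, a crude harness like the following can help (a sketch only; jsPerf adds warmup runs and statistical analysis that this lacks, and the function names are made up):

```javascript
// crude timing harness; Date.now() is coarse but dependency-free
function bench(label, fn, iterations) {
  var t0 = Date.now();
  for (var i = 0; i < iterations; i++) { fn(i); }
  return label + ': ' + (Date.now() - t0) + ' ms';
}

// the function under test
function cirInd(index, mod) {
  if (index < 0) { index += mod; }
  return index % mod;
}

bench('cirInd via local ref', function (i) { return cirInd(i - 5, 100); }, 1e6);
```

Keep in mind the JIT can still skew results like these (e.g. by eliminating dead code in the loop), which is exactly why jsPerf remains the better tool.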

NOTE: many of the weird issues I listed in the original question were due to using the default Chrome profiler (i.e. the profiler you get from the Ctrl+Shift+I menu). If you are doing a lot of really fast loops (such as in graphics rendering), DO NOT USE THIS PROFILER. It has a time resolution of 1 ms, which is much too coarse to do proper performance debugging.

In fact, the ENTIRE issue I had with case 2 being so much faster than the others was due to the profiler simply not 'seeing' many of the function calls and improperly reporting CPU percentages. In the heat map, I could clearly see huge stretches where inner-loop functions were firing but not being recorded by the profiler.

Solution: http://www.html5rocks.com/en/tutorials/games/abouttracing/# Chrome has a less-obvious and much more powerful profiler built into about:tracing. It's got microsecond resolution, the ability to read code tags for sub-function resolution, and is generally much more kickass. As soon as I started using this profiler, the results fell into line with what I saw on jsPerf and helped me reduce my rendering time by nearly half.

How did I do that? Again, it wasn't simple. In some cases, calling out to subroutines helped; in others it didn't. Refactoring the whole rendering engine from an object literal to the module pattern seemed to help a bit. Precalcing any multiplication operations in for loops did seem to have big effects. Etc., etc.
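As one illustration of the precalcing point, hoisting an invariant product out of an inner loop (a sketch with made-up names, not code from my renderer):

```javascript
// before: multiplies y * w on every single iteration
function sumGridSlow(grid, w, h) {
  var total = 0;
  for (var y = 0; y < h; y++) {
    for (var x = 0; x < w; x++) {
      total += grid[y * w + x];
    }
  }
  return total;
}

// after: the row offset is precalculated once per row
function sumGridFast(grid, w, h) {
  var total = 0;
  for (var y = 0; y < h; y++) {
    var row = y * w;
    for (var x = 0; x < w; x++) {
      total += grid[row + x];
    }
  }
  return total;
}
```

Whether this actually wins depends on whether the JIT already hoists the multiplication for you, which again is exactly the kind of thing you have to measure rather than assume.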

Quick notes about the about:tracing profiler: zooming and panning is done with the WASD keys - that took me a while to figure out. Also, it profiles all tabs and operates in a tab outside the page being analyzed, so minimize the number of extraneous tabs you have open, since they will clutter up the profiler view. Also, if testing Canvas apps, be sure to switch tabs over to the app, since requestAnimationFrame generally doesn't fire when a tab is not active and visible.

DanHeidel