TL;DR: The slower loop is due to accessing the Array 'out-of-bounds'. That either forces the engine to recompile the function with fewer (or even no) optimizations, OR keeps it from compiling the function with any of these optimizations to begin with (if the (JIT-)compiler detected/suspected this condition before the first compiled 'version'). Read on below for why.
Someone just has to say this (utterly amazed nobody already did):
There used to be a time when the OP's snippet would be a de-facto example in a beginner's programming book, intended to emphasize that 'arrays' in javascript are indexed starting at 0, not 1, and as such be used as an example of a common 'beginner's mistake' (don't you love how I avoided the phrase 'programming error' ;)): out-of-bounds Array access.
Example 1: a Dense Array (being contiguous (meaning no gaps between indexes) AND actually having an element at each index) of 5 elements using 0-based indexing (always the case in ECMA-262).
var arr_five_char=['a', 'b', 'c', 'd', 'e']; // arr_five_char.length === 5
// indexes are: 0 , 1 , 2 , 3 , 4 // there is NO index number 5
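As a minimal sketch of what that means in practice (the helper variable names below are only illustrative): reading index 5 does not throw, it quietly yields undefined, so a loop written with <= performs one extra, out-of-bounds read.

```javascript
var arr_five_char = ['a', 'b', 'c', 'd', 'e'];

// Reading one past the last index does not throw; it yields undefined.
var past_end = arr_five_char[arr_five_char.length];

// Collect every value that a '<=' loop reads versus a '<' loop.
var seen_le = [], seen_lt = [];
for (var i = 0; i <= arr_five_char.length; i++) seen_le.push(arr_five_char[i]);
for (var j = 0; j <  arr_five_char.length; j++) seen_lt.push(arr_five_char[j]);

console.log(past_end);        // undefined
console.log(seen_le.length);  // 6 (the last value read is undefined)
console.log(seen_lt.length);  // 5
```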
Thus we are not really talking about a performance difference between < vs <= (or 'one extra iteration'), but we are talking about:
'why does the correct snippet (b) run faster than the erroneous snippet (a)'?
The answer is 2-fold (although from an ECMA-262 language implementer's perspective both are forms of optimization):
- Data-Representation: how to represent/store the Array internally in memory (object, hashmap, 'real' numerical array, etc.)
- Functional Machine-code: how to compile the code that accesses/handles (read/modify) these 'Arrays'
Item 1 is sufficiently (and correctly IMHO) explained by the accepted answer, but that only spends 2 words ('the code') on Item 2: compilation.
More precisely: JIT-Compilation and even more importantly JIT-RE-Compilation !
The language specification is basically just a description of a set of algorithms ('steps to perform to achieve defined end-result'). Which, as it turns out is a very beautiful way to describe a language.
And it leaves the actual method that an engine uses to achieve specified results open to the implementers, giving ample opportunity to come up with more efficient ways to produce defined results.
A spec conforming engine should give spec conforming results for any defined input.
Now, with javascript code/libraries/usage increasing, and remembering how many resources (time/memory/etc) a 'real' compiler uses, it's clear we can't make users visiting a web-page wait that long (and require them to have that many resources available).
Imagine the following simple function:
function sum(arr){
var r=0, i=0;
for(;i<arr.length;) r+=arr[i++];
return r;
}
Perfectly clear, right? Doesn't require ANY extra clarification, right? The return-type is Number, right?
Well.. no, no & no... It depends on what argument you pass to the named function parameter arr...
sum('abcde'); // String('0abcde')
sum([1,2,3]); // Number(6)
sum([1,,3]); // Number(NaN)
sum(['1',,3]); // String('01undefined3')
sum([1,,'3']); // String('NaN3')
sum([1,2,{valueOf:function(){return this.val}, val:6}]); // Number(9)
var val=5; sum([1,2,{valueOf:function(){return val}}]); // Number(8)
See the problem? Then consider that this is just barely scraping the surface of the massive number of possible permutations...
We don't even know what TYPE the function RETURNS until we are done...
Now imagine this same function-code actually being used on different types or even variations of input: both 'arrays' described completely literally (in source code) and 'arrays' generated dynamically in-program.
Thus, if you were to compile function sum JUST ONCE, then the only way to always return the spec-defined result for any and all types of input is, obviously, to perform ALL spec-prescribed main AND sub steps (like an unnamed pre-y2k browser).
No optimizations (because no assumptions), and a dead slow interpreted scripting language remains.
JIT-Compilation (JIT as in Just In Time) is the current popular solution.
So, you start to compile the function using assumptions regarding what it does, returns and accepts.
You come up with checks as simple as possible to detect if the function might start returning non-spec-conformant results (for example because it receives unexpected input).
When such a check fails, you toss away the previous compiled result and recompile to something more elaborate, decide what to do with the partial result you already have (is it valid to be trusted, or compute again to be sure), tie the function back into the program and try again. Ultimately you fall back to stepwise script-interpretation as in the spec.
All of this takes time!
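The assume / check / throw-away / redo cycle can be mimicked in plain javascript. This is only a toy model under my own assumptions: real engines do this on machine code, and the names fastSum, genericSum, speculativeSum, BAILOUT and deopt_count are all made up for illustration.

```javascript
// Toy model only: real engines do this in machine code, not in javascript.

// "Generic" version: follows the full language semantics for any input.
function genericSum(arr) {
  var r = 0;
  for (var i = 0; i < arr.length; i++) r += arr[i];
  return r;
}

// "Optimized" version: assumes every element is a number and gives up
// (returns a sentinel) as soon as that assumption turns out to be wrong.
var BAILOUT = {};
function fastSum(arr) {
  var r = 0;
  for (var i = 0; i < arr.length; i++) {
    var v = arr[i];
    if (typeof v !== 'number') return BAILOUT; // cheap guard
    r += v;
  }
  return r;
}

var deopt_count = 0;
function speculativeSum(arr) {
  var result = fastSum(arr);
  if (result !== BAILOUT) return result;
  deopt_count++;          // assumption failed: partial work is discarded
  return genericSum(arr); // and everything is redone on the slow path
}

console.log(speculativeSum([1, 2, 3]), deopt_count); // 6 0
console.log(speculativeSum([1, , 3]), deopt_count);  // NaN 1
```

Note that the second call pays for the failed fast attempt AND for the full generic run, which is the kind of cost described above.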
All browsers work on their engines; for each and every sub-version you will see things improve and regress. Strings were at some point in history really immutable strings (hence array.join was faster than string concatenation), now we use ropes (or similar) which alleviate the problem. Both return spec-conforming results and that is what matters!
Long story short: just because the semantics of the javascript language often have our back (like with this silent bug in the OP's example) does not mean that 'stupid' mistakes increase our chances of the compiler spitting out fast machine-code. It assumes we wrote the 'usually' correct instructions. The current mantra we 'users' (of the programming language) must have is: help the compiler, describe what we want, favor common idioms (take hints from asm.js for a basic understanding of what browsers can try to optimize and why).
Because of this, talking about performance is both important BUT ALSO a mine-field (and because of said mine-field I really want to end by pointing to (and quoting) some relevant material):
Access to nonexistent object properties and out of bounds array elements returns the undefined value instead of raising an exception. These dynamic features make programming in JavaScript convenient, but they also make it difficult to compile JavaScript into efficient machine code.
...
An important premise for effective JIT optimization is that programmers use dynamic features of JavaScript in a systematic way. For example, JIT compilers exploit the fact that object properties are often added to an object of a given type in a specific order or that out of bounds array accesses occur rarely. JIT compilers exploit these regularity assumptions to generate efficient machine code at runtime. If a code block satisfies the assumptions, the JavaScript engine executes efficient, generated machine code. Otherwise, the engine must fall back to slower code or to interpreting the program.
Source: "JITProf: Pinpointing JIT-unfriendly JavaScript Code", Berkeley publication, 2014, by Liang Gong, Michael Pradel, Koushik Sen.
http://software-lab.org/publications/jitprof_tr_aug3_2014.pdf
ASM.JS (also doesn't like out-of-bounds array access):
Ahead-Of-Time Compilation
Because asm.js is a strict subset of JavaScript, this specification only defines the validation logic—the execution semantics is simply that of JavaScript. However, validated asm.js is amenable to ahead-of-time (AOT) compilation. Moreover, the code generated by an AOT compiler can be quite efficient, featuring:
- unboxed representations of integers and floating-point numbers;
- absence of runtime type checks;
- absence of garbage collection; and
- efficient heap loads and stores (with implementation strategies varying by platform).
Code that fails to validate must fall back to execution by traditional means, e.g., interpretation and/or just-in-time (JIT) compilation.
http://asmjs.org/spec/latest/
and finally https://blogs.windows.com/msedgedev/2015/05/07/bringing-asm-js-to-chakra-microsoft-edge/
where there is a small subsection about the engine's internal performance improvements when removing bounds-checks (whilst just lifting the bounds-check outside the loop already gave an improvement of 40%).
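To illustrate what 'lifting the bounds-check outside the loop' means, here is an analogy in plain javascript. The function names are made up, and a real engine does this on machine code. Also note that in plain javascript an out-of-bounds read yields undefined rather than throwing, so the throw below is only a stand-in for 'leave the fast path'.

```javascript
// Analogy only: engines emit these checks in machine code.

// Check on every access, as a conservative compiler must do when it
// cannot prove that the index stays in range.
function sumCheckedEachTime(arr, n) {
  var r = 0;
  for (var i = 0; i < n; i++) {
    if (i >= arr.length) throw new RangeError('index out of bounds');
    r += arr[i];
  }
  return r;
}

// Check lifted out of the loop: a single comparison proves that every
// index in [0, n) is valid, so the loop body needs no check at all.
function sumCheckedOnce(arr, n) {
  if (n > arr.length) throw new RangeError('index out of bounds');
  var r = 0;
  for (var i = 0; i < n; i++) r += arr[i];
  return r;
}

console.log(sumCheckedEachTime([1, 2, 3], 3)); // 6
console.log(sumCheckedOnce([1, 2, 3], 3));     // 6
```

A loop bound of n = arr.length + 1 (the <= mistake) is exactly the case where the single up-front check cannot succeed.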
EDIT:
note that multiple sources talk about different levels of JIT-Recompilation down to interpretation.
Theoretical example based on above information, regarding the OP's snippet:
- Call to isPrimeDivisible
- Compile isPrimeDivisible using general assumptions (like no out of bounds access)
- Do work
- BAM, suddenly array accesses out of bounds (right at the end).
- Crap, says engine, let's recompile that isPrimeDivisible using different (less) assumptions, and this example engine doesn't try to figure out if it can reuse current partial result, so
- Recompute all work using slower function (hopefully it finishes, otherwise repeat and this time just interpret the code).
- Return result
Hence the time taken then was:
First run (failed at end) + doing all work all over again using slower machine-code for each iteration + the recompilation etc.. clearly takes >2 times longer in this theoretical example!
EDIT 2: (disclaimer: conjecture based on the facts below)
The more I think of it, the more I think that this answer might actually explain the more dominant reason for this 'penalty' on erroneous snippet a (or performance-bonus on snippet b, depending on how you think of it), which is precisely why I'm adamant in calling it (snippet a) a programming error:
It's pretty tempting to assume that this.primes is a 'dense array' of pure numerical values which was either
- a hard-coded literal in source-code (a known excellent candidate to become a 'real' array, as everything is already known to the compiler before compile-time) OR
- most likely generated using a numerical function filling a pre-sized (new Array(/*size value*/)) array in ascending sequential order (another long-time known candidate to become a 'real' array).
We also know that the primes array's length is cached as prime_count! (indicating its intent and fixed size).
We also know that most engines initially pass Arrays as copy-on-modify (when needed), which makes handling them much faster (if you don't change them).
It is therefore reasonable to assume that Array primes is most likely already an optimized array internally which doesn't get changed after creation (simple to know for the compiler if there is no code modifying the array after creation) and therefore is already (if applicable to the engine) stored in an optimized way, pretty much as if it was a Typed Array.
As I have tried to make clear with my sum function example, the argument(s) that get passed highly influence what actually needs to happen and as such how that particular code is being compiled to machine-code. Passing a String to the sum function shouldn't change the string but change how the function is JIT-Compiled! Passing an Array to sum should compile a different (perhaps even additional for this type, or 'shape' as they call it, of object that got passed) version of machine-code.
After all, it seems slightly bonkers to convert the Typed_Array-like primes Array on-the-fly to something_else while the compiler knows this function is not even going to modify it!
Under these assumptions that leaves 2 options:
- Compile as number-cruncher assuming no out-of-bounds, run into out-of-bounds problem at the end, recompile and redo work (as outlined in theoretical example in edit 1 above)
- Compiler has already detected (or suspected?) out-of-bounds access up-front and the function was JIT-Compiled as if the argument passed was a sparse object, resulting in slower functional machine-code (as it would have more checks/conversions/coercions etc.). In other words: the function was never eligible for certain optimisations, it was compiled as if it received a 'sparse array'(-like) argument.
I now really wonder which of these 2 it is!
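For anyone who wants to poke at this, here is a hypothetical reconstruction of the two variants. Only the names isPrimeDivisible, primes and prime_count come from the discussion above; the constructor, the A/B suffixes and the loop bodies are my own assumptions and not the OP's exact code. It mainly shows why the bug is silent: the out-of-bounds read gives undefined, candidate % undefined is NaN, and NaN never equals 0, so both variants return the same answers.

```javascript
// Hypothetical reconstruction, not the OP's exact code.
function Primes(list) {
  this.primes = list;
  this.prime_count = list.length; // cached length, as in the discussion above
}

// Snippet (a): '<=' also reads primes[prime_count], which is out of bounds.
Primes.prototype.isPrimeDivisibleA = function (candidate) {
  for (var i = 0; i <= this.prime_count; ++i) {
    if (candidate % this.primes[i] === 0) return true;
  }
  return false;
};

// Snippet (b): '<' stays within bounds.
Primes.prototype.isPrimeDivisibleB = function (candidate) {
  for (var i = 0; i < this.prime_count; ++i) {
    if (candidate % this.primes[i] === 0) return true;
  }
  return false;
};

var p = new Primes([2, 3, 5, 7, 11, 13]);
var mismatches = 0;
for (var c = 2; c < 1000; c++) {
  if (p.isPrimeDivisibleA(c) !== p.isPrimeDivisibleB(c)) mismatches++;
}
console.log(mismatches); // 0
```

In a V8-based runtime such as Node.js, running a file like this with flags such as --trace-opt and --trace-deopt may hint at which of the 2 options applies, although what gets logged differs between engine versions.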