
Just to see how it performs, I wrote a very short asm.js module by hand, which simulates the 2D wave equation using 32-bit integer math and typed arrays (Int32Array). I have three versions of it, all as similar as possible:

  1. Ordinary (i.e. legible, albeit C-style) JavaScript
  2. Same as 1, with asm.js annotations added so that it passes the validator, according to Firefox and other tools
  3. Same as 2, except with no "use asm"; directive at the top
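Here is a minimal hypothetical sketch of what versions 2 and 3 look like structurally. It is not the author's module, and `SumModule` and `sum` are made-up names. It shows the standard `(stdlib, foreign, heap)` module signature, an `Int32Array` view over one shared `ArrayBuffer`, and `|0` coercions on every integer value. Deleting the `"use asm";` line gives the version-3 variant. An engine that cannot validate or link the module simply runs it as ordinary JavaScript, so the result is the same either way.

```javascript
// Hypothetical module in the same shape as versions 2 and 3.
function SumModule(stdlib, foreign, heap) {
  "use asm"; // delete this line to get the "version 3" variant

  var HEAP32 = new stdlib.Int32Array(heap);

  // Sum n int32 elements starting at element index `start`.
  function sum(start, n) {
    start = start | 0; // parameter type annotations: int
    n = n | 0;
    var i = 0;
    var acc = 0;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      // (x << 2) >> 2 is the asm.js idiom for indexing an Int32Array view
      acc = (acc + (HEAP32[((start + i) << 2) >> 2] | 0)) | 0;
    }
    return acc | 0;
  }

  return { sum: sum };
}

// Usage: one 64 KiB ArrayBuffer (a heap size engines accept for asm.js),
// written through a normal Int32Array view and read by the module.
const buffer = new ArrayBuffer(0x10000);
const view = new Int32Array(buffer);
view.set([1, 2, 3, 4], 10);
const mod = SumModule(globalThis, null, buffer);
console.log(mod.sum(10, 4));
```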

I left a demo at http://jsfiddle.net/jtiscione/xj0x0qk3/ which lets you switch between modules to see the effects of using each one. All three work, but at different speeds. This is the hotspot (with asm.js annotations):

```javascript
// Hotspot of the asm.js version. applyCap, signedHeap, the *_offset
// variables and the damping shifts are defined elsewhere in the module.
for (i = 0; ~~i < ~~h; i = (1 + i)|0) {
    for (j = 0; ~~j < ~~w; j = (1 + j)|0) {
        // Border cells (first/last row and column) are skipped,
        // but index still advances past them.
        if (~~i == 0) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~(i + 1) == ~~h) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~j == 0) {
            index = (1 + index) | 0;
            continue;
        }
        if (~~(j + 1) == ~~w) {
            index = (1 + index) | 0;
            continue;
        }
        // Five-point stencil on the current displacement grid (u0).
        uCen = signedHeap  [((u0_offset + index) << 2) >> 2] | 0;
        uNorth = signedHeap[((u0_offset + index - w) << 2) >> 2] | 0;
        uSouth = signedHeap[((u0_offset + index + w) << 2) >> 2] | 0;
        uWest = signedHeap [((u0_offset + index - 1) << 2) >> 2] | 0;
        uEast = signedHeap [((u0_offset + index + 1) << 2) >> 2] | 0;
        uxx = (((uWest + uEast) >> 1) - uCen) | 0;
        uyy = (((uNorth + uSouth) >> 1) - uCen) | 0;
        // Velocity gets half of each second difference, capped after each step.
        vel = signedHeap[((vel_offset + index) << 2) >> 2] | 0;
        vel = vel + (uxx >> 1) | 0;
        vel = applyCap(vel) | 0;
        vel = vel + (uyy >> 1) | 0;
        vel = applyCap(vel) | 0;
        // New displacement goes to u1; force and velocity are then damped
        // by the fraction 2^-shift of their value and stored back.
        force = signedHeap[((force_offset + index) << 2) >> 2] | 0;
        signedHeap[((u1_offset + index) << 2) >> 2] = applyCap(((applyCap((uCen + vel) | 0) | 0) + force) | 0) | 0;
        force = force - (force >> forceDampingBitShift) | 0;
        signedHeap[((force_offset + index) << 2) >> 2] = force;
        vel = vel - (vel >> velocityDampingBitShift) | 0;
        signedHeap[((vel_offset + index) << 2) >> 2] = vel;
        index = (index + 1)|0;
    }
}
```
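Reading the loop body, each interior cell appears to get the following update. This is a hedged reading of the code, not something the author states. Here $\operatorname{cap}$ is `applyCap` (not shown), $k_f$ and $k_v$ are `forceDampingBitShift` and `velocityDampingBitShift`, and the floors reflect that an arithmetic right shift by $k$ equals $\lfloor x/2^k \rfloor$ (int32 wrap-around from `|0` is ignored):

$$
\begin{aligned}
u_{xx} &= \left\lfloor \frac{u_W + u_E}{2} \right\rfloor - u_C \\
u_{yy} &= \left\lfloor \frac{u_N + u_S}{2} \right\rfloor - u_C \\
v' &= \operatorname{cap}\Big(\operatorname{cap}\big(v + \lfloor u_{xx}/2 \rfloor\big) + \lfloor u_{yy}/2 \rfloor\Big) \\
u_1 &= \operatorname{cap}\big(\operatorname{cap}(u_C + v') + f\big) \\
f_{\text{stored}} &= f - \lfloor f / 2^{k_f} \rfloor \\
v_{\text{stored}} &= v' - \lfloor v' / 2^{k_v} \rfloor
\end{aligned}
$$

Ignoring caps, rounding, force and damping, the velocity increment is $\tfrac12(u_{xx}+u_{yy}) = \tfrac14\,(u_N+u_S+u_W+u_E-4u_C)$, one quarter of the five-point discrete Laplacian. So the loop looks like an explicit integer time step of $u_{tt} = c^2\nabla^2 u$ with $c^2\Delta t^2/\Delta x^2 = \tfrac14$.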

The "ordinary JavaScript" version is structured as above, but without the bitwise operators that asm.js requires (e.g. "x|0", "~~x", "arr[(x<<2)>>2]", etc.).
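The annotation-free loop itself isn't shown. Below is a hedged reconstruction of what version 1 might look like, written so that it runs standalone on a tiny grid. The `iteratePlain` name, the `o` offsets object, and the `applyCap` clamp and its bound are all assumptions, since the real `applyCap` isn't shown. The reconstruction also assumes values stay well inside int32 range. If they overflow, plain `+` produces a double where the asm.js version would wrap, so the two would diverge.

```javascript
// Hypothetical stand-in for applyCap: the real one is not shown, so this
// clamp and its bound are assumptions.
const CAP = 0x3fffffff;
function applyCap(x) {
  return x > CAP ? CAP : x < -CAP ? -CAP : x;
}

// Annotation-free version of the hotspot: one time step over a w*h grid.
// `heap` is an Int32Array; `o` holds the element offsets of the u0, u1,
// velocity and force grids inside it.
function iteratePlain(heap, w, h, o, forceShift, velShift) {
  let index = 0;
  for (let i = 0; i < h; i++) {
    for (let j = 0; j < w; j++, index++) {
      // Skip border cells; the j++/index++ update still runs on continue.
      if (i === 0 || i === h - 1 || j === 0 || j === w - 1) continue;
      const uCen = heap[o.u0 + index];
      const uNorth = heap[o.u0 + index - w];
      const uSouth = heap[o.u0 + index + w];
      const uWest = heap[o.u0 + index - 1];
      const uEast = heap[o.u0 + index + 1];
      const uxx = ((uWest + uEast) >> 1) - uCen;
      const uyy = ((uNorth + uSouth) >> 1) - uCen;
      let vel = heap[o.vel + index];
      vel = applyCap(vel + (uxx >> 1));
      vel = applyCap(vel + (uyy >> 1));
      let force = heap[o.force + index];
      heap[o.u1 + index] = applyCap(applyCap(uCen + vel) + force);
      force -= force >> forceShift;
      heap[o.force + index] = force;
      vel -= vel >> velShift;
      heap[o.vel + index] = vel;
    }
  }
}

// Usage: a 5x5 grid with a single displaced cell in the middle.
const w = 5, h = 5, n = w * h;
const o = { u0: 0, u1: n, vel: 2 * n, force: 3 * n };
const heap = new Int32Array(4 * n);
heap[o.u0 + 2 * w + 2] = 1000;
iteratePlain(heap, w, h, o, 4, 4);
console.log(heap.subarray(o.u1, o.u1 + n)); // new displacement grid
```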

These are the results for all three modules on my machine, using Firefox (Developer Edition v. 41) and Chrome (version 44), in milliseconds per iteration (ordinary JS, asm.js, asm.js without the directive):

  • FIREFOX (version 41): 20 ms, 35 ms, 60 ms.
  • CHROME (version 44): 25 ms, 150 ms, 75 ms.
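The question doesn't say exactly how the per-iteration times were taken. One hedged way to get comparable numbers is to run each module a few times untimed before averaging, which also addresses the compile-time concern birdspider raises in the comments. `msPerIteration` and the toy `work` function are hypothetical. In the demo you would pass each module's `iterate` entry point instead.

```javascript
// Results are folded into `sink` so the engine cannot treat the timed
// calls as dead code.
let sink = 0;

// Hypothetical harness: call `fn` `warmup` times untimed (so JIT or asm.js
// compilation is excluded), then return the mean milliseconds per call.
function msPerIteration(fn, warmup = 5, runs = 50) {
  for (let k = 0; k < warmup; k++) sink ^= fn() | 0;
  const t0 = performance.now();
  for (let k = 0; k < runs; k++) sink ^= fn() | 0;
  return (performance.now() - t0) / runs;
}

// Toy workload standing in for one simulation step.
const data = new Int32Array(1 << 16).fill(3);
function work() {
  let s = 0;
  for (let i = 0; i < data.length; i++) s = (s + data[i]) | 0;
  return s;
}

console.log(`${msPerIteration(work).toFixed(3)} ms per iteration`);
```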

So ordinary JavaScript wins in both browsers. The presence of the asm.js-required annotations degrades performance by a factor of 3 in both (from 20 to 60 ms in Firefox, from 25 to 75 ms in Chrome). Furthermore, the "use asm"; directive has an obvious effect: it helps Firefox a bit (60 down to 35 ms) and brings Chrome to its knees (75 up to 150 ms)!

It seems strange that merely adding bitwise operators should introduce a threefold performance degradation that can't be overcome by telling the browser to use asm.js. Also, why does telling the browser to use asm.js only help marginally in Firefox, and completely backfire in Chrome?

Mousey
jtiscione
  • For a start, I ran the ["Massive" benchmark](https://kripken.github.io/Massive/) in Chrome 44 and FF 39 (Win XP, 32-bit). Here are my results for [Chrome](http://pastebin.com/fZQYzWKs) and [Firefox](http://pastebin.com/brtZHecb) (copy and paste them into the "enter data copied from another run" field on the benchmark page; yes, it works with the actual HTML). Except for one test ("poppler-cold-preparation"), Chrome was slower everywhere, in the most extreme case 24.6 times slower than FF. It looks like Chrome is currently just not able to handle asm.js reasonably. – Siguza Aug 05 '15 at 12:33
  • Just an idea: did you "benchmark" subsequent/repeated calls? asm.js will (I suppose) spend more time in the compile/optimization phases. – birdspider Aug 05 '15 at 12:50
  • @birdspider You mean running the benchmark multiple times? No, I just took what was there... the current interface seems to require a page reload to run the benchmark again, most likely requiring the code to be compiled/optimised again. But the whole benchmark took about 15 min to complete for me, so I think compilation time is not much of a factor. If Chrome is really taking so long to compile, it baffles me that the code even gets to run *at all*. – Siguza Aug 05 '15 at 13:04
  • I think asm.js is simply not implemented (http://dev.modern.ie/platform/status/asmjs/). Googling the Chrome forums, it's in some sort of beta and goes by the name of `turbofan`. See also (https://www.phoronix.com/scan.php?page=news_item&px=Google-TurboFan-V8-JavaScript). – birdspider Aug 05 '15 at 13:20
  • @birdspider Then why is there a huge difference when `'use asm'` is added/removed? It just doesn't add up... – Siguza Aug 05 '15 at 13:22

2 Answers


Actually, asm.js was not designed to be written by hand, only to be produced by compiling other languages. As far as I know, there are no tools that validate asm.js code. Have you tried writing the code in C and compiling it to asm.js with Emscripten? I strongly suspect that the result would be quite different and better optimized for asm.js.

I think that by mixing typed and untyped variables you only add complexity without any benefit. On the contrary, the "asm.js" code is more complex: I parsed both the asm.js and the plain functions on jointjs.com/demos/javascript-ast, and the results are:

  • the plain js function has 137 nodes and 746 tokens
  • the asm.js function has 235 nodes and 1252 tokens

I would say that if you have more instructions to execute in each loop, it can easily be slower.

cristian v
  • Although you're right about it not being designed to be written by hand, [here](http://turtlescript.github.cscott.net/asmjs.html) seems to be an asm.js validator, and OP's `AsmWaveModule` passes the check. – Siguza Aug 05 '15 at 14:17
  • Also, Firefox prints this in the console: `Successfully compiled asm.js code (total compilation time 1ms; not stored in cache (too small to benefit))`. When I remove a `~~` from OP's jsfiddle, that changes to `TypeError: asm.js type error: Disabled by debugger`. So it would seem the asm.js itself isn't faulty. – Siguza Aug 05 '15 at 14:24
  • I agree that having 50% more nodes/tokens should slow it down, but this is a surprising hit: the annotations (without the "use asm"; directive) introduce a 3X slowdown on Firefox and Chrome. I tried it on Safari (which has no asm.js support): all three versions were really slow (not surprising), but the annotations incur only a 50% slowdown there. – jtiscione Aug 05 '15 at 16:24
  • I just tried it on IE (also with no asm.js support). The ordinary version runs fine, but with or without the directive, the mere presence of annotations makes IE *crash*. That has nothing to do with asm.js, but it's really weird. – jtiscione Aug 05 '15 at 16:37
  • @jtiscione Nah, I bet this is expected behaviour in IE. Jokes aside, it would make sense for the pseudo-asm version to be slower, simply because there are more instructions to run, but the mere presence of `'use asm';` slowing everything down by a factor of 3 can **not** be reduced to "it wasn't designed to be written by hand". – Siguza Aug 07 '15 at 14:00
  • I tried to profile the code using FF. The results for the Iterate function are: NoAsm: 1.532, Asm: 0.807, AsmNoDir: 1.196. These make more sense to me, but I looked at your code and I think your way of measuring performance is right. So honestly, I am quite confused now. – cristian v Aug 08 '15 at 09:55

There is some fixed cost to switching into an asm.js context. Ideally you switch once and run all of your app's code as asm.js. Then you can control memory management using typed arrays and avoid lots of garbage collection. I'd suggest rewriting the profiler to measure asm.js from within asm.js, without context switching.
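To see whether crossing into a module has a noticeable per-call cost on a given engine, one hedged experiment is to do the same work either as many calls into the module or as one call whose loop runs inside it. `CounterModule`, `bump` and `bumpAll` are hypothetical names. Single-shot timings like these are noisy, so repeat them before drawing conclusions.

```javascript
// Hypothetical asm.js module: the same increment done per call or in bulk.
function CounterModule(stdlib, foreign, heap) {
  "use asm";
  var HEAP32 = new stdlib.Int32Array(heap);

  // Increment one cell; called once per element from outside the module.
  function bump(i) {
    i = i | 0;
    HEAP32[(i << 2) >> 2] = ((HEAP32[(i << 2) >> 2] | 0) + 1) | 0;
  }

  // Same work, but the loop runs inside the module: one boundary crossing.
  function bumpAll(n) {
    n = n | 0;
    var i = 0;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      HEAP32[(i << 2) >> 2] = ((HEAP32[(i << 2) >> 2] | 0) + 1) | 0;
    }
  }

  return { bump: bump, bumpAll: bumpAll };
}

const N = 1 << 14; // 16384 int32 cells = a 64 KiB heap
const buffer = new ArrayBuffer(4 * N);
const heap = new Int32Array(buffer);
const m = CounterModule(globalThis, null, buffer);

let t = performance.now();
for (let i = 0; i < N; i++) m.bump(i); // N calls into the module
const perElementMs = performance.now() - t;

t = performance.now();
m.bumpAll(N); // one call into the module
const singleCallMs = performance.now() - t;

console.log({ perElementMs, singleCallMs });
```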

Torsten Becker
  • But in this case there is no garbage to be collected, since all the operations are being done on a shared ArrayBuffer heap instantiated at startup and accessed via typed array views (Int32Array and Uint32Array). AFAIK there is no way to house an entire app within asm.js. You will always need outside code to instantiate the compiled module and invoke its entry points. – jtiscione Aug 09 '15 at 19:41
  • [I tried to measure the time from inside `iterate` as well](https://jsfiddle.net/8xk7m6gr/5/), but it doesn't seem to make a difference... or does using `stdlib.performance.now()` invoke yet another context switch? If so, is it even possible to measure the actual time spent in an asm.js function? – Siguza Aug 10 '15 at 07:28
  • If you go through the JS source, where it always initializes "totalCycles" to 4, and raise that to 8 or 12, it will loop 2X and 3X as many times through the hotspot code above. On Chrome, the time went from 22 to 40 to 65 ms for ordinary JS, 140 to 275 to 400 ms for asm.js, and 70 to 140 to 210 ms for asm.js with no directive. Since the running time is almost directly proportional to the time spent in that code, I think the overhead from context switching looks pretty small, at least in this case. – jtiscione Aug 14 '15 at 21:21
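The proportionality argument can be made a little more concrete by fitting a least-squares line through the Chrome numbers quoted in that comment. The slope estimates the cost per cycle. The intercept estimates the part that doesn't scale with `totalCycles`, which is where a fixed per-call cost such as a context switch would show up. `fitLine` is a hypothetical helper, and with only three noisy points per module this is a rough estimate.

```javascript
// Ordinary least-squares fit of time = intercept + slope * cycles.
function fitLine(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let k = 0; k < n; k++) {
    sxy += (xs[k] - mx) * (ys[k] - my);
    sxx += (xs[k] - mx) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: my - slope * mx };
}

// Chrome 44 timings (ms) from the comment, for totalCycles = 4, 8, 12.
const cycles = [4, 8, 12];
const chrome = {
  ordinary: [22, 40, 65],
  asm: [140, 275, 400],
  asmNoDirective: [70, 140, 210],
};

for (const [name, ms] of Object.entries(chrome)) {
  const { slope, intercept } = fitLine(cycles, ms);
  console.log(`${name}: ${slope.toFixed(2)} ms/cycle, intercept ${intercept.toFixed(2)} ms`);
}
```

On these numbers, the intercepts come out at about -0.7, 11.7 and 0 ms, against slopes of about 5.4, 32.5 and 17.5 ms per cycle. The fixed part is therefore small compared with the per-cycle cost, which is consistent with the comment's conclusion.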