
I built a simple JavaScript vs. WebAssembly/SIMD benchmark as follows:

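// Pure JS: Vector3 is a plain JavaScript class
// (N and the vector e are assumed to be defined elsewhere)
var sum = 0;
for (var c=0; c<N; c++)
{
   var v3 = new Vector3();
   sum += v3.dot(e);
}

// WASM: WASM_Vector3 wraps the C++/SIMD implementation, so each
// new WASM_Vector3() and dot() call crosses the JS/WASM boundary
sum = 0;
for (var c=0; c<N; c++)
{
   var v3 = new WASM_Vector3();
   sum += v3.dot(e);
}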

where WASM_Vector3 is implemented in C++ using SSE4.2 intrinsics and compiled to WASM with -msimd128.

When N < 3000, WASM outperforms pure JS; as N grows larger, pure JS starts to outperform WASM. I understand this is due to the overhead of crossing the JS/WASM boundary. Is there a way to restructure the code above to minimize that overhead?
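A minimal sketch of the batching idea, assuming the vectors can be stored up front in flat typed arrays: the data is laid out as structure-of-arrays (separate x, y, z Float32Arrays) and the whole loop sits behind one call, so the boundary is crossed once instead of on every iteration. `makeSoA` and `dotAll` are hypothetical names; in a real Emscripten build `dotAll` would be a single exported C++ function and the arrays would be views into the module's linear memory, whereas here it is plain JavaScript so the example runs on its own.

```javascript
// Hypothetical batched layout: all N vectors stored as structure-of-arrays (SoA).
// In a real build, dotAll() would be ONE exported WASM function and these
// Float32Arrays would be views into WASM linear memory; here it is plain JS.
function makeSoA(n) {
  return {
    x: new Float32Array(n),
    y: new Float32Array(n),
    z: new Float32Array(n),
    length: n,
  };
}

// Sum of dot(v[i], e) over all i, computed in a single call instead of N calls.
function dotAll(v, e) {
  let sum = 0;
  for (let i = 0; i < v.length; i++) {
    sum += v.x[i] * e.x + v.y[i] * e.y + v.z[i] * e.z;
  }
  return sum;
}

// Usage: fill the arrays once, then make one call for the whole batch.
const N = 4;
const v = makeSoA(N);
for (let i = 0; i < N; i++) {
  v.x[i] = i;
  v.y[i] = 1;
  v.z[i] = 2;
}
const e = { x: 1, y: 2, z: 3 };
console.log(dotAll(v, e)); // 38
```

The SoA layout also lines up with the "4 number streams" idea in the comments: consecutive x values sit next to each other in memory, so a SIMD version can load four of them into one 128-bit register.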

user2566142
  • Not my specialty, but why don't you write the whole loop in C++? – Marc Glisse Jul 18 '21 at 21:25
  • If you're doing a dot product of two arrays, your accumulator should be a vector. Like `mulps` -> `addps`, and only reduce to a scalar `sum` outside the loop. That goes hand in hand with Marc's suggestion to write the whole loop in C++ / WASM, so the overhead of passing things into WASM code is only paid once, as well as making it possible to use a vector accumulator. (Or better, [4 or 8 vector accumulators to hide FP latency](https://stackoverflow.com/q/45113527), especially if a JS engine can optimize the SIMD mul and add into an FMA on targets that have it) – Peter Cordes Jul 18 '21 at 21:54
  • Thank you for your input. I will take a look at that link. BTW, according to this Mozilla article, calls between JS and WASM are finally fast, but that doesn't seem to be the case in Chrome 91. Or perhaps V8 is just very fast. https://hacks.mozilla.org/2018/10/calls-between-javascript-and-webassembly-are-finally-fast-%F0%9F%8E%89/ – user2566142 Jul 18 '21 at 22:22
  • You also have to keep in mind that WASM SIMD is a very limited set of instructions, and many of the functions in the SSE headers are emulated using a series of instructions, sometimes using scalar code. A lot of functions also have a slight mismatch which you might not care about but can substantially slow down the code because emscripten has to match SSE's behavior; for example, differences in rounding mode, NaN handling, out-of-range values, etc. Basically, you might want to look at which SSE functions are being called to see if you can speed things up. – nemequ Jul 19 '21 at 00:10
  • Taking into account what you've all said, it seems to me the best way to boost JS performance via WASM/SIMD is not to bring existing JS abstractions, such as the Vector3 above, into WASM, but to port just the dot-product operation into WASM/SIMD, and to process 4 number streams at once where possible to take full advantage of SIMD, e.g. when applying the same numerical calculation to large arrays of vectors or matrices. – user2566142 Jul 19 '21 at 15:03
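To illustrate the accumulator pattern from Peter Cordes's comment, here is a plain-JS sketch that models the data flow only; it is not a SIMD implementation. The four scalars `s0`..`s3` stand in for the four lanes of one 128-bit vector (`__m128` with `_mm_mul_ps`/`_mm_add_ps` in SSE, or `v128_t` with `wasm_f32x4_mul`/`wasm_f32x4_add` in `wasm_simd128.h`), and the horizontal reduction to a single `sum` happens once, after the loop. The function name `dot4` is hypothetical.

```javascript
// Dot product of two equal-length Float32Arrays using four independent
// accumulators, mimicking the lanes of one f32x4 SIMD vector.
function dot4(a, b) {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  const n4 = a.length - (a.length % 4);
  let i = 0;
  // Main loop: one "vector" multiply-add per 4 elements.
  for (; i < n4; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  // Scalar tail for lengths that are not a multiple of 4.
  for (; i < a.length; i++) {
    s0 += a[i] * b[i];
  }
  // Reduce the "lanes" to one scalar only once, outside the hot loop.
  return (s0 + s1) + (s2 + s3);
}

const a = Float32Array.from([1, 2, 3, 4, 5, 6]);
const b = Float32Array.from([1, 1, 1, 1, 2, 2]);
console.log(dot4(a, b)); // 32
```

In real WASM SIMD code, the linked answer's further suggestion applies on top of this: keep several such vector accumulators (e.g. 4 or 8 `v128_t` values) so consecutive adds don't wait on each other's latency.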
