5

Through code profiling, I have found the Math.sqrt function specifically to be a major bottleneck in a large doubly-nested loop that runs every timestep in my program. Is there any way to improve its performance? Should I inline some kind of iterative calculation, or lookup table-based calculation?

Any help would be greatly appreciated!

I cannot replace it with a squares calculation instead, since it is not a comparison.

EDIT: The relevant portion of the code looks roughly as follows

var width = 2000;
var height = 2000;

function update() {
    for (var j = 0; j < height; ++j) {
        for (var i = 0; i < width; ++i) {
            array[i][j] = Math.sqrt(/* some expression involving i and j */);
        }
    }
}

var fps = 60;
setInterval(update, 1000 / fps);
user76284
  • 1,269
  • 14
  • 29
  • 2
    I love this question – Joe Thomas May 24 '16 at 22:41
  • 3
    Square root takes longer than simple operations. What exactly does the code look like? Without seeing exactly what you're doing, it's unlikely that anybody can help. You can't really do much about how the `Math` functions operate. – Pointy May 24 '16 at 22:41
  • So are you asking for algorithms for calculating square roots? What level of accuracy is required? What range of values must square roots be calculated for? – RobG May 24 '16 at 22:43
  • 1
    I'm not convinced, Javascript uses double precision floats for storing non integer numbers, and Math.sqrt uses the FPU instruction https://en.wikipedia.org/wiki/Double-precision_floating-point_format which takes the equivalent of maybe 4 to 10 multiplication @Pointy [how-slow-how-many-cycles-is-calculating-a-square-root](http://stackoverflow.com/questions/7724061/how-slow-how-many-cycles-is-calculating-a-square-root) – reuns May 24 '16 at 22:44
  • 1
    @user1952009 JavaScript uses double-precision floats for storing **all** numbers. I would say that taking twice or four times as much time as multiplication would count as "taking longer", wouldn't you? And you still haven't posted any code. – Pointy May 24 '16 at 22:46
  • @Pointy : ??? did I say something bad ? I sent it to you because you talked of the number of cycles it takes, which is detailed in the link I gave. and are you sure for the integers ? – reuns May 24 '16 at 22:47
  • @user1952009 The link you pointed to states that a square root function is anywhere from 4 times to 13 times slower than addition. That means it's slower than simple operations. – Mike Cluck May 24 '16 at 22:49
  • @Pointy : yes you are right for the integers that are stored as double, tks. I tested a = Math.floor(Math.pow(2,31)); b = a * a; c = b+1; we get c==b, hence it doesn't use int64, and it doesn't use int32 because a = Math.floor(Math.pow(2,22)); b = a * a; c = b+1; we get b != c. – reuns May 24 '16 at 22:52
  • @user1952009 yes I'm painfully aware of the fact that numbers in JavaScript are always doubles :) There are internal computations done as integers, but numbers "at rest" are doubles. Anyway this question is hard to deal with if we don't have code. – Pointy May 24 '16 at 22:56
  • @Pointy : I thought stupidly that for example V8 would optimize such basic things as storing integers as integers... – reuns May 24 '16 at 22:57
  • @Pointy : and I just remembered, there is the (very special) UInt8Array which doesn't store integers as double ! http://stackoverflow.com/a/33578970/1952009 – reuns May 24 '16 at 23:00
  • @user1952009 V8 does actually optimize basic integer operations, like comparisons! I got a lot of hate for asking a question about it. http://stackoverflow.com/questions/37336371/javascript-comparing-floats-vs-integers-i-dont-like-you-guys-anymore – Joe Thomas May 24 '16 at 23:14
  • @wateriswet : so how do you explain what I obtained with my test Math.floor(Math.pow(2,k))+1; ?? maybe someone should look at V8 source code – reuns May 24 '16 at 23:16
  • @user1952009 I can't. I have not combed through the v8 source. I just know from my benchmarks that V8 seems to be doing some sort of optimization with integers. Yes, someone should. – Joe Thomas May 24 '16 at 23:19
  • Yes getting a square root is hard. However if you are using only certain whole numbers. Say you are using math.random 0-100. Then you can probably write a much faster array assignment table for your used values. – David May 24 '16 at 23:20
  • I have tested both a hand rolled Newton Method with reduced precision and the infamous quake inverse and neither can match the performance of a native square root. Interestingly enough, the quake inverse in javascript does best an inverse of the native square root on chrome. However (y * quake(y)) is still slower than Math.sqrt(y) in my testing – JonSG May 24 '16 at 23:29
  • If anyone would like to see the Quake method timed against the native sqrt() let me know and I will post the code. As it is slower, it is not really an "answer" so I have held off posting it – JonSG May 25 '16 at 13:14
  • 1
    What is the /* some expression involving i and j */ exactly ? – Woody Sep 09 '16 at 16:20
  • If your argument of sqrt is often the same, you might consider using a lookup table. – kvantour Dec 04 '18 at 16:32

2 Answers2

2

Without further details, the question can only be answered in a general sense, I'm afraid.

Do Not Re-Invent the Wheel

So should You, generally speaking, try to re-implement Math.sqrt using lookup tables and whatnot? I would strongly advise against that. You would be hard pressed to even come close to the default JSVM implementation. It's not even a fair fight: The JavaScript Virtual Machine (JSVM) has access to native code and even hardware instructions that You likely do not.

So what can You do? In the following, You will find some suggestions, in the order in which I would personally try them:

Swapping Loop Variables

In Your given example, the order of loop variables is sub-optimal. The better order would be:

for(let i = 0; i < width ; ++i)
for(let j = 0; j < height; ++j)
  array[i][j] = Math.sqrt(/* some expression involving i and j */);

This should be significantly faster because array is actually an Array of Array references, i.e. every time You call array[i] for a different i, a new Array object has to be looked up in memory. It is much more cache friendly to process the inner Arrays one after another.

Motivate the JSVM to use float32

I have not actually tested this, but You might be able to tell the JSVM to use float32 precision instead of float64. One way to do that might be to use a lot of Math.fround calls, something along the lines of:

for(let i = 0; i < width ; ++i)
for(let j = 0; j < height; ++j)
  array[i][j] = Math.fround( Math.sqrt( Math.fround(/* some expression involving i and j */) ) );

You should always benchmark to see if this gives You any performance gains at all.

Another way to to enforce float32 is to use Float32Array as inner arrays. Not only do Float32Arrays give type information to the JSVM. They also have half the memory size compared to Arrays of Numbers, i.e. it should increase Your throughput.

const array = Array.from({length: width}, () => new Float32Array(height));

for(let i = 0; i < width ; ++i)
for(let j = 0; j < height; ++j)
  array[i][j] = Math.sqrt(/* some expression involving i and j */);

For more information on this topic, see this blog post.

Fast inverse square root

If You are using the square root to normalize vectors for computer graphics, there is a bonkers approximation of the inverse square root that was originally used in the Quake 3 engine. It might be faster than 1/Math.sqrt(y). As always, benchmarking is Your best friend.

Math.hypot()

If You do something along the lines of Math.sqrt(x*x + y*y + ...), You can try to use Math.hypot(x,y,...) instead. It might be a tiny bit faster and, on top of that, it is underflow- and overflow-safe.

Use the WebGL/WebGPU/tensorflow.js

If all else fails You can try to use GPU acceleration to speed up Your computations. The easiest way to to that might be using some machine learning framework like tensorflow.js.

import * as tf from '@tensorflow/tfjs';

tf.setBackend('webgl');

const sqrt = await tf.sqrt(array).array();

console.log({sqrt});

Moving data to and from the GPU comes with some overhead, so this might only give performance benefits if You off-load more than just the sqrt operation to the GPU.

You can of course use WebGL or WebGPU direcly, but it will be a lot more work.

Dirk
  • 121
  • 4
0

As your iterating in a 2-dimensional array you can reduce the iterations by ~2.

For instance :

if your expression is i x j, and your array is 3-3 size starting from 0 your going to compute in your loop :

  • 0*1 AND 1*0
  • 0*2 AND 2*0
  • 1*2 AND 2*1

which is the same

0x0 1x0 2x0

0x1 1x1 2x1

0x2 1x2 2x2

Woody
  • 809
  • 8
  • 21
  • The algorithm would still be O(n^2) so this suggestion won't help too much. With the given example, he has 2000 x 2000, so 4,000,000 iterations. Reducing that by 2,000 would be a drop in the bucket. It would be better to attack the algorithm itself if possible. Doing 4,000,000 Math.sqrt() calls every 16ms is a bit... demanding. – Sam Mar 31 '17 at 21:03
  • Actually the number of iterations would be ((N^2 - N)/2) + N (upper triangle + diagonale) which is pretty equivalent to N^2/2. For instance a matrix of 2000 x 2000 could be reduced to ~2,000,000 iteration – Woody Apr 03 '17 at 17:48
  • Woody, are you not familiar with Big O notation? – Sam Apr 03 '17 at 18:00