13
double r2 = dx * dx + dy * dy;
double r3 = r2 * sqrt(r2);

Can the second line be replaced by something faster? Something that does not involve sqrt?

fredoverflow
  • 256,549
  • 94
  • 388
  • 662
  • 1
    Your code seems to contradict your caption: Where in your code do you actually have r^2? You only have r2, which holds something very different than r squared... or I guess I misunderstood, "^" should not mean "to the power of"? – codeling Dec 09 '11 at 13:21
  • 6
    @nyarlathotep: Assume `r` is `sqrt(dx * dx + dy * dy)`. – GManNickG Dec 09 '11 at 13:22
  • 3
    @nyarlathotep you missed the point. If r = (dx*dx + dy*dy)^(1/2), than r2 = r^2. But that's not the issue. – Luchian Grigore Dec 09 '11 at 13:23
  • I was just confused, ^ is the usual notation for "to the power of" as far as I know. – codeling Dec 09 '11 at 13:23
  • 6
    I'm going to go out on a limb and say that, by the way the question is posed, technically the answer is NO. The simple reason is that if you're given r^2, you don't know the sign of r, so how can you calculate r^3? I think what you're really asking is, given the squared norm of a vector, can you find the cubed norm efficiently? – dantswain Dec 09 '11 at 13:26
  • 3
    Depending on precision you need and dx/dy ratio, you can try Taylor series ( 1+x )^( 1/2 ) = 1 + ( 1/2 )*x - ( 1/8 )*x^2 + ... where x = ( dx/dy )^2 <= 1. – lapk Dec 09 '11 at 13:29
  • 2
    Depending on your desired precision, and the relative sizes of `dx` and `dy`, you could do a Taylor expansion: for example if you know `dy` is small compared to `dx`, then you can approximate `r3` as `dx(dx^2 + 3/2 * dy^2)` (I think I have that right). EDIT: Wow, that's uncanny, AzzA! – James Dec 09 '11 at 13:31
  • 3
    How fast is `sqrt`, just so that people know what they're trying to beat? Does your compiler use SSE (or perhaps some equivalent on other achitectures)? – Steve Jessop Dec 09 '11 at 13:40
  • 5
    @SteveJessop: 3 answers so far, and not a single bench / disassembly. I don't think people are really interested in finding a faster answer, they just throw anything they can think of to the mob... – Matthieu M. Dec 09 '11 at 14:15
  • @Matthieu: I don't mind that so much - without knowing Fred's setup it's not possible to say what will be faster, and a benchmark on the answerer's setup proves nothing. I think it's fair to suggest things that plausibly might be faster so that he can test them, but if what we're trying to beat is `sqrtss` followed by `mul` then not much is plausible anyway. Part of the thrust of my question though is that the "fix" might just be to use a `-m` compiler option. – Steve Jessop Dec 09 '11 at 15:06
  • Oh, and of course the question would have made a lot more sense if the answer was "yes, of course there's a better way, your code is needlessly going around the houses, here's the simple thing you've missed". Complaints about "why aren't you profiling" would be absurd if Fred was Bogosorting 100 items and asking if anyone knew a better sort algorithm -- we don't need to profile to know that Bogosort is wrong. But since Fred's code is at or close to optimal, and the sqrt or something similar is unavoidable, it looks like a micro-optimization question. – Steve Jessop Dec 09 '11 at 15:25

3 Answers3

14

How about

double r3 = pow(r2,1.5);

If sqrt is implemented as a special case of pow, that will save you a multiplication. Not much in the grand scheme of things mind!

If you are really looking for greater efficiency, consider whether you really need r^3. If, for example, you are only testing it (or something derived from it) to see whether it exceeds a certain threshold, then test r2 instead e.g.

const double r3_threshold = 9;

//don't do this
if (r3 > r3_threshold)
    ....

//do do this
const double r2_threshold = pow(r3_threshold,2./3.); 
if (r2 > r2_threshold)
    ....

That way pow will be called only once, maybe even at compile time.

EDIT If you do need to recompute the threshold each time, I think the answer concerning Q_rsqrt is worth a look and probably deserves to outrank this one

Sideshow Bob
  • 4,566
  • 5
  • 42
  • 79
  • What about `r2*r2*r2 > r3_thresh*r3_thresh`? – dantswain Dec 09 '11 at 13:31
  • @dantswain Well that avoids `pow` but it takes extra multiplications and the code is less clear – Sideshow Bob Dec 09 '11 at 13:36
  • Wouldn't a couple (three) extra mults be faster than pow? I think readability is subjective here... I routinely do the squared case as `dx*dx + dy*dy < r_thresh*r_thresh`; the cubed version wouldn't throw me. Besides, if readability was really an issue you could hide it in a macro or inline. *shrug* – dantswain Dec 09 '11 at 13:42
  • 6
    Unless your math library is really crappy, sqrt() will not be implemented via pow(x, .5), and it will be a lot faster than pow() (IIRC I tested this at some point and with the libm in glibc sqrt() was about an order of magnitude faster). But yeah, I suppose it doesn't hurt trying it out.. – janneb Dec 09 '11 at 13:47
  • @dantswain I think Sideshow Bob's point is about repeated checking against the same threshold, in which case a single pow would be better than many many muls (though it depends on the number of iterations). But of course for a one-time check your mul-version is better. I also think readability is of secondary concern, considering the micro-optimization nature of this problem. – Christian Rau Dec 09 '11 at 13:49
  • @dantswain not if the `pow` is only computed at compile time, or once on program start up – Sideshow Bob Dec 09 '11 at 13:59
  • Fair enough :) In my case the thresh is usually changing between iterations and is set by the user, so I leave it in user-friendly units until the check. – dantswain Dec 09 '11 at 14:13
  • @janneb gcc -S -O tells me that some compilers (gcc 4.5.1 in my case) will actually optimize pow(r2,1.5) to r2 * sqrt(r2). – wolfgang Dec 12 '11 at 12:18
12

Use fast inverse sqrt (take the Q_rsqrt function).

You have:

float r2;
// ... r2 gets a value
float invsqrt = Q_rsqrt(r2);
float r3 = r2*r2*invsqrt; // x*x/sqrt(x) = x*sqrt(x)

NOTE: For double types there is a constant like 0x5f3759df which can help you write a function that handles also double data types.

LATER EDIT: Seems like the method has been already discussed here.

LATER EDIT2: The constant for double was in the wikipedia link:

Lomont pointed out that the "magic number" for 64 bit IEEE754 size type double is 0x5fe6ec85e7de30da, but in fact it is close to 0x5fe6eb50c7aa19f9.

Community
  • 1
  • 1
INS
  • 10,594
  • 7
  • 58
  • 89
  • @SideshowBob Since it is used in Quake3 code I believe it to be the fastest – INS Dec 09 '11 at 13:41
  • 10
    Times have changed since Quake 3 was written. For example SSE has a hardware sqrt instruction, which is faster. – Steve Jessop Dec 09 '11 at 13:43
  • 3
    [See here](http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/) for more information. – GManNickG Dec 09 '11 at 13:47
  • 4
    @IulianŞerbănoiu: it's nowhere near the fastest. It is faster than the square root instruction on *some* platforms, but most mainstream CPUs can compute a more accurate reciprocal square root estimate in a single instruction requiring just a couple of cycles latency (`rsqrtss` on Intel, `frsqrte` on PPC, `vrsqrte` on ARM). – Stephen Canon Dec 09 '11 at 13:48
  • @SteveJessop I totally agree, but the OP didn't mention anything about HW instructions, which are clearly faster that these bit tricks. Unfortunately I don't know if the standard C library uses the SSE functions. – INS Dec 09 '11 at 13:48
  • The question really reveals some very interesting facts. – INS Dec 09 '11 at 13:51
  • 4
    The idea of `r^2*r^2/sqrt(r^2)` is still valid, whether you use the Quake trick or a hardware `1/sqrt` instruction. – MSalters Dec 09 '11 at 15:00
  • @drhirsch: and that code has the especially desirable property of *irreducible unreadability*: the only way to make the code readable is to replace it with a completely different implementation, because the algorithm used is pretty much incomprehensible. – Steve Jessop Dec 09 '11 at 15:18
  • IMHO if you had to limit a processor to 4 FP math operations I'd pick + - * and reciprocal square root. – phkahler Dec 09 '11 at 15:52
1

I think another way to look at your question would be "how to calculate (or approximate) sqrt(n)". From there your question would be trivial (n * sqrt(n)). Of course, you'd have to define how much error you could live with. Wikipedia gives you many options:

http://en.wikipedia.org/wiki/Methods_of_computing_square_roots

Pedery
  • 3,632
  • 1
  • 27
  • 39