
I have a fairly complicated function that takes several double values that represent two vectors in 3-space of the form (magnitude, latitude, longitude) where latitude and longitude are in radians, and an angle. The purpose of the function is to rotate the first vector around the second by the angle specified and return the resultant vector. I have already verified that the code is logically correct and works.

The expected purpose of the function is for graphics, so double precision is not necessary; however, on the target platform, the float versions of the trig (and sqrt) functions (sinf, cosf, atan2f, asinf, acosf and sqrtf specifically) run faster when their operands are stored in doubles than in floats (probably because the underlying instruction actually requires a double; if a float is passed, the value must first be widened to a double, which means copying it to a larger storage area -- i.e. overhead). As a result, all of the variables involved in the function are double precision.

Here is the issue: I am trying to optimize my function so that it can be called more times per second. I have therefore replaced the calls to sin, cos, sqrt, et cetera with calls to the floating point versions of those functions, as they result in a 3-4 times speed increase overall. This works for almost all inputs; however, if the input vectors are close to parallel with the standard unit vectors (i, j, or k), round-off errors in the various functions build up enough that later calls to sqrtf or the inverse trig functions (asinf, acosf, atan2f) receive arguments that are just barely outside those functions' domains.

So, I am left with this dilemma: either I can call only the double precision functions and avoid the problem (and end up with a limit of about 1,300,000 vector operations per second), or I can try to come up with something else. Ultimately, I would like a way to sanitize the input to the inverse trig functions to take care of edge cases (it is trivial to do so for sqrt: just use abs). Branching is not an option, as even a single conditional statement adds so much overhead that any performance gains are lost.
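For concreteness, the "trivial" sqrt sanitization mentioned above looks like this (a sketch; the function name is illustrative, not from my actual code):

#include <math.h>

/* Round-off can push a logically non-negative argument (e.g. a squared
   length) slightly below zero; fabsf absorbs that edge case, so sqrtf
   never sees a negative input. */
float safe_sqrtf( float x )
{
    return sqrtf( fabsf(x) );
}

What I am missing is an equally cheap, branch-free equivalent for asinf, acosf and atan2f.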

So, any ideas?

Edit: someone expressed confusion over my using doubles versus floating point operations. The function is much faster if I actually store all my values in double-size containers (i.e. double-type variables) than if I store them in float-size containers. However, single precision trig operations are faster than double precision trig operations for obvious reasons.

Talia
  • There's an entire field of study about "how to account for errors" in approximate algorithms called [numerical analysis](http://en.wikipedia.org/wiki/Numerical_analysis). – Ben Jackson Nov 13 '10 at 08:30
  • As it turns out, the speed increase I was seeing was actually *due* to the NaNs, and not due to using float operations instead of double operations. So, this entire question has an incorrect basis. – Talia Nov 16 '10 at 19:29
  • If your rotation algorithm uses previously rotated vectors, you lose precision after each iteration/cycle/frame or whatever. To account for that, you should count how many times you have rotated, and if the counter hits some threshold value (any number), then correct all vectors from the originals or by some geometric property (if you know their supposed length, or if some vectors should be perpendicular to each other, ...). It will slow down computation once in a while, but in general it should not affect performance too much (experiment with the threshold to find a compromise between speed and accuracy). – Spektre Aug 20 '13 at 13:36
  • It didn't do that, but this is from a very long time ago. Basically, I was trying to speed this up by offloading the computation to a program in C: https://www.youtube.com/watch?v=nThMEgp2MMo – Talia Aug 22 '13 at 01:22
  • The fact that you're running into args of acos and asin that are barely out of range is a red flag; it means you've already lost a ton of accuracy. Rule of thumb: if you ever find yourself taking asin or acos of a number whose magnitude is > sqrt(0.5), then go back and rewrite your program so it doesn't happen. I'm guessing you're doing something such as calculating angle between two unit vectors as `acos(a dot b)`, or calculating latitude as asin(z) when you know x,y,z. Don't ever do these things! For alternatives, see e.g. http://www.plunk.org/~hatch/rightway.html , and atan2 is your friend. – Don Hatch May 13 '22 at 07:15
  • Thanks for the advice Don. I ended up using rotation matrices in later projects, but to be honest I haven't written any 3D manipulation software in the last 5-6 years. – Talia May 15 '22 at 22:39
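As a concrete illustration of the atan2 advice in the last comments, a sketch (the function and names are illustrative, assuming latitude is recovered from Cartesian x, y, z):

#include <math.h>

/* atan2f has no domain restrictions, so round-off in z can never push
   it out of a legal range the way asinf(z) can for near-unit z. */
float latitude_from_cartesian( float x, float y, float z )
{
    return atan2f( z, hypotf(x, y) ); /* hypotf(x,y) = horizontal distance */
}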

3 Answers


Basically, you need to find a numerically stable algorithm that solves your problem. There are no generic solutions to this kind of thing; it needs to be done for your specific case, using concepts such as the condition number of the individual steps. And it may in fact be impossible if the underlying problem is itself ill-conditioned.
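One concrete instance of such a reformulation (a sketch, not spelled out in this answer, though the asker's comment below points the same way): convert the spherical vectors to Cartesian once, then rotate with Rodrigues' formula, which needs no inverse trig at all in the rotation step:

#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Rotate v about the unit-length axis k by angle radians (Rodrigues'
   formula): v*cos(a) + (k x v)*sin(a) + k*(k.v)*(1 - cos(a)). */
static Vec3 rotate_about_axis( Vec3 v, Vec3 k, float angle )
{
    float c = cosf(angle), s = sinf(angle);
    float kdotv = k.x*v.x + k.y*v.y + k.z*v.z;
    Vec3 kxv = { k.y*v.z - k.z*v.y,     /* k cross v */
                 k.z*v.x - k.x*v.z,
                 k.x*v.y - k.y*v.x };
    Vec3 r;
    r.x = v.x*c + kxv.x*s + k.x*kdotv*(1.0f - c);
    r.y = v.y*c + kxv.y*s + k.y*kdotv*(1.0f - c);
    r.z = v.z*c + kxv.z*s + k.z*kdotv*(1.0f - c);
    return r;
}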

Michael Borgwardt
  • The correct solution for me ended up being one of this type. I am now using rotation matrices, which have the properties that you mention, and are much much faster. – Talia Dec 30 '10 at 05:35

Single precision floating point inherently introduces error. So, you need to build your math so that all comparisons have a certain degree of "slop" by using an epsilon factor, and you need to sanitize inputs to functions with limited domains.

The former is easy enough when branching, e.g.

bool IsAlmostEqual( float a, float b ) { return fabs(a-b) < 0.001f; }            // absolute tolerance
bool IsAlmostEqual( float a, float b ) { return fabs(a-b) < fabs(a) * 0.0001f; } // relative tolerance (fabs(a), so it also works when a is negative)

but that's messy. Clamping domain inputs is a little trickier, but better. The key is to use conditional move operators, which in general do something like

float ExampleOfConditionalMoveIntrinsic( float comparand, float a, float b ) 
{ return comparand >= 0.0f ? a : b ; }

in a single op, without incurring a branch.

These vary depending on architecture. On the x87 floating point unit you can do it with the FCMOV conditional-move op, but that is clumsy because it depends on condition flags being set previously, so it's slow. Also, there isn't a consistent compiler intrinsic for cmov. This is one of the reasons why we avoid x87 floating point in favor of SSE2 scalar math where possible.

Conditional move is much better supported in SSE, by pairing a comparison operator with bitwise mask operations. This is preferable even for scalar math:

// assuming you've already used _mm_load_ss to load your floats onto registers
__m128 fsel( __m128 comparand, __m128 a, __m128 b )
{
    __m128 zero = _mm_setzero_ps();
    // set low word of mask to all 1s if comparand > 0
    __m128 mask = _mm_cmpgt_ss( comparand, zero );
    a = _mm_and_ss( a, mask );    // a = a & mask
    b = _mm_andnot_ss( mask, b ); // b = ~mask & b
    return _mm_or_ss( a, b );     // return a | b
}
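Applied to the asker's inverse trig problem, fsel() gives a branch-free clamp to [-1, 1] before acosf/asinf (a sketch; the wrapper is illustrative, and SSE also has dedicated min/max ops for exactly this):

#include <xmmintrin.h>

/* Clamp x to the acosf/asinf domain [-1, 1] without branching. */
float clamp_to_unit_range( float x )
{
    __m128 v    = _mm_set_ss(x);
    __m128 one  = _mm_set_ss( 1.0f);
    __m128 none = _mm_set_ss(-1.0f);
    v = fsel( _mm_sub_ss(v, one),  one, v );  // v > 1  ?  1 : v
    v = fsel( _mm_sub_ss(v, none), v, none ); // v > -1 ?  v : -1
    // equivalent one-liner: v = _mm_max_ss( _mm_min_ss(v, one), none );
    float out;
    _mm_store_ss( &out, v );
    return out;
}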

Compilers are better, but not great, at emitting this sort of pattern for ternaries when SSE2 scalar math is enabled. You can do that with the compiler flag /arch:SSE2 on MSVC or -mfpmath=sse on GCC.

On the PowerPC and many other RISC architectures, fsel() is a hardware opcode and thus usually a compiler intrinsic as well.

Crashworks
  • Pulling epsilons out of your @$$ does not result in reliable floating point code. It would be much better to just clamp values to the domain of the functions they're going to be passed to, instead of performing dubious "almost-equal" tests. – R.. GitHub STOP HELPING ICE Nov 13 '10 at 08:18
  • @R.. If you read carefully you'll notice that's what I'm saying. I blow off the almost-equal as being branchy, and then go on to show how to condition domain inputs via conditional moves. I have edited the original to make this clear. – Crashworks Nov 13 '10 at 08:20
  • I cannot seem to get this to work on my platform (gcc and Linux). I took a look at the assembly being generated, and it's still creating branches (conditional jumps). I see no conditional moves in the assembly anywhere. – Talia Nov 16 '10 at 04:16
  • Which code are you trying to get the compiler to turn into a cmov? – Crashworks Nov 16 '10 at 04:33
  • Basically min/max code. But it doesn't matter now, because I realize that my entire question was based off of an incorrect interpretation of the times I was seeing when testing my code. As it turns out, sinf, cosf, et cetera are much faster if their input is NaN, but otherwise are not much faster than sin, cos, et cetera. – Talia Nov 16 '10 at 19:27
  • In general you have to use intrinsics to force conditional moves when you want them. Compilers are terrible at issuing them automatically. – Crashworks Nov 16 '10 at 20:33

Have you looked at the Graphics Programming Black Book or perhaps handing the calculations off to your GPU?

Mitch Wheat
  • The purpose of the exercise is (as a personal goal) to implement everything from scratch using only the CPU (I am allowing myself to use math.h). Thus, while this would be a plausible solution for most practical purposes, it is not adequate for mine. – Talia Nov 13 '10 at 06:35