
I am trying to implement the cosine and sine functions in floating point.

My processor has no floating-point hardware or instructions, so I have already implemented software routines for floating-point multiplication, division, addition, subtraction, and square root. Those are the tools I have available for implementing cosine and sine.

I was considering using the CORDIC method described at this site. However, I implemented division and square root using Newton's method, so I was hoping to use the most efficient method here as well.

Please don't tell me to just go look in a book or that "papers exist"; of course they exist. I am looking for the names of well-known algorithms that are known to be fast and efficient.

Veridian
  • Can't you find and adapt an existing math library? The math behind them is quite complex! Writing a competitive math library could be worth a PhD (and would take years of effort). – Basile Starynkevitch Feb 13 '12 at 18:33
  • I'm not really a code junkie; I would just like the algorithm, and I can do the implementation myself. I have to rewrite the code in assembly myself and hand-schedule it as well (it is for a custom processor). – Veridian Feb 13 '12 at 18:34
  • I believe there are lots of books and papers on that subject. Did you go to a university library? – Basile Starynkevitch Feb 13 '12 at 18:38
  • Depending on the hardware spec and the precision required, you may find it easier to pre-compute a lookup table than to calculate accurately - it depends on your use case. I believe older scientific calculators used this approach. – Basic Feb 13 '12 at 18:51
  • @Basic, I am planning on following Stephen Canon's advice; do you argue that the pre-computed lookup table is faster? I need precision out to 23 bits (for IEEE-754 floating point). – Veridian Feb 13 '12 at 19:24
  • Which algorithm will be fastest depends on your hardware. You'll have to try out different ones to see which is the fastest. – Gabe Feb 13 '12 at 20:14
  • @Gabe, I mean fastest by least number of expensive operations. For instance, bit shift is preferred over having to divide. – Veridian Feb 13 '12 at 20:17
  • @starbox I posted that ~2 minutes before Stephen's answer - I have no doubt his is a faster solution than mine, as I've never had to implement it myself (hence a comment, not an answer :) – Basic Feb 14 '12 at 06:45
  • @Basic: if you only need limited accuracy and don't care about extremely large inputs (which is often the case), a lookup table is a great approach. If you want to deliver a fully-accurate floating-point result over the entire range, it's prohibitively expensive in memory usage. – Stephen Canon Feb 14 '12 at 12:34
  • @StephenCanon Yeah, when OP mentioned 23 bits of precision, it became clear that a lookup table would be huge. That said, you've taught me something with the Argument Reduction... paper - Thanks. – Basic Feb 14 '12 at 12:36

3 Answers


First off, depending on your accuracy requirements, this can be considerably fussier than your earlier questions.

Now that you've been warned: you'll first want to reduce the argument modulo pi/2 (or 2pi, or pi, or pi/4) to get the input into a manageable range. This is the subtle part. For a nice discussion of the issues involved, download a copy of K. C. Ng's "Argument Reduction for Huge Arguments: Good to the Last Bit" (a simple Google search on the title will turn up a PDF). It's very readable, and does a great job of describing why this is tricky.
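For inputs of modest magnitude, the usual technique is a two-constant ("Cody-Waite") reduction, where pi/2 is split into a high part that multiplies exactly and a low part that recovers the lost bits. A minimal sketch in C, assuming C99 and borrowing fdlibm's split of pi/2 (huge arguments still need the full treatment from Ng's paper):

    #include <math.h>

    /* fdlibm's split of pi/2: PIO2_HI has few enough significant
     * bits that n * PIO2_HI is exact for modest n; PIO2_LO carries
     * the remaining bits of pi/2. */
    static const double PIO2_HI = 1.57079632673412561417e+00;
    static const double PIO2_LO = 6.07710050650619224932e-11;

    /* Reduce x to r with x = n*(pi/2) + r, |r| <= ~pi/4, and report
     * n mod 4 so the caller can pick the right function and sign
     * for the quadrant. */
    double reduce_pio2(double x, int *quadrant)
    {
        double n = nearbyint(x * (2.0 / M_PI));
        *quadrant = (int)n & 3;       /* two's complement assumed */
        /* subtract n*pi/2 in two pieces so cancellation is benign */
        return (x - n * PIO2_HI) - n * PIO2_LO;
    }

The caller then evaluates sin or cos of the reduced argument, swapping the function and flipping the sign according to the quadrant.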

After doing that, you only need to approximate the functions on a small range around zero, which is easily done via a polynomial approximation. A Taylor series will work, though it is inefficient. A truncated Chebyshev series is easy to compute and reasonably efficient; computing the minimax approximation is better still. This is the easy part.
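To make that concrete, here is a minimal sketch in C using plain truncated Taylor coefficients and Horner's rule; minimax coefficients of the same degree, as recommended above, would squeeze out a bit more accuracy:

    /* sin(x) on [-pi/4, pi/4] via a degree-9 odd Taylor polynomial,
     * evaluated with Horner's rule.  The truncation error at the
     * interval ends is roughly 2e-9 -- adequate for single
     * precision, though not for double. */
    double sin_poly(double x)
    {
        double x2 = x * x;
        return x * (1.0
             + x2 * (-1.0 / 6.0
             + x2 * ( 1.0 / 120.0
             + x2 * (-1.0 / 5040.0
             + x2 * ( 1.0 / 362880.0)))));
    }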

I have implemented sine and cosine exactly as described, entirely in integer arithmetic, in the past (sorry, no public sources). With hand-tuned assembly, results in the neighborhood of 100 cycles are entirely reasonable on "typical" processors. I don't know what hardware you're dealing with; the performance will mostly be gated on how quickly your hardware can produce the high part of an integer multiply.
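As an illustration of that last point: if the target cannot produce the high half of a product directly, it can be synthesized from half-width pieces. A sketch (the 16-bit split and the function name are arbitrary choices, not from the answer):

    #include <stdint.h>

    /* High 32 bits of the 64-bit product a*b, built from 16-bit
     * partial products so no 64-bit type is required. */
    uint32_t mulhi32(uint32_t a, uint32_t b)
    {
        uint32_t a_lo = a & 0xFFFF, a_hi = a >> 16;
        uint32_t b_lo = b & 0xFFFF, b_hi = b >> 16;

        uint32_t lo   = a_lo * b_lo;
        uint32_t mid1 = a_hi * b_lo;
        uint32_t mid2 = a_lo * b_hi;
        uint32_t hi   = a_hi * b_hi;

        /* carry out of the middle 32 bits of the full product */
        uint32_t carry = ((lo >> 16) + (mid1 & 0xFFFF) + (mid2 & 0xFFFF)) >> 16;
        return hi + (mid1 >> 16) + (mid2 >> 16) + carry;
    }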

Stephen Canon
  • @starbox: minimax approximation is something of a black art; you only really know about it if you need to use it. You can start by looking up the "Remez exchange algorithm", which is a standard approach. A truncated Chebyshev series is much easier to compute, and nearly as good. A Taylor series will require a few more terms to deliver the same accuracy, but is (obviously) very simple. – Stephen Canon Feb 13 '12 at 18:58
  • @starbox: I should also note that you probably *won't* want to use your existing FP addition and multiplication routines (that's too expensive). Keeping most of the computation in integer turns out to be easier in some ways, anyway. The processor that I wrote the integer implementation for actually had an FPU, but I was able to get the result faster without using it. – Stephen Canon Feb 13 '12 at 19:01
  • @starbox: the reason to avoid using full FP operations is that you don't need to round every step of the computation. You can carry a little extra precision and delay any rounding logic until the very end. – Stephen Canon Feb 13 '12 at 19:28
  • When I implemented this in MATLAB, I got large errors from the range reduction; is this to be expected? As the incoming thetas grow larger and are range-reduced to between +/- pi/4, the error grows. – Veridian Feb 23 '12 at 18:00
  • @starbox: a correct range reduction will not introduce large errors no matter how big the input is. As I noted, however, it is quite difficult to implement a correct range reduction (it's one of the fussiest parts of a floating-point library -- maybe the fussiest). – Stephen Canon Feb 23 '12 at 18:19
  • I've kept everything in single-precision floating point. Is that where the issues might be coming from? I'm not quite sure how to adapt the document you told me to read to 32-bit floating point. – Veridian Feb 23 '12 at 18:47
  • @starbox: the exact details of a fully precise range reduction are probably outside the scope of what can be addressed in comments. If you want to ask a new question, I'll be happy to take a look and see if I can spot the source of your problems, though I won't have a chance to really look at it until the weekend. – Stephen Canon Feb 23 '12 at 19:01
  • I posted my question. here is a link to it: http://stackoverflow.com/questions/9423516/precision-problems-with-range-reduction – Veridian Feb 24 '12 at 00:19
  • I was wondering, have you gotten a chance to take a look at my new post regarding range reduction? – Veridian Feb 26 '12 at 21:22
  • I wonder what *non-contrived* calculations can be performed more accurately by employing argument reduction with a mathematically-perfect Pi than with the nearest double approximation? Most practical trig calculations I've seen use a double-precision-rounded value for pi or a multiple thereof when computing the arguments to trig functions, and so I would think the optimal accuracy would be achieved by having argument reduction performed on a value of pi that was rounded in the same way as the one used in the argument calculations. To my mind, the fact that `Math.Sin(Math.Pi)` is non-zero... – supercat Jun 03 '14 at 20:16
  • ...should be considered a (tolerable) rounding error, rather than a desirable behavior. Are there practical cases where assuming a "precise" pi yields a better result than would computing sin(x) as sin(Math.Pi-X) when x is in the range pi/2 to pi? – supercat Jun 03 '14 at 20:20

For various levels of precision, you can find some good approximations here:

http://www.ganssle.com/approx.htm

They have the added advantage of being deterministic in runtime, unlike the various "converging series" options, whose cost can vary wildly depending on the input value. This matters if you are doing anything real-time (games, motion control, etc.).
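If table memory is available, a lookup table with linear interpolation (mentioned in the comments under the question) shares that constant-runtime property. A minimal sketch, with an arbitrarily chosen table size -- note that at this size the accuracy falls well short of the 23 bits the question asks for:

    #include <math.h>

    #define TBL_N 256
    static float sin_tbl[TBL_N + 1];   /* sin over [0, pi/2] */

    void sin_tbl_init(void)
    {
        for (int i = 0; i <= TBL_N; i++)
            sin_tbl[i] = (float)sin(i * (M_PI / 2) / TBL_N);
    }

    /* Linear interpolation between adjacent entries; assumes the
     * argument has already been reduced to [0, pi/2].  Error is
     * about 5e-6 at this table size -- roughly 18 bits. */
    float sin_lookup(float x)
    {
        float t = x * (float)(TBL_N / (M_PI / 2));
        int   i = (int)t;
        if (i >= TBL_N) i = TBL_N - 1; /* guard the x == pi/2 edge */
        float f = t - (float)i;
        return sin_tbl[i] + f * (sin_tbl[i + 1] - sin_tbl[i]);
    }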

Martin Thompson

Since you have the basic arithmetic operations implemented, you may as well implement sine and cosine using their Taylor series expansions.
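A minimal sketch of that idea in C, generating each term from the one before it so that only the multiply, divide, and add routines from the question are needed. The term count here is an arbitrary choice, and as the other answers note, this converges slowly unless the argument has first been range-reduced:

    /* sin(x) by summing the Taylor series term by term.  Each term
     * is -previous * x^2 / ((2n)(2n+1)), so no factorials or powers
     * are computed explicitly.  Assumes |x| is already small. */
    double sin_taylor(double x)
    {
        double x2   = x * x;
        double term = x;
        double sum  = x;
        for (int n = 1; n <= 7; n++) {
            term = -term * x2 / ((2.0 * n) * (2.0 * n + 1.0));
            sum += term;
        }
        return sum;
    }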

ardnew