309

I've been poring through .NET disassemblies and the GCC source code, but can't seem to find anywhere the actual implementation of sin() and other math functions... they always seem to be referencing something else.

Can anyone help me find them? I feel like it's unlikely that ALL hardware that C will run on supports trig functions in hardware, so there must be a software algorithm somewhere, right?


I'm aware of several ways that functions can be calculated, and have written my own routines to compute functions using Taylor series for fun. I'm curious about how real, production languages do it, since all of my implementations are always several orders of magnitude slower, even though I think my algorithms are pretty clever (obviously they're not).

Rakete1111
  • 47,013
  • 16
  • 123
  • 162
Hank
  • 8,289
  • 12
  • 47
  • 57
  • 3
    Please note that this is implementation dependent. You should specify which implementation you are most interested in. – jason Feb 17 '10 at 22:24
  • 4
    I tagged .NET and C because I looked in both places and couldn't figure out either. Although looking at the .NET disassembly it looks like it might be calling into unmanaged C, so as far as I know they have the same implementation. – Hank Feb 17 '10 at 23:12
  • 2
    also see: [What algorithm is used by computers to calculate logarithms?](http://math.stackexchange.com/q/61209) – Amro Oct 01 '13 at 16:20

22 Answers

280

In GNU libm, the implementation of sin is system-dependent. Therefore you can find the implementation, for each platform, somewhere in the appropriate subdirectory of sysdeps.

One directory includes an implementation in C, contributed by IBM. Since October 2011, this is the code that actually runs when you call sin() on a typical x86-64 Linux system. It is apparently faster than the fsin assembly instruction. Source code: sysdeps/ieee754/dbl-64/s_sin.c, look for __sin (double x).

This code is very complex. No one software algorithm is as fast as possible and also accurate over the whole range of x values, so the library implements several different algorithms, and its first job is to look at x and decide which algorithm to use.

  • When x is very very close to 0, sin(x) == x is the right answer.

  • A bit further out, sin(x) uses the familiar Taylor series. However, this is only accurate near 0, so...

  • When the angle is more than about 7°, a different algorithm is used, computing Taylor-series approximations for both sin(x) and cos(x), then using values from a precomputed table to refine the approximation.

  • When |x| > 2, none of the above algorithms would work, so the code starts by computing some value closer to 0 that can be fed to sin or cos instead.

  • There's yet another branch to deal with x being a NaN or infinity.

This code uses some numerical hacks I've never seen before, though for all I know they might be well-known among floating-point experts. Sometimes a few lines of code would take several paragraphs to explain. For example, these two lines

double t = (x * hpinv + toint);
double xn = t - toint;

are used (sometimes) in reducing x to a value close to 0 that differs from x by a multiple of π/2, specifically xn × π/2. The way this is done without division or branching is rather clever. But there's no comment at all!
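If I understand the trick correctly, it relies on how double-precision addition rounds: adding a constant of magnitude 1.5·2^52 pushes the fractional bits off the end of the mantissa, so subtracting the constant back recovers the nearest integer. A minimal sketch of the idea (my own reconstruction, with the values of toint and hpinv assumed to be 1.5·2^52 and 2/π; the real glibc constants may be spelled differently):

#include <stdio.h>

int main(void)
{
    /* Assumed values: toint = 1.5 * 2^52 forces rounding to an integer,
       hpinv = 2/pi converts x into units of pi/2. */
    const double toint = 6755399441055744.0;
    const double hpinv = 0.636619772367581343;
    double x = 10.0;

    double t  = x * hpinv + toint; /* x/(pi/2) + toint, rounded to an integer by the add */
    double xn = t - toint;         /* nearest integer to x/(pi/2), as a double */

    printf("%g is roughly %g * pi/2\n", x, xn); /* prints: 10 is roughly 6 * pi/2 */
    return 0;
}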


Older 32-bit versions of GCC/glibc used the fsin instruction, which is surprisingly inaccurate for some inputs. There's a fascinating blog post illustrating this with just 2 lines of code.

fdlibm's implementation of sin in pure C is much simpler than glibc's and is nicely commented. Source code: fdlibm/s_sin.c and fdlibm/k_sin.c

Jason Orendorff
  • 42,793
  • 6
  • 62
  • 96
  • 44
    To see that this is really the code that runs on x86: compile a program that calls `sin()`; type `gdb a.out`, then `break sin`, then `run`, then `disassemble`. – Jason Orendorff Feb 17 '10 at 23:40
  • 9
    @Henry: don't make the mistake of thinking that is good code though. It's really **terrible**, don't learn to code that way! – Andreas Bonini Feb 21 '10 at 12:44
  • 2
    @Andreas Hmm, you're right, the IBM code does look pretty awful compared to fdlibm. I edited the answer to add links to fdlibm's sine routine. – Jason Orendorff Feb 22 '10 at 11:11
  • @Jason: Yeah the fdlibm implementation is simpler, but it just calls `__kernel_sin`, so it's not really an answer. @Andreas: I wasn't really looking to study the style, I was mostly just curious about how it's done. As I said in my edit, my own implementations always seem way slow. – Hank Feb 23 '10 at 16:32
  • 4
    @Henry: `__kernel_sin` is defined in k_sin.c, though, and it's pure C. Click it again—I botched the URL the first time. – Jason Orendorff Feb 23 '10 at 19:57
  • that fdlibm function is also included in GNU libc for floats, located in glibc/sysdeps/ieee754/flt-32/k_sinf.c – u0b34a0f6ae Oct 24 '11 at 11:23
  • 1
    @JasonOrendorff: Looks like the link to `sysdeps/i386/fpu/s_sin.S` is dead. [This](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/fpu/bits/mathinline.h;h=9554827c89990a3f3c0ffee323c4a7e396e187b6;hb=HEAD#l700) *might* be a suitable replacement? – Adam Rosenfield Sep 19 '14 at 04:33
  • @AdamRosenfield Very interesting. That code is only enabled for very old versions of GCC, leading me to wonder what newer versions do... – Jason Orendorff Oct 09 '14 at 15:35
  • 3
    The linked sysdeps code is particularly interesting because it is correctly rounded. That is, it apparently gives the best-possible answer for all input values, which has only become possible fairly recently. In some cases this can be slow because many extra digits may need to be calculated to ensure correct rounding. In other cases it is extremely quick -- for small enough numbers the answer is just the angle. – Bruce Dawson Oct 16 '14 at 04:27
  • Can I find the source code for sin(x) or pow(x) or similar on my machine? I installed gcc with `apt-get install build-essential` – altroware Mar 28 '15 at 12:29
  • 1
    @altroware: I don't know. But you can definitely do `git clone git://sourceware.org/git/glibc.git` and look in `glibc/sysdeps/ieee754/dbl-64`. – Jason Orendorff Mar 31 '15 at 16:58
  • 2
    Can we modify this answer to make it more self-contained? I.E. less links, more code. – Alexander Ryan Baggett Jan 20 '17 at 23:28
  • I don’t see math.h, trig.h, or sin.h under i386 in the noted link http://sourceware.org/git/?p=glibc.git;a=tree;f=sysdeps/i386;h=5ca0f2be42994c68bab5ac719653f4a3b582c0b3;hb=HEAD. – isomorphismes Jun 12 '19 at 07:56
  • The answer for ieee754 must be “precomputed table lookup”, per [table](http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/asincos.tbl;h=f54fbb5d4c89edebe402509147f27382f98bb5f1;hb=HEAD) and [lookup](http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;h=26799f1909b953cda9b0249cd61d15ccf42b51d4;hb=HEAD). One then wonders how the table was computed and checked. – isomorphismes Jun 12 '19 at 08:00
  • "Several of the algorithms first compute a quick result, then if that's not accurate enough, discard it and fall back on a slower algorithm." How does it know if it's not accurate enough? – Aaron Franke Sep 22 '19 at 21:55
  • @AaronFranke The answer was originally written in 2010 or so, and the code has evolved significantly since. I've removed that sentence from the answer. But to guess at your question, without actually looking back in the history 9+ years: it's easy to imagine there might be trig identities you could use to test an answer's accuracy. – Jason Orendorff Dec 03 '19 at 01:49
80

Functions like sine and cosine are implemented in microcode inside microprocessors. Intel chips, for example, have assembly instructions for these. A C compiler will generate code that calls these assembly instructions. (By contrast, a Java compiler will not. Java evaluates trig functions in software rather than hardware, and so it runs much slower.)

Chips do not use Taylor series to compute trig functions, at least not entirely. First of all they use CORDIC, but they may also use a short Taylor series to polish up the result of CORDIC or for special cases such as computing sine with high relative accuracy for very small angles. For more explanation, see this StackOverflow answer.
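To make the "polish up" step concrete, here is a small sketch of the idea (my own illustration, not anything from actual microcode): given accurate values of sine and cosine at a nearby point, a couple of Taylor terms give the value a small step away.

/* Illustration only: refine sin at x0 + h from known sin(x0) and cos(x0)
   using two Taylor terms; valid when h is small. */
static double sin_polished(double sin_x0, double cos_x0, double h)
{
    /* sin(x0 + h) ≈ sin(x0) + h*cos(x0) - (h*h/2)*sin(x0) */
    return sin_x0 + h * cos_x0 - 0.5 * h * h * sin_x0;
}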

Community
  • 1
  • 1
John D. Cook
  • 29,517
  • 10
  • 67
  • 94
  • 12
    transcendental math functions like sine & cosine may be implemented in microcode or as hardware instructions in current 32-bit desktop and server processors. This was not always the case; until the i486(DX), all floating point calculations were done in software ("soft-float") for the x86 series without a separate coprocessor. Not all of which (FPUs) included transcendental functions (e.g. Weitek 3167). – mctylr Feb 17 '10 at 23:11
  • 2
    Can you be more specific? How does one "polish up" an approximation using a Taylor series? – Hank Feb 17 '10 at 23:27
  • 4
    As far as "polishing up" an answer, suppose you're computing both sine and cosine. Suppose you know the exact value of both at one point (e.g. from CORDIC) but want the value at a nearby point. Then for a small difference h, you can apply the Taylor approximations f(x + h) = f(x) + h f'(x) or f(x + h) = f(x) + h f'(x) + h^2 f''(x)/2. – John D. Cook Feb 18 '10 at 00:33
  • You have identified a duplicate. – dmckee --- ex-moderator kitten Feb 19 '10 at 01:54
  • 8
    x86/x64 chips have an assembly instruction for calculating sine (fsin) but this instruction is sometimes quite inaccurate and is therefore rarely used anymore. See http://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/ for details. Most other processors do *not* have instructions for sine and cosine because calculating them in software gives more flexibility, and may even be faster. – Bruce Dawson Oct 16 '14 at 04:30
  • 1
    Cordics are never used in software implementations. A cordic can be implemented in hardware in about the same space as a multiplier, so if the intel guys had some space left over they put the cordic in there. However cordic accuracy is not predictable, which is a no-no for certain scientific applications. Intel doesn't tell us what they actually use for fsin, but I'm pretty sure they use cordics. It also takes 144 cycles or so...not terribly fast. – Donald Murray Jan 25 '16 at 04:14
  • not all architectures have sin and cos instruction. Even in x86 SSE/AVX there's no sin/cos instruction and a software sin/cos function must be used and the optimized SSE version is much faster than a hardware sin instruction in x87 – phuclv Mar 22 '16 at 03:49
  • 5
    The cordic stuff inside the intel chips are generally NOT used. First, the accuracy and resolution of the operation is extremely important for many applications. Cordic is notoriously inaccurate when you get to the 7th digit or so, and unpredictable. Secondly, i heard there is a bug in their implementation, which causes even more problems. I took a look at the sin function for linux gcc, and sure enough, it uses chebyshev. the built-in stuff isn't used. Oh, also, the cordic algorithm in the chip is slower than the software solution. – Donald Murray Dec 04 '16 at 03:58
  • @mctylr - Slight correction: Don't forget the 8087 "coprocessor" to the 8086. It was perhaps the first commercial implementation (approximately) of IEEE754. – Rick James Aug 14 '19 at 05:57
  • 8087 doesn't do sin, but tan. You can compute sine from tan, but I'm not sure if SW is faster or not. – Donald Murray Aug 16 '19 at 19:09
67

OK kiddies, time for the pros.... This is one of my biggest complaints with inexperienced software engineers. They come in calculating transcendental functions from scratch (using Taylor series) as if nobody had ever done these calculations before in their lives. Not true. This is a well-defined problem that has been approached thousands of times by very clever software and hardware engineers, and it has a well-defined solution.

Basically, most of the transcendental functions use Chebyshev polynomials to calculate them. Which polynomials are used depends on the circumstances. First, the bible on this matter is a book called "Computer Approximations" by Hart and Cheney. In that book, you can decide if you have a hardware adder, multiplier, divider, etc., and decide which operations are fastest. E.g. if you had a really fast divider, the fastest way to calculate sine might be P1(x)/P2(x), where P1 and P2 are Chebyshev polynomials. Without the fast divider, it might be just P(x), where P has many more terms than P1 or P2... so it'd be slower. So, the first step is to determine your hardware and what it can do.

Then you choose the appropriate combination of Chebyshev polynomials (usually of the form cos(ax) = aP(x) for cosine, for example, again where P is a Chebyshev polynomial). Then you decide what decimal precision you want. E.g. if you want 7 digits of precision, you look that up in the appropriate table in the book I mentioned, and it will give you (for precision = 7.33) a number N = 4 and a polynomial number 3502. N is the order of the polynomial (so it's p4.x^4 + p3.x^3 + p2.x^2 + p1.x + p0), because N = 4. Then you look up the actual values of p4, p3, p2, p1, p0 in the back of the book under 3502 (they'll be in floating point). Then you implement your algorithm in software in the form (((p4.x + p3).x + p2).x + p1).x + p0, and this is how you'd calculate cosine to 7 decimal places on that hardware.
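As an illustration of that nested (Horner) evaluation, here is a minimal sketch. Cosine is an even function, so the polynomial is written in u = x²; the coefficients below are just truncated Taylor coefficients standing in for the real Hart & Cheney table entries, good for roughly 7 digits on a reduced range like [-π/4, π/4]:

/* Sketch only: Horner evaluation of a degree-4 polynomial in u = x*x.
   p0..p4 are placeholder (truncated Taylor) coefficients; a real
   implementation would use minimax coefficients from the tables. */
static double cos_poly(double x)
{
    const double p0 =  1.0;
    const double p1 = -1.0 / 2.0;
    const double p2 =  1.0 / 24.0;
    const double p3 = -1.0 / 720.0;
    const double p4 =  1.0 / 40320.0;

    double u = x * x;
    return (((p4 * u + p3) * u + p2) * u + p1) * u + p0;
}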

Note that most hardware implementations of transcendental operations in an FPU usually involve some microcode and operations like this (depends on the hardware). Chebyshev polynomials are used for most transcendentals, but not all. E.g. square root is faster to compute with a double iteration of the Newton-Raphson method using a lookup table first. Again, that book "Computer Approximations" will tell you that.

If you plan on implementing these functions, I'd recommend that you get a copy of that book. It really is the bible for these kinds of algorithms. Note that there are bunches of alternative means for calculating these values, like CORDIC, etc., but these tend to be best for specific algorithms where you only need low precision. To guarantee the precision every time, the Chebyshev polynomials are the way to go. Like I said, well defined problem. It has been solved for 50 years now... and that's how it's done.

Now, that being said, there are techniques whereby the Chebyshev polynomials can be used to get a single-precision result with a low-degree polynomial (like the example for cosine above). Then, there are other techniques to interpolate between values to increase the accuracy without having to go to a much larger polynomial, such as "Gal's Accurate Tables Method". This latter technique is what the post citing the ACM literature is referring to. But ultimately, the Chebyshev polynomials are what are used to get 90% of the way there.

Enjoy.

Donald Murray
  • 1,303
  • 8
  • 4
  • 6
    I couldn't agree more with the first few sentences. Also, it's worth recalling that computing special functions with guaranteed precision is a **hard problem**. The clever people you mention spend most of their life doing this. Also, on a more technical note, min-max polynomials are the sought-after grail, and Chebyshev polynomials are simpler proxies for them. – Alexandre C. Jul 18 '15 at 08:52
  • 211
    -1 for the unprofessional and rambling (and mildly rude) tone, and for the fact that the actual non-redundant *content* of this answer, stripped of the rambling and condescension, basically boils down to "They often use Chebyshev polynomials; see this book for more details, it's really good!" Which, you know, might well be absolutely correct, but it's not really the kind of self-contained *answer* we want here on SO. Condensed down like that, it would've made a decent comment on the question, though. – Ilmari Karonen Jul 18 '15 at 10:44
  • 3
    Back in the early game development years, it was usually done with lookup tables (critical need for speed). We typically didn't use the standard lib functions for those things. – topspin Nov 21 '16 at 20:41
  • 4
    I use lookup tables in embedded systems quite often and bittians (instead of radians), but this is for a specialized application (like your games). I think the guy is interested in how the c compiler calculates sin for floating point numbers.... – Donald Murray Dec 04 '16 at 04:00
  • 1
    Ah, 50 years ago. I started playing with such on the Burroughs B220 with McLaren series. Later CDC hardware and then Motorola 68000. Arcsin was messy -- I chose the quotient of two polynomials and developed code to find the optimal coefficients. – Rick James Aug 14 '19 at 05:54
  • @IlmariKaronen this is better than the top 2 answers, and by a lot. #1 is general description and directs to borderline unreadable code instead of actual algorithms, and #2 is pretty much entirely rambling. you're just miffed that guy here doesn't care for being "professional" on the internet, because you walk with a stick along your spine. it would be rude and condescending if he wasn't right but he is. as someone looking for a solution to implement in an embedded environment, everything else here is kiddie bs - even by the standards of someone who only briefly played with numerical methods. – TrisT Jul 15 '23 at 21:24
19

For sin specifically, using Taylor expansion would give you:

sin(x) := x - x^3/3! + x^5/5! - x^7/7! + ... (1)

you would keep adding terms until either the difference between them is lower than an accepted tolerance level or just for a fixed number of steps (faster, but less precise). An example would be something like:

float sin(float x)
{
  float res=0, pow=x, fact=1;
  for(int i=0; i<5; ++i)
  {
    res+=pow/fact;
    pow*=-1*x*x;
    fact*=(2*(i+1))*(2*(i+1)+1);
  }

  return res;
}

Note: (1) works because of the approximation sin(x)=x for small angles. For bigger angles you need to calculate more and more terms to get acceptable results. You can use a while loop and continue until a certain accuracy is reached:

double sin (double x){
    int i = 1;
    double cur = x;
    double acc = 1;
    double fact= 1;
    double pow = x;
    while (fabs(acc) > .00000001 &&   i < 100){
        fact *= ((2*i)*(2*i+1));
        pow *= -1 * x*x; 
        acc =  pow / fact;
        cur += acc;
        i++;
    }
    return cur;

}
Aznaveh
  • 558
  • 8
  • 27
Blindy
  • 65,249
  • 10
  • 91
  • 131
  • 1
    If you tweak the coefficients a little (and hard code them into a polynomial), you can stop about 2 iterations sooner. – Rick James Aug 14 '19 at 06:01
  • 1
    Might you replace this magic .000…01 with DBL_EPSILON? –  Aug 22 '21 at 18:46
16

Concerning trigonometric functions like sin(), cos(), tan(): there has been no mention, after 5 years, of an important aspect of high-quality trig functions: range reduction.

An early step in any of these functions is to reduce the angle, in radians, to an interval of width 2*π. But π is irrational, so simple reductions like x = remainder(x, 2*M_PI) introduce error, as M_PI, or machine pi, is an approximation of π. So, how to do x = remainder(x, 2*π)?

Early libraries used extended precision or crafted programming to give quality results, but still only over a limited range of double. When a large value was requested, like sin(pow(2,30)), the results were meaningless or 0.0, and maybe with an error flag set to something like TLOSS (total loss of precision) or PLOSS (partial loss of precision).

Good range reduction of large values to an interval like -π to π is a challenging problem that rivals the challenges of the basic trig function, like sin(), itself.

A good report is Argument reduction for huge arguments: Good to the last bit (1992). It covers the issue well: it discusses the need and how things were on various platforms (SPARC, PC, HP, 30+ others), and provides a solution algorithm that gives quality results for all double values from -DBL_MAX to DBL_MAX.


If the original arguments are in degrees, yet may be of a large value, use fmod() first for improved precision. A good fmod() will introduce no error and so provide excellent range reduction.

// sin(degrees2radians(x))
sin(degrees2radians(fmod(x, 360.0))); // -360.0 < fmod(x,360) < +360.0

Various trig identities and remquo() offer even more improvement. Sample: sind()
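A minimal sketch of what such a degree-based wrapper might look like (my own sketch, not the linked sind(); it only assumes the standard fmod and sin from <math.h>):

#include <math.h>

/* Reduce in degrees, where fmod is exact, then convert the small
   remainder to radians before calling sin(). Sketch only. */
static double sin_degrees(double x_degrees)
{
    double r = fmod(x_degrees, 360.0);          /* exact: -360.0 < r < 360.0 */
    return sin(r * (3.141592653589793 / 180.0));
}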

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
14

Yes, there are software algorithms for calculating sin too. Basically, calculating this kind of thing with a digital computer is usually done using numerical methods, like approximating the Taylor series representing the function.

Numerical methods can approximate functions to an arbitrary amount of accuracy, and since the amount of accuracy you have in a floating-point number is finite, they suit these tasks pretty well.

Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
  • 14
    A real implementation probably won't use a Taylor series, since there's more efficient ways. You only need to correctly approximate in the domain [0...pi/2], and there's functions that will deliver a good approximation more efficiently than a Taylor series. – David Thornley Feb 17 '10 at 22:44
  • 2
    @David: I agree. I was careful enough to mention the word "like" in my answer. But Taylor expansion is a simple one to explain the idea behind methods that approximate functions. That said, I have seen software implementations (not sure if they were optimized) that used Taylor series. – Mehrdad Afshari Feb 17 '10 at 22:47
  • 1
    Actually, polynomial approximations are one of the most efficient ways to calculate trigonometric functions. – Jeremy Salwen Mar 17 '11 at 03:24
14

Use the Taylor series and try to find a relation between terms of the series so you don't calculate things again and again

Here is an example for cosinus:

#include <math.h> /* for fabs */

double cosinus(double x, double prec)
{
    double t, s ;
    int p;
    p = 0;
    s = 1.0;
    t = 1.0;
    while(fabs(t/s) > prec)
    {
        p++;
        t = (-t * x * x) / ((2 * p - 1) * (2 * p));
        s += t;
    }
    return s;
}

Using this we can get the new term of the sum from the one already computed (we avoid recomputing the factorial and x^(2p)).

explanation

phuclv
  • 37,963
  • 15
  • 156
  • 475
Hannoun Yassir
  • 20,583
  • 23
  • 77
  • 112
  • 2
    Did you know you can use the Google Chart API to make formulas like this using TeX? http://code.google.com/apis/chart/docs/gallery/formulas.html – Gab Royer Feb 18 '10 at 03:39
11

It is a complex question. Intel-like CPUs of the x86 family have a hardware implementation of the sin() function, but it is part of the x87 FPU and not used anymore in 64-bit mode (where SSE2 registers are used instead). In that mode, a software implementation is used.

There are several such implementations out there. One is in fdlibm and is used in Java. As far as I know, the glibc implementation contains parts of fdlibm, and other parts contributed by IBM.

Software implementations of transcendental functions such as sin() typically use approximations by polynomials, often obtained from Taylor series.

Thomas Pornin
  • 72,986
  • 14
  • 147
  • 189
  • 3
    SSE2 registers are *not* used to calculate sin(), neither in x86 nor in x64 mode and, of course, sin is calculated in hardware regardless of the mode. Hey, it's 2010 we live in :) – Igor Korkhov Feb 18 '10 at 02:12
  • 7
    @Igor: that depends on what math library you're looking at. It turns out that the most optimized math libraries on x86 use SSE software implementations for `sin` and `cos` that are faster than the hardware instructions on the FPU. Simpler, more naive libraries tend to use the `fsin` and `fcos` instructions. – Stephen Canon Feb 18 '10 at 06:33
  • @Stephen Canon: Do those fast libraries have 80 bit precision as FPU registers do? I have a very sneaky suspicion that they favor speed over precision, which of course is reasonable in many scenarios, for example in games. And I do believe that calculating sine with 32 bit precision by using SSE and precomputed intermediate tables might be faster than by using `FSIN` with full precision. I would be very grateful if you tell me the names of those fast libraries, it's interesting to have a look. – Igor Korkhov Feb 18 '10 at 12:31
  • @Igor: on x86 in 64-bit mode, at least on all Unix-like systems I know of, precision is limited to 64 bits, not the 79 bits of the x87 FPU. The software implementation of `sin()` happens to be about twice faster than what `fsin` computes (precisely because it is done with less precision). Note that the x87 is known to have a bit less actual precision than its announced 79 bits. – Thomas Pornin Feb 18 '10 at 14:42
  • @Thomas Pornin: thank you for the information. But does it mean that `long double` contains garbage in the last 15 binary digits or is not supported at all? – Igor Korkhov Feb 18 '10 at 15:48
  • The return type of the C library `sin` function is `double`. Carrying extra precision is allowed by the standard, but will lead to hard-to-diagnose numerical errors, and is not something that software writers should depend on, as that precision will be lost anytime the compiler needs to spill to the stack. Two of the libraries that I had in mind are the Intel math library, distributed with ICC, and the OS X system math library, both of which return a fully-accurate double-precision `sin` substantially faster than the `fsin` instruction. – Stephen Canon Feb 18 '10 at 16:01
  • @Igor Korkhov: On platforms that support `long double` but use software implementations of the trig functions, there will typically be either a software long-double `sinl` or they might use a software argument reduction followed by the hardware `fsin` or `fcos` instruction as a core approximation. – Stephen Canon Feb 18 '10 at 16:03
  • @Igor: On x86 in 64-bit mode, on Unix-like systems, the `long double` type is backed by the x87 80-bit registers; the size (as returned by `sizeof`) is 16 (hence 128 bits, 48 of which being unused). The `sin()` function returns a `double`, not a `long double`. `sinl()` uses and returns `long double`. `sinl()` is rarely used since `sin()` is twice faster, and situations which require the extra precision (which is slight) are not common. – Thomas Pornin Feb 18 '10 at 16:09
  • For many common use cases, a fully accurate double-precision `sin( )` can be more like 5-6x faster than `sinl( )`. – Stephen Canon Feb 18 '10 at 16:26
  • @Stephen Canon, @Thomas Pornin: thank you very much for the clarificaton! – Igor Korkhov Feb 18 '10 at 16:58
  • 1
    Indeed, both 32-bit and 64-bit implementations of sin() in the msvc runtime libraries do *not* use the FSIN instruction. In fact, they give different results; take for instance sin(0.70444454416678126). This will result in 0.64761068800896837 (right within 0.5*(eps/2) tolerance) in a 32-bit program, and will result in 0.64761068800896848 (wrong) in a 64-bit one. – e.tadeu Jun 02 '10 at 13:23
10

Chebyshev polynomials, as mentioned in another answer, are the polynomials where the largest difference between the function and the polynomial is as small as possible. That is an excellent start.

In some cases, the maximum error is not what you are interested in, but the maximum relative error. For example for the sine function, the error near x = 0 should be much smaller than for larger values; you want a small relative error. So you would calculate the Chebyshev polynomial for sin x / x, and multiply that polynomial by x.

Next you have to figure out how to evaluate the polynomial. You want to evaluate it in such a way that the intermediate values are small and therefore rounding errors are small. Otherwise the rounding errors might become a lot larger than errors in the polynomial. And with functions like the sine function, if you are careless then it may be possible that the result that you calculate for sin x is greater than the result for sin y even when x < y. So careful choice of the calculation order and calculation of upper bounds for the rounding error are needed.

For example, sin x = x - x^3/6 + x^5 / 120 - x^7 / 5040... If you calculate naively sin x = x * (1 - x^2/6 + x^4/120 - x^6/5040...), then that function in parentheses is decreasing, and it will happen that if y is the next larger number to x, then sometimes sin y will be smaller than sin x. Instead, calculate sin x = x - x^3 * (1/6 - x^2 / 120 + x^4/5040...) where this cannot happen.
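Here is a small sketch of the two orderings just described, using truncated Taylor coefficients purely for illustration (a real implementation would use better coefficients and prove error bounds):

/* Illustration only: the same degree-7 polynomial evaluated two ways. */
static double sin_factored_poorly(double x)
{
    double u = x * x;
    /* x * (1 - x^2/6 + x^4/120 - x^6/5040): the bracketed factor decreases
       as x grows, so rounding can make results non-monotonic for adjacent inputs. */
    return x * (1.0 - u / 6.0 + u * u / 120.0 - u * u * u / 5040.0);
}

static double sin_factored_carefully(double x)
{
    double u = x * x;
    /* x - x^3 * (1/6 - x^2/120 + x^4/5040): the small correction is
       subtracted from x itself, which avoids that problem. */
    return x - x * u * (1.0 / 6.0 - u / 120.0 + u * u / 5040.0);
}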

When calculating Chebyshev polynomials, you usually need to round the coefficients to double precision, for example. But while a Chebyshev polynomial is optimal, the Chebyshev polynomial with coefficients rounded to double precision is not the optimal polynomial with double precision coefficients!

For example for sin (x), where you need coefficients for x, x^3, x^5, x^7 etc. you do the following: Calculate the best approximation of sin x with a polynomial (ax + bx^3 + cx^5 + dx^7) with higher than double precision, then round a to double precision, giving A. The difference between a and A would be quite large. Now calculate the best approximation of (sin x - Ax) with a polynomial (b x^3 + cx^5 + dx^7). You get different coefficients, because they adapt to the difference between a and A. Round b to double precision B. Then approximate (sin x - Ax - Bx^3) with a polynomial cx^5 + dx^7 and so on. You will get a polynomial that is almost as good as the original Chebyshev polynomial, but much better than Chebyshev rounded to double precision.

Next you should take into account the rounding errors in the choice of polynomial. You found a polynomial with minimum error in the polynomial ignoring rounding error, but you want to optimise polynomial plus rounding error. Once you have the Chebyshev polynomial, you can calculate bounds for the rounding error. Say f (x) is your function, P (x) is the polynomial, and E (x) is the rounding error. You don't want to optimise | f (x) - P (x) |, you want to optimise | f (x) - P (x) +/- E (x) |. You will get a slightly different polynomial that tries to keep the polynomial errors down where the rounding error is large, and relaxes the polynomial errors a bit where the rounding error is small.

All this will get you easily rounding errors of at most 0.55 times the last bit, where +,-,*,/ have rounding errors of at most 0.50 times the last bit.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
  • 2
    This is a nice explanation of how one *may* calculate sin(x) efficiently, but it does not really seem to answer the OP's question, which is specifically about how common C libraries / compilers *do* calculate it. – Ilmari Karonen Jul 18 '15 at 11:10
  • 1
    Chebyshev polynomials minimize the maximal absolute value over an interval, but they do not minimize the largest difference between a target function and the polynomial. Minimax polynomials do that. – Eric Postpischil Jul 13 '19 at 15:51
6

The actual implementation of library functions is up to the specific compiler and/or library provider. Whether it's done in hardware or software, whether it's a Taylor expansion or not, etc., will vary.

I realize that's absolutely no help.

John Bode
  • 119,563
  • 19
  • 122
  • 198
6

There's nothing like hitting the source and seeing how someone has actually done it in a library in common use; let's look at one C library implementation in particular. I chose uClibc.

Here's the sin function:

http://git.uclibc.org/uClibc/tree/libm/s_sin.c

which looks like it handles a few special cases, and then carries out some argument reduction to map the input to the range [-pi/4, pi/4] (splitting the argument into two parts, a big part and a tail) before calling

http://git.uclibc.org/uClibc/tree/libm/k_sin.c

which then operates on those two parts. If there is no tail, an approximate answer is generated using a polynomial of degree 13. If there is a tail, you get a small corrective addition based on the first-order approximation sin(x+y) ≈ sin(x) + y*cos(x).

Moschops
  • 207
  • 2
  • 6
5

They are typically implemented in software and will not use the corresponding hardware (that is, assembly) calls in most cases. However, as Jason pointed out, these are implementation-specific.

Note that these software routines are not part of the compiler sources, but will rather be found in the corresponding library, such as the C library (glibc for the GNU compiler). See http://www.gnu.org/software/libc/manual/html_mono/libc.html#Trig-Functions

If you want greater control, you should carefully evaluate what you need exactly. Some of the typical methods are interpolation of look-up tables, the assembly call (which is often slow), or other approximation schemes such as Newton-Raphson for square roots.
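For instance, the Newton-Raphson idea for square roots might look roughly like this (a sketch only; real library code seeds the guess more cleverly, often from a table or a bit trick, and unrolls a fixed number of steps):

#include <math.h> /* frexp, ldexp */

/* Sketch: the Newton-Raphson step y <- (y + a/y)/2 converges to sqrt(a).
   Seeding with 2^(e/2) puts the guess within a factor of ~1.4 of the root,
   so a handful of iterations reach double precision. */
static double nr_sqrt(double a)
{
    if (a <= 0.0)
        return 0.0; /* ignore negative input in this sketch */

    int e;
    frexp(a, &e);                 /* a = m * 2^e with m in [0.5, 1) */
    double y = ldexp(1.0, e / 2); /* rough initial guess: 2^(e/2) */

    for (int i = 0; i < 6; i++)
        y = 0.5 * (y + a / y);
    return y;
}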

mnemosyn
  • 45,391
  • 6
  • 76
  • 82
5

If you want an implementation in software, not hardware, the place to look for a definitive answer to this question is Chapter 5 of Numerical Recipes. My copy is in a box, so I can't give details, but the short version (if I remember this right) is that you take tan(theta/2) as your primitive operation and compute the others from there. The computation is done with a series approximation, but it's something that converges much more quickly than a Taylor series.

Sorry I can't remember more without getting my hands on the book.

Norman Ramsey
  • 198,648
  • 61
  • 360
  • 533
4

Whenever such a function is evaluated, then at some level there is most likely either:

  • A table of values which is interpolated (for fast, inaccurate applications - e.g. computer graphics)
  • The evaluation of a series that converges to the desired value --- probably not a Taylor series, more likely something based on a fancy quadrature like Clenshaw-Curtis.

If there is no hardware support then the compiler probably uses the latter method, emitting only assembler code (with no debug symbols), rather than using a C library --- making it tricky for you to track the actual code down in your debugger.
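For the first option above, a tiny interpolated lookup table might look something like this sketch (illustration only; the table size and interpolation scheme are arbitrary choices here):

#include <math.h>

#define SIN_TABLE_SIZE 256

/* Sketch: 256 precomputed samples of sin over one full turn, linearly
   interpolated. Fast and cheap, but nowhere near libm accuracy. */
static double sin_lut(double x)
{
    static double table[SIN_TABLE_SIZE + 1];
    static int ready = 0;
    const double two_pi = 6.283185307179586;

    if (!ready) {
        for (int i = 0; i <= SIN_TABLE_SIZE; i++)
            table[i] = sin(i * two_pi / SIN_TABLE_SIZE);
        ready = 1;
    }

    double pos = fmod(x, two_pi) / two_pi; /* fraction of a full turn */
    if (pos < 0.0)
        pos += 1.0;
    pos *= SIN_TABLE_SIZE;

    int i = (int)pos;
    if (i >= SIN_TABLE_SIZE)               /* guard against edge rounding */
        i = SIN_TABLE_SIZE - 1;
    double frac = pos - i;
    return table[i] + frac * (table[i + 1] - table[i]);
}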

James
  • 24,676
  • 13
  • 84
  • 130
4

I'll try to answer for the case of sin() in a C program, compiled with GCC's C compiler on a current x86 processor (let's say an Intel Core 2 Duo).

In the C language the Standard C Library includes common math functions not included in the language itself (e.g. pow, sin and cos for power, sine, and cosine respectively), whose declarations are in math.h.

Now on a GNU/Linux system, these library functions are provided by glibc (GNU libc or GNU C Library). But the GCC compiler wants you to link to the math library (libm.so) using the -lm compiler flag to enable usage of these math functions. I'm not sure why it isn't part of the standard C library. These would be a software version of the floating point functions, or "soft-float".
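For example, a minimal program that pulls sin() from libm might look like the sketch below (the build line in the comment is the usual invocation, though exact flags vary by system):

/* example.c -- calls the C library's sin(); link with the math library,
   e.g.:  gcc example.c -o example -lm  */
#include <math.h>
#include <stdio.h>

int main(void)
{
    printf("sin(1.0) = %.17g\n", sin(1.0));
    return 0;
}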

Aside: The reason for having the math functions separate is historic, and was merely intended to reduce the size of executable programs in very old Unix systems, possibly before shared libraries were available, as far as I know.

Now the compiler may optimize the standard C library function sin() (provided by libm.so) to be replaced with a call to a native instruction to your CPU/FPU's built-in sin() function, which exists as an FPU instruction (FSIN for x86/x87) on newer processors like the Core 2 series (this is correct pretty much as far back as the i486DX). This would depend on optimization flags passed to the gcc compiler. If the compiler was told to write code that would execute on any i386 or newer processor, it would not make such an optimization. The -mcpu=486 flag would inform the compiler that it was safe to make such an optimization.

Now if the program executed the software version of the sin() function, it would do so based on a CORDIC (COordinate Rotation DIgital Computer) or BKM algorithm, or more likely a table or power-series calculation which is commonly used now to calculate such transcendental functions. [Src: http://en.wikipedia.org/wiki/Cordic#Application]

Any recent (since 2.9x approx.) version of gcc also offers a built-in version of sin, __builtin_sin(), that it will use to replace the standard call to the C library version, as an optimization.

I'm sure that is as clear as mud, but hopefully gives you more information than you were expecting, and lots of jumping off points to learn more yourself.

Community
  • 1
  • 1
mctylr
  • 5,159
  • 20
  • 32
4

If you want to look at the actual GNU implementation of those functions in C, check out the latest trunk of glibc. See the GNU C Library.

Chris Tonkinson
  • 13,823
  • 14
  • 58
  • 90
4

As many people pointed out, it is implementation dependent. But as far as I understand your question, you were interested in a real software implementation of math functions, but just didn't manage to find one. If this is the case then here you are:

  • Download glibc source code from http://ftp.gnu.org/gnu/glibc/
  • Look at the file dosincos.c located in the sysdeps/ieee754/dbl-64 folder of the unpacked glibc root
  • Similarly you can find implementations of the rest of the math library, just look for the file with appropriate name

You may also have a look at the files with the .tbl extension; their contents are nothing more than huge tables of precomputed values of different functions in binary form. That is why the implementation is so fast: instead of computing all the coefficients of whatever series they use, they just do a quick lookup, which is much faster. BTW, they do use Taylor series to calculate sine and cosine.

I hope this helps.

Igor Korkhov
  • 8,283
  • 1
  • 26
  • 31
4

Don't use Taylor series. Chebyshev polynomials are both faster and more accurate, as pointed out by a couple of people above. Here is an implementation (originally from the ZX Spectrum ROM): https://albertveli.wordpress.com/2015/01/10/zx-sine/

Albert Veli
  • 1,589
  • 1
  • 8
  • 9
  • 2
    This doesn't really seem to answer the question as asked. The OP is asking how trig functions *are* calculated by common C compilers / libraries (and I'm pretty sure ZX Spectrum does not qualify), not how they *should* be calculated. This might've been a useful *comment* on some of the earlier answers, though. – Ilmari Karonen Jul 18 '15 at 11:03
  • 1
    Ah, you're right. It should have been a comment and not an answer. I haven't used SO in a while and forgot how the system works. Anyway, I think the Spectrum implementation is relevant because it had a really slow CPU and speed was of the essence. The best algorithm then is surely still pretty good so it would be a good idea for C libraries to implement trig functions using Chebyshev polynomials. – Albert Veli Jul 18 '15 at 11:14
1

Computing sine/cosine/tangent is actually very easy to do through code using the Taylor series. Writing one yourself takes like 5 seconds.

The whole process can be summed up with this equation here:

sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
cos(x) = 1 - x^2/2! + x^4/4! - x^6/6! + ...

Here are some routines I wrote for C:

double _pow(double a, double b) {
    double c = 1;
    for (int i=0; i<b; i++)
        c *= a;
    return c;
}

double _fact(double x) {
    double ret = 1;
    for (int i=1; i<=x; i++) 
        ret *= i;
    return ret;
}

double _sin(double x) {
    double y = x;
    double s = -1;
    for (int i=3; i<=100; i+=2) {
        y+=s*(_pow(x,i)/_fact(i));
        s *= -1;
    }  
    return y;
}
double _cos(double x) {
    double y = 1;
    double s = -1;
    for (int i=2; i<=100; i+=2) {
        y+=s*(_pow(x,i)/_fact(i));
        s *= -1;
    }  
    return y;
}
double _tan(double x) {
     return (_sin(x)/_cos(x));  
}
phuclv
  • 37,963
  • 15
  • 156
  • 475
user1432532
  • 349
  • 1
  • 5
  • 13
  • 4
    This is a rather bad implementation since it does not use that the successive terms of the sine and cosine series have very simple quotients. Which means that one can reduce the number of multiplications and divisions from O(n^2) here to O(n). Further reductions are achieved by halving and squaring as for instance it is done in the bc (POSIX multiprecision calculator) math library. – Lutz Lehmann Apr 02 '14 at 23:08
  • 2
    It also doesn't seem to be answering the question as asked; the OP is asking how trig functions are calculated by common C compilers / libraries, not for custom reimplementations. – Ilmari Karonen Jul 18 '15 at 11:05
  • 2
    I think it is a good answer as it answers the spirit of the question which (and I can only guess of course) curiosity about an otherwise "black box" function like sin(). It is the only answer here that gives one a chance to quickly understand what is happening by glossing over it in a few seconds rather than reading some optimised C source code. – Mike M Apr 05 '16 at 07:22
  • in fact libraries use the much more optimized version, by realizing that once you have a term, you can get the next term by multiplying some values. See an example in [Blindy's answer](https://stackoverflow.com/a/2284929/995714). You're calculating the power and factorials again and again which is far slower – phuclv Apr 26 '19 at 07:54
  • for example [Intel's ICC math library is able to do math functions in math.h much faster than Intel's hardware instructions themselves](https://stackoverflow.com/q/2683588/995714#comment2704332_2683691) – phuclv Apr 27 '19 at 09:50
1

Improved version of code from Blindy's answer

#define EPSILON .0000000000001
// this is smallest effective threshold, at least on my OS (WSL ubuntu 18)
// possibly because factorial part turns 0 at some point
// and it happens faster then series element turns 0;
// validation was made against sin() from <math.h>
#define PI 3.14159265358979323846
double ft_sin(double x)
{
    int k = 2;
    double r = x;
    double acc = 1;
    double den = 1;
    double num = x;

//  precision drops rapidly when x is not close to 0
//  so move x to 0 as close as possible
    while (x > PI)
        x -= 2 * PI;
    while (x < -PI)
        x += 2 * PI;
    if (x > PI / 2)
        return (ft_sin(PI - x));
    if (x < -PI / 2)
        return (ft_sin(-PI - x));
//  not using fabs for performance reasons
    while (acc > EPSILON || acc < -EPSILON)
    {
        num *= -x * x;
        den *= k * (k + 1);
        acc = num / den;
        r += acc;
        k += 2;
    }
    return (r);
}
Rugnar
  • 2,894
  • 3
  • 25
  • 29
  • Couldn't it just use the remainder of division instead of looping? something like (for positive part): x = x / PI - floor(x / PI) – Sean Ed-Man Jun 03 '21 at 10:43
0

The essence of how it does this lies in this excerpt from Applied Numerical Analysis by Gerald Wheatley:

When your software program asks the computer to get a value of [a function such as sin x or log x], have you wondered how it can get the values if the most powerful functions it can compute are polynomials? It doesn't look these up in tables and interpolate! Rather, the computer approximates every function other than polynomials from some polynomial that is tailored to give the values very accurately.

A few points to mention about the above are that some algorithms do in fact interpolate from a table, albeit only for the first few iterations. Also note how it mentions that computers utilise approximating polynomials without specifying which type of approximating polynomial. As others in the thread have pointed out, Chebyshev polynomials are more efficient than Taylor polynomials in this case.

Dean P
  • 1,841
  • 23
  • 23
-1

if you want sin then

 __asm__ __volatile__("fsin" : "=t"(vsin) : "0"(xrads));

if you want cos then

 __asm__ __volatile__("fcos" : "=t"(vcos) : "0"(xrads));

if you want sqrt then

 __asm__ __volatile__("fsqrt" : "=t"(vsqrt) : "0"(value));

so why use inaccurate code when the machine instructions will do?

phuclv
  • 37,963
  • 15
  • 156
  • 475
user80998
  • 48
  • 2
  • 5
    Perhaps because [the machine instructions are also notoriously inaccurate.](https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/) – Ilmari Karonen Jul 18 '15 at 10:58