15

I'm writing a C program for a PIC micro-controller which needs to do a very specific exponential function. I need to calculate the following:

A = k . (1 - (p/p0)^0.19029)

k and p0 are constant, so it's all pretty simple apart from finding x^0.19029

(p/p0) ratio would always be in the range 0-1.

It works well if I add in math.h and use the power function, except that uses up all of the available 16 kB of program memory. Talk about bloatware! (Rest of program without power function = ~20% flash memory usage; add math.h and power function, =100%).

I'd like the program to do some other things as well. I was wondering if I can write a special case implementation for x^0.19029, maybe involving iteration and some kind of lookup table.

My idea is to generate a look-up table for the function x^0.19029, with perhaps 10-100 values of x in the range 0-1. The code would find a close match, then (somehow) iteratively refine it by re-scaling the lookup table values. However, this is where I get lost because my tiny brain can't visualise the maths involved.
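To make that concrete, here is the rough shape of what I have in mind, with linear interpolation between entries (the table contents here are placeholders; I'd generate the real values offline and paste them in as a static array):

#define TABLE_SIZE 101

/* Placeholder: entry i would hold (i/100.0)^0.19029, generated offline. */
static const float pow_table[TABLE_SIZE] = { 0 };

float pow_019029_lut(float x)            /* x in [0, 1] */
{
    float pos  = x * (TABLE_SIZE - 1);   /* map [0,1] onto table indices */
    int   i    = (int)pos;
    float frac = pos - (float)i;         /* fractional position in the interval */
    if (i >= TABLE_SIZE - 1)             /* clamp at the top end */
        return pow_table[TABLE_SIZE - 1];
    return pow_table[i] + frac * (pow_table[i + 1] - pow_table[i]);
}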

Could this approach work?

Alternatively, I've looked at using Exp(x) and Ln(x), which can be implemented with a Taylor expansion. b^x can then be found with:

b^x = (e^(ln b))^x = e^(x.ln(b))

(See: Wikipedia - Powers via Logarithms)
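Sketching that out for my special case (base in (0, 1], exponent fixed at 0.19029), I imagine something like the following, with ln computed from its atanh-style series and exp from its Taylor series. The term counts are guesses, untested:

/* ln(x) via ln(x) = 2*(s + s^3/3 + s^5/5 + ...), s = (x-1)/(x+1).
   Converges for all x > 0, and quickly for x near 1. */
static double ln_series(double x)
{
    double s = (x - 1.0) / (x + 1.0);
    double s2 = s * s;
    double term = s, sum = 0.0;
    int k;
    for (k = 1; k <= 19; k += 2) {       /* 10 terms: a guess */
        sum += term / k;
        term *= s2;
    }
    return 2.0 * sum;
}

/* exp(x) via its Taylor series; the argument 0.19029*ln(x) is
   small and negative for x in (0, 1]. */
static double exp_series(double x)
{
    double term = 1.0, sum = 1.0;
    int k;
    for (k = 1; k <= 15; k++) {          /* 15 terms: also a guess */
        term *= x / k;
        sum += term;
    }
    return sum;
}

double pow_019029(double x)              /* x^0.19029 = e^(0.19029 * ln x) */
{
    return exp_series(0.19029 * ln_series(x));
}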

This looks a bit tricky and complicated to me, though. Am I likely to get the implementation smaller than the compiler's math library, and can I simplify it for my special case (i.e. base = 0-1, exponent always 0.19029)?

Note that RAM usage is OK at the moment, but I've run low on Flash (used for code storage). Speed is not critical. Somebody has already suggested that I use a bigger micro with more flash memory, but that sounds like profligate wastefulness!

[EDIT] I was being lazy when I said "(p/p0) ratio would always be in the range 0-1". Actually it will never reach 0, and I did some calculations last night and decided that in fact a range of 0.3 - 1 would be quite adequate! This means that some of the simpler solutions below should be suitable. Also, the "k" in the above is 44330, and I'd like the error in the final result to be less than 0.1. I guess that means an error in the (p/p0)^0.19029 needs to be less than 1/443300 or 2.256e-6.

edited by John Saunders
asked by Jeremy
  • A lookup table sounds like the best option -- even 100 values cost only 400 (float)/800 (double) bytes. Create it once with a simple for-loop and include as a static array. For finding in-between values a simple linear approximation may be enough. – Jongware May 05 '14 at 22:09
  • To clarify what Oli said, an "exponential" function has the variable in the exponent. – ooga May 05 '14 at 22:13
  • Is there a limit to the number of decimals? Full float/double range, or can it be stored inside a scaled integer, for example? (Ed. A quick test shows storing as 1/65536th -- integer -- it deviates from the original by 0.001%.) – Jongware May 05 '14 at 22:16
  • OTOH (continuing experimenting), with only 100 values stored as unsigned shorts, I get 3 or more correct digits for values above 0.01, but everything below that deviates way, way too much to be useful. (Again, ed.) The same happens when I store them as doubles, so a linear lookup is **out**. – Jongware May 05 '14 at 22:35
  • (Continuing on my lookup table idea; feel free to ask me to shut up about it) Is it a good idea to create a *logarithmic* lookup table, so the distances between cached values *can* be linearly approximated? Or does that only add yet more math functions to implement? – Jongware May 05 '14 at 22:52
  • @Jongware, if we were to have a logarithm and its inverse, this problem would be trivial: x^a = exp(a ln x). – Raman Shah May 06 '14 at 01:26
  • Could you specify a range for the maximum relative and absolute errors that are acceptable? (maximum over the full range of p/p0) – Stefan May 06 '14 at 01:30
  • In reply to Jongware's first comment above: I did some modelling last night, and determined that a lookup table with 100 values (and a linear interpolation between them) would give suitable accuracy for my application (see edit to post re: range for p/p0). That's the simplest solution, but I'm going to look into some of the excellent answers below as well! – Jeremy May 06 '14 at 23:44

5 Answers

16

Use splines. The relevant part of the function is shown in the figure below. It varies approximately like the 5th root, so the problematic zone is close to p/p0 = 0. There is mathematical theory on how to optimally place the knots of splines to minimize the error (see Carl de Boor: A Practical Guide to Splines). Usually one constructs the spline in B-form ahead of time (using toolboxes such as Matlab's spline toolbox, also written by C. de Boor), then converts it to piecewise polynomial (pp) representation for fast evaluation.

In C. de Boor, PGS, the function g(x) = sqrt(x + 1) is actually taken as an example (Chapter 12, Example II). This is exactly what you need here. The book comes back to this case a few times, since it is admittedly a hard problem for any interpolation scheme due to the infinite derivatives at x = -1. All software from PGS is available for free as PPPACK in netlib, and most of it is also part of SLATEC (also from netlib).

[figure: plot of x^0.19029 for 0 <= x <= 1, rising steeply near x = 0]
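For concreteness, once the spline is in pp form, evaluating it on the micro is cheap: locate the interval, then run one Horner evaluation in the local coordinate. A sketch with placeholder knots and coefficients (the real numbers would come from the offline B-form construction):

#define NPIECES 3

static const double breaks[NPIECES + 1] = { 0.0, 0.1, 0.4, 1.0 }; /* placeholder knots */
static const double coef[NPIECES][4] = {  /* placeholder cubic coefficients c0..c3 */
    { 0.0, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 0.0, 0.0 },
    { 0.0, 0.0, 0.0, 0.0 },
};

double pp_eval(double x)
{
    const double *c;
    double t;
    int i = NPIECES - 1;
    while (i > 0 && x < breaks[i])        /* locate the interval */
        i--;
    t = x - breaks[i];                    /* local coordinate */
    c = coef[i];
    return ((c[3] * t + c[2]) * t + c[1]) * t + c[0];   /* Horner */
}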

Edit (Removed)

(Multiplying by x once does not significantly help, since it only regularizes the first derivative, while all other derivatives at x = 0 are still infinite.)

Edit 2

My feeling is that optimally constructed splines (following de Boor) will be best (and fastest) for relatively low accuracy requirements. If the accuracy requirements are high (say 1e-8), one may be forced to get back to the algorithms that mathematicians have been researching for centuries. At this point, it may be best to simply download the sources of glibc and copy (provided GPL is acceptable) whatever is in

glibc-2.19/sysdeps/ieee754/dbl-64/e_pow.c

Since we don't have to include the whole math.h, there shouldn't be a problem with memory, but we will only marginally profit from having a fixed exponent.

Edit 3

Here is an adapted version of e_pow.c from netlib, as found by @Joni. This seems to be the grandfather of glibc's more modern implementation mentioned above. The old version has two advantages: (1) It is public domain, and (2) it uses a limited number of constants, which is beneficial if memory is a tight resource (glibc's version defines over 10000 lines of constants!). The following is completely standalone code, which calculates x^0.19029 for 0 <= x <= 1 to double precision (I tested it against Python's power function and found that at most 2 bits differed):

#define __LITTLE_ENDIAN

#ifdef __LITTLE_ENDIAN
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
#else
#define __HI(x) *(int*)&x
#define __LO(x) *(1+(int*)&x)
#endif

static const double
bp[] = {1.0, 1.5,},
dp_h[] = { 0.0, 5.84962487220764160156e-01,}, /* 0x3FE2B803, 0x40000000 */
dp_l[] = { 0.0, 1.35003920212974897128e-08,}, /* 0x3E4CFDEB, 0x43CFD006 */
zero = 0.0,
one = 1.0,
two =  2.0,
two53 = 9007199254740992.0, /* 0x43400000, 0x00000000 */
    /* poly coefs for (3/2)*(log(x)-2s-2/3*s**3) */
L1  =  5.99999999999994648725e-01, /* 0x3FE33333, 0x33333303 */
L2  =  4.28571428578550184252e-01, /* 0x3FDB6DB6, 0xDB6FABFF */
L3  =  3.33333329818377432918e-01, /* 0x3FD55555, 0x518F264D */
L4  =  2.72728123808534006489e-01, /* 0x3FD17460, 0xA91D4101 */
L5  =  2.30660745775561754067e-01, /* 0x3FCD864A, 0x93C9DB65 */
L6  =  2.06975017800338417784e-01, /* 0x3FCA7E28, 0x4A454EEF */
P1   =  1.66666666666666019037e-01, /* 0x3FC55555, 0x5555553E */
P2   = -2.77777777770155933842e-03, /* 0xBF66C16C, 0x16BEBD93 */
P3   =  6.61375632143793436117e-05, /* 0x3F11566A, 0xAF25DE2C */
P4   = -1.65339022054652515390e-06, /* 0xBEBBBD41, 0xC5D26BF1 */
P5   =  4.13813679705723846039e-08, /* 0x3E663769, 0x72BEA4D0 */
lg2  =  6.93147180559945286227e-01, /* 0x3FE62E42, 0xFEFA39EF */
lg2_h  =  6.93147182464599609375e-01, /* 0x3FE62E43, 0x00000000 */
lg2_l  = -1.90465429995776804525e-09, /* 0xBE205C61, 0x0CA86C39 */
ovt =  8.0085662595372944372e-0017, /* -(1024-log2(ovfl+.5ulp)) */
cp    =  9.61796693925975554329e-01, /* 0x3FEEC709, 0xDC3A03FD =2/(3ln2) */
cp_h  =  9.61796700954437255859e-01, /* 0x3FEEC709, 0xE0000000 =(float)cp */
cp_l  = -7.02846165095275826516e-09, /* 0xBE3E2FE0, 0x145B01F5 =tail of cp_h*/
ivln2    =  1.44269504088896338700e+00, /* 0x3FF71547, 0x652B82FE =1/ln2 */
ivln2_h  =  1.44269502162933349609e+00, /* 0x3FF71547, 0x60000000 =24b 1/ln2*/
ivln2_l  =  1.92596299112661746887e-08; /* 0x3E54AE0B, 0xF85DDF44 =1/ln2 tail*/

double pow0p19029(double x)
{
    double y = 0.19029e+00;
    double z,ax,z_h,z_l,p_h,p_l;
    double y1,t1,t2,r,s,t,u,v,w;
    int i,j,k,n;
    int hx,hy,ix,iy;
    unsigned lx,ly;

    hx = __HI(x); lx = __LO(x);
    hy = __HI(y); ly = __LO(y);
    ix = hx&0x7fffffff;  iy = hy&0x7fffffff;

    ax = x;
    /* special value of x */
    if(lx==0) {
        if(ix==0x7ff00000||ix==0||ix==0x3ff00000){
            z = ax;           /*x is +-0,+-inf,+-1*/
            return z;
        }
    }

    s = one; /* s (sign of result -ve**odd) = -1 else = 1 */

    double ss,s2,s_h,s_l,t_h,t_l;
    n  = ((ix)>>20)-0x3ff;
    j  = ix&0x000fffff;
    /* determine interval */
    ix = j|0x3ff00000;      /* normalize ix */
    if(j<=0x3988E) k=0;     /* |x|<sqrt(3/2) */
    else if(j<0xBB67A) k=1; /* |x|<sqrt(3)   */
    else {k=0;n+=1;ix -= 0x00100000;}
    __HI(ax) = ix;

    /* compute ss = s_h+s_l = (x-1)/(x+1) or (x-1.5)/(x+1.5) */
    u = ax-bp[k];           /* bp[0]=1.0, bp[1]=1.5 */
    v = one/(ax+bp[k]);
    ss = u*v;
    s_h = ss;
    __LO(s_h) = 0;
    /* t_h=ax+bp[k] High */
    t_h = zero;
    __HI(t_h)=((ix>>1)|0x20000000)+0x00080000+(k<<18); 
    t_l = ax - (t_h-bp[k]);
    s_l = v*((u-s_h*t_h)-s_h*t_l);
    /* compute log(ax) */
    s2 = ss*ss;
    r = s2*s2*(L1+s2*(L2+s2*(L3+s2*(L4+s2*(L5+s2*L6)))));
    r += s_l*(s_h+ss);
    s2  = s_h*s_h;
    t_h = 3.0+s2+r;
    __LO(t_h) = 0;
    t_l = r-((t_h-3.0)-s2);
    /* u+v = ss*(1+...) */
    u = s_h*t_h;
    v = s_l*t_h+t_l*ss;
    /* 2/(3log2)*(ss+...) */
    p_h = u+v;
    __LO(p_h) = 0;
    p_l = v-(p_h-u);
    z_h = cp_h*p_h;         /* cp_h+cp_l = 2/(3*log2) */
    z_l = cp_l*p_h+p_l*cp+dp_l[k];
    /* log2(ax) = (ss+..)*2/(3*log2) = n + dp_h + z_h + z_l */
    t = (double)n;
    t1 = (((z_h+z_l)+dp_h[k])+t);
    __LO(t1) = 0;
    t2 = z_l-(((t1-t)-dp_h[k])-z_h);

    /* split up y into y1+y2 and compute (y1+y2)*(t1+t2) */
    y1  = y;
    __LO(y1) = 0;
    p_l = (y-y1)*t1+y*t2;
    p_h = y1*t1;
    z = p_l+p_h;
    j = __HI(z);
    i = __LO(z);
    /*
     * compute 2**(p_h+p_l)
     */
    i = j&0x7fffffff;
    k = (i>>20)-0x3ff;
    n = 0;
    if(i>0x3fe00000) {            /* if |z| > 0.5, set n = [z+0.5] */
        n = j+(0x00100000>>(k+1));
        k = ((n&0x7fffffff)>>20)-0x3ff;      /* new k for n */
        t = zero;
        __HI(t) = (n&~(0x000fffff>>k));
        n = ((n&0x000fffff)|0x00100000)>>(20-k);
        if(j<0) n = -n;
        p_h -= t;
    } 
    t = p_l+p_h;
    __LO(t) = 0;
    u = t*lg2_h;
    v = (p_l-(t-p_h))*lg2+t*lg2_l;
    z = u+v;
    w = v-(z-u);
    t  = z*z;
    t1  = z - t*(P1+t*(P2+t*(P3+t*(P4+t*P5))));
    r  = (z*t1)/(t1-two)-(w+z*w);
    z  = one-(r-z);
    __HI(z) += (n<<20);
    return s*z;
}

Clearly, 50+ years of research have gone into this, so it's probably very hard to do any better. (One has to appreciate that there are 0 loops, only 2 divisions, and only 6 if statements in the whole algorithm!) The reason for this is, again, the behavior at x = 0, where all derivatives diverge, which makes it extremely hard to keep the error under control: I once had a spline representation with 18 knots that was good up to x = 1e-4, with absolute and relative errors < 5e-4 everywhere, but going to x = 1e-5 ruined everything again.

So, unless the requirement to go arbitrarily close to zero is relaxed, I recommend using the adapted version of e_pow.c given above.

Edit 4

Now that we know that the domain 0.3 <= x <= 1 is sufficient, and that we have very low accuracy requirements, Edit 3 is clearly overkill. As @MvG has demonstrated, the function is so well behaved that a polynomial of degree 7 is sufficient to satisfy the accuracy requirements, which can be considered a single spline segment. @MvG's solution minimizes the integral error, which already looks very good.

The question arises: how much better can we still do? It would be interesting to find the polynomial of a given degree that minimizes the maximum error in the interval of interest. The answer is the minimax polynomial, which can be found using the Remez algorithm, as implemented in the Boost library. I like @MvG's idea to clamp the value at x = 1 to 1, which I will do as well. Here is minimax.cpp:

#include <iostream>
#include <iomanip>
#include <boost/shared_ptr.hpp>
#define TARG_PREC 64
#define WORK_PREC (TARG_PREC*2)

#include <boost/multiprecision/cpp_dec_float.hpp>
typedef boost::multiprecision::number<boost::multiprecision::cpp_dec_float<WORK_PREC> > dtype;
using boost::math::pow;

#include <boost/math/tools/remez.hpp>
boost::shared_ptr<boost::math::tools::remez_minimax<dtype> > p_remez;

dtype f(const dtype& x) {
    static const dtype one(1), y(0.19029);
    return one - pow(one - x, y);
}

void out(const char *descr, const dtype& x, const char *sep="") {
    std::cout << descr << boost::math::tools::real_cast<double>(x) << sep << std::endl;
}

int main() {
    dtype a(0), b(0.7);   // range to optimise over
    bool rel_error(false), pin(true);
    int orderN(7), orderD(0), skew(0), brake(50);

    int prec = 2 + (TARG_PREC * 3010LL)/10000;
    std::cout << std::scientific << std::setprecision(prec);

    p_remez.reset(new boost::math::tools::remez_minimax<dtype>(
        &f, orderN, orderD, a, b, pin, rel_error, skew, WORK_PREC));
    out("Max error in interpolated form: ", p_remez->max_error()); 

    p_remez->set_brake(brake);

    unsigned i, count(50);
    for (i = 0; i < count; ++i) {
        std::cout << "Stepping..." << std::endl;
        dtype r = p_remez->iterate();
        out("Maximum Deviation Found:                     ", p_remez->max_error());
        out("Expected Error Term:                         ", p_remez->error_term());
        out("Maximum Relative Change in Control Points:   ", r);
    }

    boost::math::tools::polynomial<dtype> n = p_remez->numerator();
    for(i = n.size(); i--; ) {
        out("", n[i], ",");
    }
}

Since all parts of boost that we use are header-only, simply build with:

c++ -O3 -I<path/to/boost/headers> minimax.cpp -o minimax

We finally get the coefficients, which are (after multiplication by 44330):

24538.3409, -42811.1497, 34300.7501, -11284.1276, 4564.5847, 3186.7541, 8442.5236, 0.
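On the micro, these would be evaluated exactly like @MvG's polynomial below, with Horner's scheme in 1 - x (the coefficients are listed highest degree first, with the constant term pinned to 0). A sketch, untested:

double A_minimax(double x)   /* 44330*(1 - x^0.19029), minimax fit for 0.3 <= x <= 1 */
{
    double u = 1.0 - x;
    double fx;
    fx =        24538.3409;
    fx = fx*u - 42811.1497;
    fx = fx*u + 34300.7501;
    fx = fx*u - 11284.1276;
    fx = fx*u +  4564.5847;
    fx = fx*u +  3186.7541;
    fx = fx*u +  8442.5236;
    return fx * u;           /* constant term is 0: exact at x = 1 */
}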

The following error plot demonstrates that this is really the best possible degree-7 polynomial approximation, since all extrema are of equal magnitude (0.06659):

[figure: absolute error of the degree-7 minimax polynomial, with all extrema of equal magnitude 0.06659]

Should the requirements ever change (while still keeping well away from 0!), the C++ program above can be simply adapted to spit out the new optimal polynomial approximation.

answered by Stefan
  • Absolutely. Regularizing singularities in this way is one of the best ways to go. However, splines themselves are tricky: I'd instead suggest using Newton's method on x^1.19029 and then dividing by x at the end. – Raman Shah May 06 '14 at 01:48
  • If one were to implement a square root function, then one could compute the result as the eighth root of x^(8*0.19029) = x^1.52232, and use some look-up approach as above for the power. This should reproduce the steep slope at zero. – Lutz Lehmann May 06 '14 at 06:29
  • That is an awesome answer, LutzL; I stole your idea below. :) – Raman Shah May 06 '14 at 16:01
  • Haha, I'd say your edit 3 clinches it. That is some gnarly code though. – Raman Shah May 06 '14 at 17:54
  • Thanks for your very detailed answer! I'll see how the code you found and posted in 'Edit 3' works on my micro. I think I can also do it with a 100 point look-up table and a linear interpolation. – Jeremy May 07 '14 at 02:07
8

Instead of a lookup table, I'd use a polynomial approximation:

1 - x^0.19029 ≈ -1073365.91783x^15 + 8354695.40833x^14 - 29422576.6529x^13 + 61993794.537x^12 - 87079891.4988x^11 + 86005723.842x^10 - 61389954.7459x^9 + 32053170.1149x^8 - 12253383.4372x^7 + 3399819.97536x^6 - 672003.142815x^5 + 91817.6782072x^4 - 8299.75873768x^3 + 469.530204564x^2 - 16.6572179869x + 0.722044145701

Or in code:

double f(double x) {
  double fx;
  fx =      - 1073365.91783;
  fx = fx*x + 8354695.40833;
  fx = fx*x - 29422576.6529;
  fx = fx*x + 61993794.537;
  fx = fx*x - 87079891.4988;
  fx = fx*x + 86005723.842;
  fx = fx*x - 61389954.7459;
  fx = fx*x + 32053170.1149;
  fx = fx*x - 12253383.4372;
  fx = fx*x + 3399819.97536;
  fx = fx*x - 672003.142815;
  fx = fx*x + 91817.6782072;
  fx = fx*x - 8299.75873768;
  fx = fx*x + 469.530204564;
  fx = fx*x - 16.6572179869;
  fx = fx*x + 0.722044145701;
  return fx;
}

I computed this in sage using the least squares approach:

f(x) = 1-x^(19029/100000) # your function
d = 16                    # number of terms, i.e. degree + 1
A = matrix(d, d, lambda r, c: integrate(x^r*x^c, (x, 0, 1)))
b = vector([integrate(x^r*f(x), (x, 0, 1)) for r in range(d)])
A.solve_right(b).change_ring(RDF)

Here is a plot of the error this will entail:

[figure: error plot; blue = 16-term polynomial, red = piecewise linear interpolation with 16 equidistant values, both clipped near x = 0]

Blue is the error from my 16 term polynomial, while red is the error you'd get from piecewise linear interpolation with 16 equidistant values. As you can see, both errors are quite small for most parts of the range, but will become really huge close to x=0. I actually clipped the plot there. If you can somehow narrow the range of possible values, you could use that as the domain for the integration, and obtain an even better fit for the relevant range. At the cost of worse fit outside, of course. You could also increase the number of terms to obtain a closer fit, although that might also lead to higher oscillations.

I guess you can also combine this approach with the one Stefan posted: use his to split the domain into several parts, then use mine to find a close low degree polynomial for each part.

Update

Since you updated the specification of your question, with regard to both the domain and the error, here is a minimal solution to fit those requirements:

44330(1 - x^0.19029) ≈ 23024.9160933(1-x)^7 - 39408.6473636(1-x)^6 + 31379.9086193(1-x)^5 - 10098.7031260(1-x)^4 + 4339.44098317(1-x)^3 + 3202.85705860(1-x)^2 + 8442.42528906(1-x)

double f(double x) {
  double fx, x1 = 1. - x;
  fx =       + 23024.9160933;
  fx = fx*x1 - 39408.6473636;
  fx = fx*x1 + 31379.9086193;
  fx = fx*x1 - 10098.7031260;
  fx = fx*x1 + 4339.44098317;
  fx = fx*x1 + 3202.85705860;
  fx = fx*x1 + 8442.42528906;
  fx = fx*x1;
  return fx;
}

I integrated x from 0.293 to 1 or equivalently 1 - x from 0 to 0.707 to keep the worst oscillations outside the relevant domain. I also omitted the constant term, to ensure an exact result at x=1. The maximal error for the range [0.3, 1] now occurs at x=0.3260 and amounts to 0.0972 < 0.1. Here is an error plot, which of course has bigger absolute errors than the one above due to the scale factor k=44330 which has been included here.

[figure: error plot for the updated fit over [0.3, 1], including the factor k = 44330; maximal error 0.0972 at x = 0.3260]

I can also state that the first three derivatives of the function will have constant sign over the range in question, so the function is monotonic, convex, and in general pretty well-behaved.

answered by MvG
  • High-order interpolating polynomials tend to oscillate wildly between knots, so they are usually not suitable for interpolation problems. – Stefan May 05 '14 at 23:26
  • Yes, you minimize the integral error. However, one usually wants to obtain a bound for the *maximal* error in such applications. The problem is obviously at x ~ 0, where the error chart skyrockets. Maybe one can fix this by fixing the derivative at x = 0. In my experience, splines do better for this type of problem. – Stefan May 05 '14 at 23:36
  • @Jeremy: So do I. I'm pretty sure Stefan's approach can yield superior approximations if done correctly, but doing it correctly requires work, particularly if you don't have matlab at hand. This here should work in any CAS, and sage is free. By the way, I just updated my answer to cater for your updated requirements. – MvG May 07 '14 at 02:50
  • @MvG: Cool! Except... I think your computer uses a different type of abacus because I couldn't get your updated polynomial to work! For example, when I put in 0.927, I get 3824.6 where it should be 634.84. I tried the polynomial in Excel too, and it gave the same result. – Jeremy May 08 '14 at 08:20
  • @Jeremy: That error came from me toying with the first edit from Stefan's answer, where he effectively increased the exponent by 1. I still had that exponent in my code when I updated my answer, and didn't notice. I just updated my answer again, and (not surprisingly) found that the original function is a bit harder to approximate. Now it looks fine, though. – MvG May 08 '14 at 08:58
2

Not meant to answer the question, but it illustrates the Road Not To Go, and thus may be helpful:

This quick-and-dirty C code calculates pow(x, 0.19029) for x from 0.00 to 1.00 in steps of 0.01. The first half displays the error, in percent, when stored as 1/65536ths (as that theoretically provides slightly over 4 decimals of precision). The second half shows both interpolated and calculated values in steps of 0.001, and the difference between these two.

It kind of looks okay if you read from the bottom up, all 100s and 99.99s there, but about the first 20 values from 0.001 to 0.020 are worthless.

#include <stdio.h>
#include <math.h>

float powers[102];

int main (void)
{
    int i, as_int;
    double as_real, low, high, delta, approx, calcd, diff;

    printf ("calculating and storing:\n");

    for (i=0; i<=101; i++)
    {
        as_real = pow(i/100.0, 0.19029);
        as_int = (int)round(65536*as_real);
        powers[i] = as_real;

        diff = 100*as_real/(as_int/65536.0);
        printf ("%.5f %.5f %.5f ~ %.3f\n", i/100.0, as_real, as_int/65536.0, diff);
    }

    printf ("\n");
    printf ("-- interpolating in 1/10ths:\n");

    for (i=0; i<1000; i++)
    {
        as_real = i/1000.0;

        low = powers[i/10];
        high = powers[1+i/10];

        delta = (high-low)/10.0;
        approx = low + (i%10)*delta;

        calcd = pow(as_real, 0.19029);
        diff = 100.0*approx/calcd;

        printf ("%.5f ~ %.5f = %.5f +/- %.5f%%\n", as_real, approx, calcd, diff);
    }

    return 0;
}
answered by Jongware
2

You can find a complete, correct standalone implementation of pow in fdlibm. It's about 200 lines of code, about half of which deal with special cases. If you remove the code that deals with special cases you're not interested in I doubt you'll have problems including it in your program.

answered by Joni
  • Looks like that's where `glibc` took it from... Except that you can take it from netlib for free, without inheriting the GPL. – Stefan May 06 '14 at 09:02
  • Ouch, I hadn't seen you had already suggested using the glibc! – Joni May 06 '14 at 11:05
1

LutzL's answer is a really good one: Calculate your power as (x^1.52232)^(1/8), computing the inner power by spline interpolation or another method. The eighth root deals with the pathological non-differentiable behavior near zero. I took the liberty of mocking up an implementation this way. The below, however, only does a linear interpolation to do x^1.52232, and you'd need to get the full coefficients using your favorite numerical mathematics tools. You'll be adding scarcely 40 lines of code to get your needed power, plus however many knots you choose to use for your spline, as dictated by your required accuracy.

Don't be scared by the #include <math.h>; it's just for benchmarking the code.

#include <stdio.h>
#include <math.h>

double my_sqrt(double x) {
  /* Newton's method for a square root. */
  int i = 0;
  double res = 1.0;

  if (x > 0) {
    for (i = 0; i < 10; i++) {
      res = 0.5 * (res + x / res);
    }
  } else {
    res = 0.0;
  }

  return res;
}

double my_152232(double x) {
  /* Cubic spline interpolation for x ** 1.52232. */
  int i = 0;
  double res = 0.0;
  /* coefs[i] will give the cubic polynomial coefficients between x =
     i and x = i+1. Out of laziness, the below numbers give only a
     linear interpolation.  You'll need to do some work and research
     to get the spline coefficients. */

  double coefs[3][4] = {{0.0, 1.0, 0.0, 0.0},
            {-0.872526, 1.872526, 0.0, 0.0},
            {-2.032706, 2.452616, 0.0, 0.0}};

  if ((x >= 0) && (x < 3.0)) {
    i = (int) x;
    /* Horner's method cubic. */
    res = ((coefs[i][3] * x + coefs[i][2]) * x + coefs[i][1]) * x
      + coefs[i][0];
  } else if (x >= 3.0) {
    /* Scaled x ** 1.5 once you go off the spline. */
    res = 1.024824 * my_sqrt(x * x * x);
  }

  return res;
}

double my_019029(double x) {
  return my_sqrt(my_sqrt(my_sqrt(my_152232(x))));
}

int main() {
  int i;
  double x = 0.0;

  for (i = 0; i < 1000; i++) {
    x = 1e-2 * i;
    printf("%f %f %f \n", x, my_019029(x), pow(x, 0.19029));
  }

  return 0;
}

EDIT: If you're just interested in a small region like [0,1], even simpler is to peel off one sqrt(x) and compute x^1.02232, which is quite well behaved, using a Taylor series:

double my_152232(double x) {
  double part_050000 = my_sqrt(x);
  double part_102232 = 1.02232 * x + 0.0114091 * x * x - 3.718147e-3 * x * x * x;

  return part_102232 * part_050000;
}

This gets you within 1% of the exact power for approximately [0.1,6], though getting the singularity exactly right is always a challenge. Even so, this three-term Taylor series gets you within 2.3% for x = 0.001.

answered by Raman Shah
  • Exactly, getting the singularity right is the problem (it's an essential singularity, with all derivatives diverging). I got as close as x = 1e-4 with errors < 5e-4 using splines with optimal knot placement, but then you need to go on to x = 1e-5, 1e-6, 1e-7, ... and must still control the error. – Stefan May 06 '14 at 17:11
  • I'm impressed that you got such good performance with splines alone --- they're so strictly suited for smooth functions. My guess for the OP's embedded application is that there's a roundoff limit for the input, so either solution would be perfectly adequate. For what it's worth, once you up the handrolled sqrt to 20 iterations, my second method still gives an answer within 5% for 1e-7. I think handrolling the sqrt is critical for getting the cuspy behavior. – Raman Shah May 06 '14 at 17:18
  • (Or, of course, to handroll a logarithm, which is what your edit 3 code does.) – Raman Shah May 06 '14 at 18:06
  • Nice simplicity, but not accurate enough for this case (see edit to question). Thanks though! – Jeremy May 07 '14 at 02:05
  • Hi Jeremy, well if you want high accuracy in [0.3, 1], I do think that a spline will work beautifully and could even be written to run faster than anything with a sqrt (which involves a lot of divisions but gives a more or less correct singularity at 0). Still, the netlib implementation Stefan gave above is probably the canonical answer. – Raman Shah May 07 '14 at 14:16
  • Also: If you like this method, you should be able to meet your specifications by re-centering your Taylor series somewhere inside of [0.3, 1] and using more terms. Merely Taylor-expanding a cubic polynomial around x = 0.65 (which wasn't even optimal) gets you within 1e-3 of the right answer over [0.3, 1]. Getting within 1e-6 wouldn't be hard with a few more terms. Despite the simplicity of the interpolating polynomial, I discourage this approach unless you can make strong guarantees that you'll never land outside of your target domain. Go even modestly outside and you'll get garbage answers. – Raman Shah May 07 '14 at 14:51