
In a CPU-bound C++ simulation that I'm writing, I've traced a bottleneck in my program (via valgrind) to std::exp from <cmath>. It currently eats over 40% of my simulation time. I can bound the input to a relatively small domain, but I'd like to control the accuracy. I was considering moving to a LUT (lookup table) to replace exp, but I'm not quite sure how to do this the "right way"(tm). Concerns I have:

  1. Large lookup tables will not fit in the cache, thus slowing access
  2. Best way to convert a double input to an integer for access into the lookup table
  3. Does the answer to (2) depend on the slope of the input function?
  4. Am I reinventing the wheel - has this already been done before?

What is the best way to implement/(include from a library) a LUT for exp?

Hooked
  • What is the domain you need to handle? – Alan Stokes Jul 25 '12 at 20:50
  • @AlanStokes `exp(x)` where `x = [-k ... -1]`, `k>1` and `k` is only known at runtime (thus presumably the LUT would need to be rebuilt each time the program is run (this is OK as simulation times can take days)). – Hooked Jul 25 '12 at 20:56
  • A small portion of your computation code might be useful for us. Is it a kind of "main loop" or something really complex and hard-to-fit in a single SO post ? – Viktor Latypov Jul 25 '12 at 21:12
  • the [answer here](http://codereview.stackexchange.com/a/111531/84610) demonstrates how to achieve a look-up table for `exp` in the context of an overall computing goal similar to yours. – abcd Dec 07 '15 at 21:05
  • @dbliss: The function that answer implements is a severely clipped and quantized version of `exp`. It doesn't make any attempt to estimate any points that aren't precomputed. Which was fine for you. But don't confuse that with fast methods for estimating the entire function across its whole range. – Ben Voigt Dec 07 '15 at 21:47
  • @BenVoigt totally. the latter issue (getting a fast method for the whole range) is what i'm working on now. there's a big mess of text about this all over SO (some of it relevant to my question, most not), but i have yet to find a simple solution that works. – abcd Dec 08 '15 at 02:35

3 Answers

  1. The optimal lookup table size is determined by the trade-off you make between performance, accuracy, and implementation complexity. You will have to profile; we cannot tell you the answer (we don't know the answer).

  2. Use lrint to convert double to long int. It is declared in <math.h>, and since C++11 it is also available as std::lrint in <cmath>. A sketch of using it to index a table follows this list.

  3. I am not sure what slope has to do with converting floating point numbers to integers. Could you elaborate on what you are worried about?

  4. Yes, you are reinventing the wheel. What you are doing has been done over and over again, by anybody who has ever implemented a math library. There is a lot of literature on the subject.
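
To make item 2 concrete, here is a minimal sketch (mine, not part of this answer) of using std::lrint to turn a double into a table index; ExpTable and its members are names invented for illustration, and it simply returns the nearest precomputed sample:

#include <cmath>
#include <cstddef>
#include <vector>

struct ExpTable {
    double lo, scale;              // scale = (N - 1) / (hi - lo)
    std::vector<double> values;    // values[i] = exp(lo + i / scale)

    ExpTable(double lo_, double hi_, std::size_t n)
        : lo(lo_), scale((n - 1) / (hi_ - lo_)), values(n)
    {
        for (std::size_t i = 0; i < n; ++i)
            values[i] = std::exp(lo + i / scale);
    }

    // Nearest-sample lookup; the caller must keep x inside [lo, hi].
    double operator()(double x) const
    {
        long i = std::lrint((x - lo) * scale);   // round to the nearest table index
        return values[static_cast<std::size_t>(i)];
    }
};

The table occupies N doubles, so the cache concern in the question translates directly into the choice of N.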

A straight look-up table is far from optimal. You will want to use some kind of polynomial approximation, perhaps a piecewise one with coefficients selected from a look-up table. For a function as smooth and predictable as exp, a polynomial will give you much higher accuracy for the same amount of computational effort. The required polynomials will depend on the tradeoff between complexity and accuracy, as well as whether you want to minimize expected error, minimize maximum error, or use some other loss function.
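
As an illustration of the piecewise idea, here is a small sketch of my own (PiecewiseExp is an invented name, and evenly sized intervals are assumed): each interval stores exp at its midpoint, and within an interval exp(x) = exp(mid) * exp(d) is approximated by exp(mid) * (1 + d + d*d/2). A production version would fit minimax (Remez) coefficients per interval instead of using this truncated Taylor series.

#include <cmath>
#include <cstddef>
#include <vector>

class PiecewiseExp {
    double lo_, inv_width_;           // inv_width_ = intervals / (hi - lo)
    std::vector<double> mid_exp_;     // exp at each interval midpoint
public:
    PiecewiseExp(double lo, double hi, std::size_t intervals)
        : lo_(lo), inv_width_(intervals / (hi - lo)), mid_exp_(intervals)
    {
        for (std::size_t i = 0; i < intervals; ++i)
            mid_exp_[i] = std::exp(lo + (i + 0.5) / inv_width_);
    }

    double operator()(double x) const     // x must lie in [lo, hi)
    {
        std::size_t i = static_cast<std::size_t>((x - lo_) * inv_width_);
        double d = x - (lo_ + (i + 0.5) / inv_width_);   // distance from the midpoint
        return mid_exp_[i] * (1.0 + d + 0.5 * d * d);    // exp(mid) * exp(d), truncated
    }
};

The leading error term is cubic in d, so halving the interval width cuts the error by roughly a factor of eight; that is the accuracy/table-size trade-off mentioned above.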

Restricting the domain of exp does not actually help that much, since it's so easy to expand to the entire domain.

// only works for x in [0, 1]
double exp2_limited(double x);

// works for all x, but doesn't handle overflow properly
double exp2(double x)
{
    double i = floor(x);                           // 2^x = 2^(x - floor(x)) * 2^floor(x)
    return scalbn(exp2_limited(x - i), (int) i);   // x - i is in [0, 1)
}

Summary:

  • You have to know the required accuracy before you can design such a function.

  • You also have to know the loss function (i.e., choose the loss function).

  • You have to profile before you know how fast it is.

  • Use polynomials.

Dietrich Epp

I've had this problem, and I took a few stack samples to diagnose it. Stack samples tell you where the calls are coming from and what the argument values are. I found that when exp was called from particular places, the argument value was highly repeatable.

That suggested a memoization approach, which made a huge difference.

It needed a simple "wrapper" function:

double exp_cached(double arg, double* old_arg, double* old_result){
  // Return the cached result when the argument repeats; otherwise recompute and cache.
  if (arg == *old_arg) return *old_result;
  *old_arg = arg;
  *old_result = exp(arg);
  return *old_result;
}

and wherever exp(foo) used to be called, do:

static double old_arg = -999999999, old_result;
...
... exp_cached(foo, &old_arg, &old_result)...

That way, exp doesn't get re-evaluated at a given call site when the argument value is the same as on the previous call.

Mike Dunlavey

A very similar question has been asked before. Here's my answer:

The approach was suggested by the author of that question, and I was able to implement it efficiently (small lookup table and minimal extra work after lookup). It's in C#, but translation to C++ should be straightforward and I can help if you run into trouble.

Ben Voigt
  • what the question asks for is a way to improve performance without losing too much accuracy. your answer claims to do the inverse: improve accuracy without losing too much performance. please expand your answer so that it tells us why the link you've posted is still relevant for the OP. – abcd Dec 07 '15 at 21:07
  • @dbliss: The two are not different. If you draw a line between (1,0) and (0,1), then the point (0.8, 0.8) is simultaneously to the right of the line and above the line. The difference between the phrasing of the questions is that the other question has already moved from (high accuracy low performance) to (low accuracy high performance), while this one starts from (high accuracy low performance). But both are trying to get to (good accuracy good performance), while the conventional tradeoff only offers (mediocre accuracy good performance) or (good accuracy mediocre performance). – Ben Voigt Dec 07 '15 at 21:36
  • @dbliss: The irony is that the other question exactly is doing it in the context of neural networks, just like your Code Review post. So why are you trying to say the questions are totally unrelated? – Ben Voigt Dec 07 '15 at 21:40
  • @dbliss: So what you're saying is that you'd be more likely to visit the link if it was cut off as "Fast Exp calculation"? Right now it is the entire question title, pulled in automatically by the SO engine because it saw an on-site link. – Ben Voigt Dec 08 '15 at 05:19
  • for some reason i'm just now noticing your first comment. i think that clarifies everything. the explanation in that comment is exactly what i was asking for with my first comment. – abcd Dec 08 '15 at 05:31