
Using random pausing to profile my multi-threaded C application, I noticed that exp() and drand48_r() show up on the stack a lot.

Are there faster alternative implementations of these functions? For exp() I found an answer on SO here, but nobody has tested it in C, and I am not sure the conversion from C++ is that straightforward.

round() also cropped up, and I am currently using this instead:

/* Round half away from zero. Assumes the result fits in an int. */
int roundI(double x)
{
    if (x < 0.0)
        return (int)(x - 0.5);
    else
        return (int)(x + 0.5);
}

which I believe is efficient enough. Any comments are welcome, though.

a3mlord
  • I'm confused about what you're really after here... Are you trying to optimize "random pausing" artificially introduced to your code to profile it? Or does your code "legitimately" call these functions and, as a result of profiling, you see them execute a lot, so you want them to be fast? Also, chances are very good that your libc/libm/compiler already has an extremely efficient implementation and you're not going to do any better without hand-coded assembly and/or taking shortcuts & making assumptions about expected input. – Brian McFarland Mar 12 '15 at 12:19
  • @BrianMcFarland Random Pausing is a technique to profile your code. See here http://scicomp.stackexchange.com/questions/1449/profiling-cfd-codes-with-callgrind/1870#1870 and here http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux – a3mlord Mar 12 '15 at 12:24
  • @BrianMcFarland So answering your questions, yes, my code does call those functions billions of times (depending on the input size). – a3mlord Mar 12 '15 at 12:25
  • Got it. I wasn't familiar with the term "random pausing" as an established technique. Makes sense now. Is CUDA or OpenCL an option? – Brian McFarland Mar 12 '15 at 14:07
  • Sadly, no. My application uses several TB of RAM, so I am bound to CPUs. – a3mlord Mar 12 '15 at 14:14

1 Answer


I've run into the same thing with functions like exp, log, and others.

I don't expect to be able to speed up the functions much, but I do try to see if I can call them less.

For exp, is it possible to work in the log space?

For all of them, the method that really worked was memoization: I could see they were often being called with the same argument from the same place (that's what random pausing shows you). I just wrapped exp in another function,
double exp_cached(double arg, double& old_arg, double& old_val)
(those are C++ references; in C, pass pointers instead). If arg is equal to old_arg, it returns old_val. If not, it calls exp and updates old_arg and old_val.

There are lots of variations on this technique.

Mike Dunlavey
  • Excellent idea! I don't think this will help much in my case, but I will give it a shot. BTW, I am using random pausing on Linux by running the app under gdb and then stopping it with Ctrl+C. This only tells me the function where it stopped, but not the arguments. Is there any other way of doing this? – a3mlord Mar 12 '15 at 13:06
  • Yes, it is possible to work in the log space for exp. – a3mlord Mar 12 '15 at 13:19
  • Very curious! Although the `old_arg` and the current `arg` are often the same, it actually slows down the code a bit to do this. Maybe the exp that is popping up in the stack is something executed elsewhere, like inside `drand48`? – a3mlord Mar 12 '15 at 14:41
  • For your first comment, when you stop it and get into the running thread, do `bt` to see the call stack with arguments. Or, you can do `fin` to run until the current function exits, and then examine variables. For the memoization, yes, the comparison adds a bit of overhead, but whenever `arg` and `old_arg` are the same you avoid the call to `exp`, saving a great deal. The comparison costs just a few cycles, while `exp` costs at least hundreds, maybe thousands, and it sounds like it, together with your other functions, is accounting for the vast majority of your time. – Mike Dunlavey Mar 12 '15 at 15:41
  • This is the output of sample: (gdb) bt #0 0x00007ffff79d4d65 in exp.L () from /opt/intel/.../libimf.so #1 0x0000000000437b68 in cExp (arg=, old_arg=, old_val=, $02=, $03=, $04=) at s.c:20 #2 SZ (c=, s_square=, t_=, seed=, pvt_st=, $09=, $10=, $11=, $12=, $13=) at s.c:61 #3 SLF (pLF=0x7fffffffa720, n_=0, m_=1023, t_=, coef_=0xdfe00000, s_=0x7ffff7aebf50, mu_=0x1400000, B_=0x724edb000, seed=0, pvt_st=0x7fffffffa720) at s.c:105 – a3mlord Mar 12 '15 at 15:48
  • OK, I put that in Notepad and separated the lines. There's a lot of stuff it's not showing you. Is this a debug build or do you have optimization enabled? I always work with optimization disabled, do all my tuning, and when I like it, then turn on compiler optimization. (I know that goes against one of the tenets of common wisdom. Common wisdom is often not very well thought out. If there's a speedup in there, all -O3 does is make it really hard to find.) – Mike Dunlavey Mar 12 '15 at 16:20