Set a breakpoint on __scalbn
. Run your program. Look at a backtrace (in GDB, bt
). The call tree will show that exp()
is a parent function for __scalbn
.
If a function has multiple callers, the first hit might not be from the "hot" function you're profiling.
To actually figure out which higher-up function (including its children) is responsible for using a lot of time, see linux perf: how to interpret and find hotspots. Top-down profiling can find expensive functions that do all their work in calls to other functions, even when those other functions also have "innocent" callers. (e.g. memcpy
is heavily used and often unavoidable, but what you'd want to find are callers that use it too much and could be optimized better. Or not called at all.)
And BTW, yes glibc's math lib exp()
implementation does internally use __scalbn
. I'm not sure how bad the implementation is, but I don't see an asm version for x86-64, only this pure C version. https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c.html. (For __scalbnl(long double)
there's https://code.woboq.org/userspace/glibc/sysdeps/x86_64/fpu/s_scalbnl.S.html, using the x87 fscale
instruction for 80-bit floats. But there are only i386 asm files for the other sizes. And IA-64 (Itanium), but not x86-64).
glibc does have some vectorized EXP code, though, like the SSE4 SVML version https://code.woboq.org/userspace/glibc/sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core_sse4.S.html#_ZGVbN2v_exp_sse4.
If you want higher-performance exp()
without perfect accuracy, see Fastest Implementation of Exponential Function Using AVX (that's for float
, not double
. I forget if there's an SO answer with a double version).
Also related: Efficient implementation of log2(__m256d) in AVX2.