How to use _mm256_log_ps by leveraging Intel OpenCL SVML?

Question

I found that _mm256_log_ps can't be used with GCC 7. The most common suggestions on Stack Overflow are to use ICC or to leverage the Intel OpenCL SDK.

After downloading the SDK and extracting the RPM file, I found three .so files: __ocl_svml_l9.so, __ocl_svml_e9.so, __ocl_svml_h8.so

Can someone teach me how to call _mm256_log_ps with these .so files?

Thank you.

If your computation can be done efficiently on a GPU, use OpenCL. If you just want an AVX2 `log` function, use an existing implementation that gives you the speed / accuracy tradeoff you want. e.g. [How many clock cycles does cost AVX/SSE exponentiation on modern x86\_64 CPU?](https://stackoverflow.com/q/31502095) has some libraries that presumably have vectorized full-accuracy versions, including glibc `libmvec`. But for faster lower precision, see [Efficient implementation of log2(\_\_m256d) in AVX2](https://stackoverflow.com/q/45770089) (my answer mentions float as well as double) — Peter Cordes, Aug 11 '18 at 05:08
Related: [Fastest Implementation of Exponential Function Using AVX](https://stackoverflow.com/q/48863719) has a fast approximate implementation of float `exp` for `__m256`. — Peter Cordes, Aug 11 '18 at 05:15
@PeterCordes thanks for the info. Actually, I've tried avx_mathfun.h, but `log256_ps` returns NaN for any input <= 0, whereas numpy.log returns -inf for 0 and NaN for negative inputs. That's why I want to give _mm256_log_ps a try. — user2131907, Aug 11 '18 at 05:28
I also tried linking with `-lmvec -lm`. It compiles, but the program aborts at run time, reporting that it is unable to find `_ZGVeN16v___expf_finite`. — user2131907, Aug 11 '18 at 05:34
Ah yes, handling the corner cases (or not) is another tradeoff. If your use-case never has negative inputs, you don't even need to check for them, making it faster. And then since you're doing bit-hacks, there's the question of whether you just look at the sign bit, and lump `-0.0` in with negative numbers, or whether you treat it like an IEEE comparison as exactly equal to `+0.0` and return `-Inf`. You seem to really want to call an actual library function instead of just picking a custom implementation that can inline and doesn't waste time doing anything you don't want. — Peter Cordes, Aug 11 '18 at 05:38
What are your precision requirements? Do you really need 1ulp or 0.5 ulp precision? Or is a much faster implementation that's accurate to maybe 15 bits in the significand sufficient? If the latter, then you can use a smaller polynomial without all the tricks that Agner Fog's VCL implementation uses to preserve precision when adding up parts of the result. That's basically separate from compare/blend to handle all the special cases. (And if NaN or +/-Inf is rare, you can branch on there being no special cases before running the code to do the blending, improving throughput and latency.) — Peter Cordes, Aug 11 '18 at 05:42
@PeterCordes The precision I need is actually quite low. A custom implementation is acceptable if it differs from the true value by less than 1e-6. But I still need a way to handle the 0 and negative-input cases. (I'm not very good at mask operations.) — user2131907, Aug 11 '18 at 15:11
Did you look for "SVML" in the gcc manual? You don't mention `-mveclibabi=svml` for instance. — Marc Glisse, Aug 20 '18 at 05:58
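As a rough illustration of the `-mveclibabi=svml` suggestion: the source stays an ordinary scalar loop, and the compiler is asked to vectorize the `log` call through an SVML-compatible library. The function name `log_inplace` and the build line in the comment are assumptions for illustration, not something posted in this thread. The GCC manual says this option needs `-ftree-vectorize` and `-funsafe-math-optimizations` and an SVML ABI-compatible library at link time, and the float log entry point it lists is the 4-wide `vmlsLn4`, so this route may not produce 256-bit calls. Whether the `__ocl_svml_*.so` files from the OpenCL SDK export those symbols is untested here.

```cpp
#include <cmath>
#include <cstddef>

// Plain scalar loop. With the flags below, GCC's auto-vectorizer may replace
// the log call with a call into an SVML-compatible library.
// Assumed build line (library path and name are placeholders):
//   g++ -O3 -ffast-math -mavx2 -mveclibabi=svml log_loop.cpp \
//       -L<dir with an SVML-compatible library> -l<that library>
// Without those flags this is simply a scalar loop over std::log.
void log_inplace(float* data, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        data[i] = std::log(data[i]);
}
```

Inspecting the generated assembly (for example with `-S`) is the simplest way to check whether a vector library call was actually emitted.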

score 1 · Answer 1 · answered Aug 18 '18 at 22:01

You can use the log function from the Eigen library:

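#include <Eigen/Core>

void foo(float* data, int size)
{
    // View the caller's buffer as an Eigen array without copying it.
    Eigen::Map<Eigen::ArrayXf> arr(data, size);
    // Coefficient-wise natural log, written back in place.
    arr = arr.log();
}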

Depending on the compile flags this generates optimized SSE or AVX code (as well as SIMD for other architectures). The implementation is based on http://gruntthepeon.free.fr/ssemath/ which is based on cephes.
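For the corner cases raised in the comments (numpy.log gives -inf at 0 and NaN for negative inputs), one option is to wrap whichever fast log core you choose in a compare/blend fix-up. The following is a minimal sketch under stated assumptions: `core_log_ps`, `log_ps_with_special_cases` and `log_array` are hypothetical names, and `core_log_ps` is a scalar stand-in that mimics the reported behaviour of returning NaN for every input <= 0. The `target("avx")` attribute is GCC/Clang-specific and is only there so the file builds without `-mavx`.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for a fast log core (a polynomial, log256_ps, ...).
// It mimics the behaviour described in the comments: NaN for every x <= 0.
__attribute__((target("avx")))
static inline __m256 core_log_ps(__m256 x)
{
    float buf[8];
    _mm256_storeu_ps(buf, x);
    for (float& v : buf)
        v = (v > 0.0f) ? std::log(v) : std::numeric_limits<float>::quiet_NaN();
    return _mm256_loadu_ps(buf);
}

// Fix up the edges after the core has run:
//   x == 0 (either sign) -> -Inf,   x < 0 or x is NaN -> NaN.
// +Inf is left to whatever the core returns.
__attribute__((target("avx")))
static inline __m256 log_ps_with_special_cases(__m256 x)
{
    const __m256 zero    = _mm256_setzero_ps();
    const __m256 neg_inf = _mm256_set1_ps(-std::numeric_limits<float>::infinity());
    const __m256 nan     = _mm256_set1_ps(std::numeric_limits<float>::quiet_NaN());

    // IEEE equality: true for both +0.0 and -0.0.
    const __m256 is_zero = _mm256_cmp_ps(x, zero, _CMP_EQ_OQ);
    // "Not greater-or-equal", unordered: true for x < 0 and for NaN inputs.
    const __m256 is_bad  = _mm256_cmp_ps(x, zero, _CMP_NGE_UQ);

    __m256 r = core_log_ps(x);
    r = _mm256_blendv_ps(r, neg_inf, is_zero);  // take -Inf where x == 0
    r = _mm256_blendv_ps(r, nan, is_bad);       // take NaN where x < 0 or NaN
    return r;
}

// Array wrapper callable from code built without AVX flags:
// eight floats per iteration, then a scalar tail.
__attribute__((target("avx")))
void log_array(const float* in, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(out + i, log_ps_with_special_cases(_mm256_loadu_ps(in + i)));
    for (; i < n; ++i)
        out[i] = std::log(in[i]);
}
```

The comparison treats `-0.0` as equal to `+0.0`, which is the IEEE-comparison choice discussed in the comments rather than the sign-bit shortcut. If special inputs are rare, the two blends could be skipped behind a `_mm256_movemask_ps` test on the combined masks, at the cost of a branch.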
