
I'm trying to benchmark different ways to apply a function to an array.

Why is _mm_sin_ps (https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3260,2124,4779,4779&cats=Trigonometry&text=_sin) not known in my scope, while _mm_sqrt_ps is?

How do I make it known and compile the code without errors?

#include <random>
#include <iostream>
#include <cmath>
#include <chrono>
#include <algorithm>
#include <valarray>
#include "immintrin.h"
#include <array>
int main()
{
    std::cout<<"start\n";
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(-1000, 1000);
    int N=100;
    while(N--)
    {
        std::cout<<"\nN: "<<N;

        const int T1=4E6;
        {
            int T=T1,T0=T1/4;
            std::array<float,T1> array;
            while(T--)
            {
                array[T]=dis(gen);
            }
            auto start_time = std::chrono::high_resolution_clock::now();
            auto it =array.begin();
            while(T0--)
            {
                __m128 X = _mm_loadu_ps(it);
                __m128 result = _mm_sin_ps(X);
                _mm_storeu_ps(it, result);
                it+=4;
            }
            auto time2=std::chrono::high_resolution_clock::now()-start_time;
            std::cout<<"\nintr1: "<<std::chrono::duration_cast<std::chrono::microseconds>(time2).count();
        }
    }
    std::cout<<"\nfin\n";
    return 0;
}

compiler

g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu      4.8.2-19ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs   --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable- plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu  --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) 
Paul R
Philipp
  • If you declare it yourself `extern __m128 _mm_sin_ps(__m128 v1);`, does that compile? – Anthony Aug 13 '15 at 02:49
  • Yes, but then collect2 (the linker step) complains at build time: undefined reference to `_mm_sin_ps(float __vector)'. – Philipp Aug 13 '15 at 02:53
  • Could it be that my machine simply does not know the _mm_sin_ps intrinsic? Does the code compile on your machine? – Philipp Aug 13 '15 at 02:56
  • OK, so you get a linker error, which means the intrinsic is not available to your compiler. – Anthony Aug 13 '15 at 03:17
  • What compiler/platform are you using? – DrPizza Aug 13 '15 at 03:20
  • compiler: gcc 4.8.2 i put g++ -v output inside the question – Philipp Aug 13 '15 at 03:27
  • I have gcc 4.9.2 and it has no `_mm_sin_ps` function either. Furthermore, on the Intel page this function is in the **SVML** section and has no machine instruction. Have a look at [this project](http://gruntthepeon.free.fr/ssemath/), use a library for SSE trigonometry, or implement it yourself. – Youka Aug 13 '15 at 03:48
  • @anthony-arnold: if the linker gets involved with intrinsics, you have bigger problems. The compiler turns each intrinsic into one CPU instruction. Except for Intel's "fake" intrinsics which are actually vector library functions, not machine instructions, like this one. – Peter Cordes Aug 13 '15 at 06:50
  • Soon: https://sourceware.org/glibc/wiki/libmvec – Marc Glisse Aug 15 '15 at 10:42
  • How to enable SVML in ICC (Under Visual Studio)? – Royi Feb 22 '18 at 13:53
  • I would also mention the SLEEF project - [SLEEF: SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT](https://sleef.org/). It has great performance, which rivals Intel SVML, and variable accuracy for the user to choose from. The only issue I found is that it only supports MSVC on Windows (I'd like it to support clang-cl as well). – David Jun 05 '19 at 18:37
  • Basically a duplicate of [Where is Clang's '\_mm256\_pow\_ps' intrinsic?](//stackoverflow.com/q/36636159), but this one has answers that go into more detail about implementing it yourself. – Peter Cordes Jun 06 '19 at 02:21

3 Answers


_mm_sin_ps is part of SVML, Intel's Short Vector Math Library, which ships only with Intel's compilers. The GCC developers focused on wrapping machine instructions and a few simple helpers, so there is no SVML in immintrin.h so far.

You have to use a library or write it yourself. Sine implementations:
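As a rough illustration of the write-it-yourself route, here is a minimal SSE2 sketch. The names `sin4_sketch` and `sin4_store` are hypothetical, and the method (reduce x to about [-pi/2, pi/2] by subtracting a multiple of pi, then evaluate an odd Taylor polynomial up to r^9) is just one simple choice, not what SVML does. It assumes the default rounding mode and inputs of moderate magnitude (a few thousand at most), and it is noticeably less accurate than a real libm.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>

// Hypothetical helper (not part of any library): approximate sin() on four
// floats at once. Assumes moderate |x| and the default MXCSR rounding mode.
static inline __m128 sin4_sketch(__m128 x)
{
    const __m128 inv_pi = _mm_set1_ps(0.318309886f);    // 1/pi
    const __m128 pi_hi  = _mm_set1_ps(3.140625f);       // leading bits of pi
    const __m128 pi_lo  = _mm_set1_ps(9.67653590e-4f);  // pi - pi_hi

    // k = round(x / pi); the conversion rounds to nearest by default.
    const __m128i k  = _mm_cvtps_epi32(_mm_mul_ps(x, inv_pi));
    const __m128  kf = _mm_cvtepi32_ps(k);

    // r = x - k*pi, subtracted in two steps to limit rounding error.
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(kf, pi_hi));
    r = _mm_sub_ps(r, _mm_mul_ps(kf, pi_lo));

    // sin(x) = (-1)^k * sin(r): move the low bit of k into the sign bit of r.
    r = _mm_xor_ps(r, _mm_castsi128_ps(_mm_slli_epi32(k, 31)));

    // Odd Taylor polynomial r - r^3/3! + r^5/5! - r^7/7! + r^9/9!,
    // evaluated in Horner form on r^2.
    const __m128 r2 = _mm_mul_ps(r, r);
    __m128 p = _mm_set1_ps(2.75573192e-6f);                            //  1/9!
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.98412698e-4f));   // -1/7!
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(8.33333333e-3f));    //  1/5!
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.66666667e-1f));   // -1/3!
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f));
    return _mm_mul_ps(p, r);
}

// Convenience wrapper: read four floats from `in`, write their sines to `out`.
static inline void sin4_store(const float* in, float* out)
{
    assert(in != nullptr && out != nullptr);
    _mm_storeu_ps(out, sin4_sketch(_mm_loadu_ps(in)));
}
```

In the question's loop, `sin4_sketch(X)` would take the place of `_mm_sin_ps(X)`. For anything accuracy-sensitive, a tested library is the safer choice.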

Youka
  • Thanks. I don't think CORDIC can be used with SSE because of the table lookup. I will use the polynomial P_n that minimizes int_0^{pi/2} (P_n(x) - sin(x))^2 dx, with P_n(0) = 0 and P_n(pi/2) = 1; with n = 2 this should be the quadratic curve. – Philipp Aug 13 '15 at 18:50
  • [DirectXMath](http://blogs.msdn.com/b/chuckw/archive/2012/03/27/introducing-directxmath.aspx) in the Windows 8 SDK or later also includes a SIMD implementation of transcendental functions. – Chuck Walbourn Aug 13 '15 at 20:43

As has already been pointed out, you're trying to use Intel's SVML library.

There are, however, several SIMD transcendental functions in the free, open-source sse_mathfun library. The original version, which uses only SSE2, is here: http://gruntthepeon.free.fr/ssemath/ and a more up-to-date version, which has been updated for SSE3/SSE4, is here: https://github.com/RJVB/sse_mathfun

The function you want is called sin_ps:

v4sf sin_ps(v4sf x);

where v4sf is just a typedef for __m128.

The original sse_mathfun also has cos_ps, log_ps and exp_ps, and the newer (RJVB) version has some additional functions for both single and double precision.

I've successfully used both versions of this library with gcc, clang and Intel's ICC (with some minor mods for the latter).
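To show where such a function slots into the question's loop, here is a minimal sketch. `sin_ps_standin` and `apply_sin` are hypothetical names: the stand-in only mimics the `v4sf sin_ps(v4sf)` shape by calling std::sin per lane, so the example builds without the library. With sse_mathfun included, `sin_ps` itself would be passed instead.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cmath>
#include <cstddef>

// Stand-in with the same shape as sse_mathfun's `v4sf sin_ps(v4sf)`; it just
// calls std::sin per lane so that this sketch builds without the library.
static inline __m128 sin_ps_standin(__m128 x)
{
    float lanes[4];
    _mm_storeu_ps(lanes, x);
    for (float& v : lanes) v = std::sin(v);
    return _mm_loadu_ps(lanes);
}

// Apply a 4-wide sine to `n` floats in place. Elements left over after the
// last full group of four go through scalar std::sin.
template <class VecSin>
void apply_sin(float* data, std::size_t n, VecSin vec_sin)
{
    assert(data != nullptr || n == 0);
    const std::size_t n4 = n - n % 4;
    for (std::size_t i = 0; i < n4; i += 4)
        _mm_storeu_ps(data + i, vec_sin(_mm_loadu_ps(data + i)));
    for (std::size_t i = n4; i < n; ++i)
        data[i] = std::sin(data[i]);
}
```

A call could look like `apply_sin(v.data(), v.size(), sin_ps_standin)` for a `std::vector<float> v`. Heap storage is used here on the assumption that the question's 4E6-element std::array (about 16 MB) may not fit in a default-sized stack.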

Paul R
  • It is possible to use Intel's SVML with another compiler, but this is undocumented. Function prototypes look like this: `extern __m128 __vectorcall __svml_expf4 (__m128);` `extern __m128d __vectorcall __svml_exp2 (__m128d);` Unofficial function prototypes are listed in https://github.com/vectorclass/version2/blob/master/vectormath_lib.h – A Fog Aug 30 '22 at 09:37

_mm_sin_ps is an intrinsic that calls into the SVML library (as already mentioned).

With GCC, SVML-style vector math functions are available as part of libmvec in glibc.

The functions are named according to the Vector ABI, described in the link above. sin, cos, exp, sincos, log and pow are available. Here is an example for __m128:

#include <x86intrin.h>
#include <stdio.h>

typedef union
{
  __m128  x;
  float a[4];
} union128;

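/* Vector ABI name: b = SSE, N = unmasked, 4 = lanes, v = one vector argument.
   Exported by libmvec, so link with -lmvec. */
__m128 _ZGVbN4v_sinf(__m128);

int main(void)
{
  union128 s1, res;
  /* _mm_set_ps takes lanes from highest to lowest, so a[0] holds 1.5708. */
  s1.x = _mm_set_ps (0, 0.523599, 1.0472 , 1.5708);
  res.x = _ZGVbN4v_sinf(s1.x);
  fprintf(stderr, "%f %f %f %f\n", res.a[0], res.a[1], res.a[2], res.a[3]);
  return 0;
}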
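Another way to reach libmvec is to not call it by name at all. The sketch below is a plain scalar loop, and `sin_inplace` is a hypothetical name. The assumption is that a recent GCC, given flags along the lines of `-O3 -ffast-math` and a glibc that ships libmvec, may vectorize the loop into calls to the Vector ABI variants on its own. Whether that actually happens depends on the compiler and glibc versions, so it is worth checking the generated assembly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Plain scalar loop with no intrinsics. Under the assumptions stated above,
// the compiler may replace the std::sin calls with 4-wide libmvec calls;
// without them, this is simply an ordinary scalar loop.
void sin_inplace(float* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        data[i] = std::sin(data[i]);
}
```

This keeps the source portable: the same code still builds and runs where libmvec is not available.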

Is there any reason why the intrinsic would be better than calling the SVML function directly?

vaalfreja