0

I'm learning SSE coding and now stuck on a couple lines of code while porting / vectorizing an exp(double) function.

I'm kind of beginner with C/C++ coding as well so these two lines confuses and I don't know what's happening there:

fast_exp(double x) {
    double  a = c1 * x + 0.5;
    int     n = (int)a;  // truncate toward zero
    ieee754 u;  // union of double and unsigned short[4]

    n -= (a < 0); //(1)
     ...

    u.s[3] = (unsigned short)((n << 4) & 0x7FF0); //(2)
    return  u.d;
}

What is done there?

(1) Is the a subtracted from n if a is holding a negative value ?

(2) ? bitwise left shift but, what more happens there?

And, how those two code lines are written using SSE intrinsics?


Here's what I have so far, partially ported to SSE intrinsics. The lines I'm stuck on are marked with ?? in code below. It doesn't compile (yet).

typedef union {
  double d;
  unsigned short s[4];
} ieee754;

double exp_sse (double value){

    double px; //, a;
    ieee754 u;
    __m128i n;
    __m128d a;
    __m128d  x = _mm_set1_pd (x);
    __m128d c1 = _mm_set1_pd (1.4426950408889634073599);
    __m128d c2 = _mm_set1_pd (6.93145751953125E-1);
    __m128d c3 = _mm_set1_pd (1.42860682030941723212E-6);
    __m128i c1023 = _mm_set1_epi32(1023);
    __m128d c4 = _mm_set1_pd (0.5);

    /* n = round(x / log 2) --------------------- */
    // a = c1 * x + 0.5; 
    a = _mm_mul_pd (c1, x);
    a = _mm_add_pd (a, c4);

    // n = (int)a; 
    n = _mm_cvtpd_epi32(a);

    n -= (a < 0); ?? 

    /* x -= n * log2 ---------------------- */
    //px = (double)n;
    px = _mm_cvtepi32_pd(n);
    //x -= px * c2;
    x = _mm_sub_pd(x, _mm_mul_pd(px, c2));
    //x -= px * c3;
    x = _mm_sub_pd(x, _mm_mul_pd(px, c3));

    // calc e^x -------------------
       ...
    // ----------------------------

    /* 2^n in double. */
    n = _mm_add_(n, c1023);

    u.s[3] = (unsigned short)((n << 4) & 0x7FF0); ??

    return d[0] * u.d;
}
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Juha P
  • 295
  • 1
  • 6
  • 1
    Welcome to Stack Overflow. Please read [the help pages](http://stackoverflow.com/help), take [the SO tour](http://stackoverflow.com/tour), read about [how to ask good questions](http://stackoverflow.com/help/how-to-ask), as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). Lastly learn how to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). – Some programmer dude Sep 17 '18 at 06:15
  • 3
    And you say that you are "stuck with a couple of line", but stuck *how*? What is the problem you have? – Some programmer dude Sep 17 '18 at 06:16
  • If you're implementing a 2^n or log2(x) function, say so and show the whole scalar code, and the intrinsics code you have so far, instead of just showing this tiny fragment. You don't even need to compare to get `a<0` or `n<0`, you can just arithmetic right shift to broadcast the sign bit. – Peter Cordes Sep 17 '18 at 06:21
  • @Peter Cordes, I'm noob with math and beginner with C++ . – Juha P Sep 17 '18 at 10:59
  • See [Fastest Implementation of Exponential Function Using SSE](https://stackoverflow.com/q/47025373). That's for `float`, not `double`. See also [Efficient implementation of log2(\_\_m256d) in AVX2](https://stackoverflow.com/q/45770089) which does do stuff with `double`, but is mostly about `log`, not `exp`. – Peter Cordes Sep 17 '18 at 11:13
  • Have you tried just looking for a vectorized `exp(x)` function? There are some. Anyway, see links in https://stackoverflow.com/tags/sse/info for guides. For the "compare", you just want `_mm_srai_epi32(n, 31)` for an arithmetic right shift, and `_mm_add_epi32` that 0 or -1 value. – Peter Cordes Sep 17 '18 at 11:14
  • @Peret Cordes, Thank you! Actually, I have managed to program quite fast `float` exp(x) function based on Maclaurin series expansion (depends on the need but for me it's also quite accurate when implemented as `double` ). It uses SSE ability to calculate two or four 'columns' at a time. This (remez based) exp(x) I'm trying to port now is quite accurate as `double` precicion implementation so, I would like to see how fast it would be compared to those I've found here from and there from so far. – Juha P Sep 17 '18 at 12:04
  • Hmm..., maybe vectorization isn't the way to go with exp() approximation - http://www.nersc.gov/users/computational-systems/edison/programming/vectorization/ – Juha P Sep 18 '18 at 16:14

0 Answers0