The intrinsic function _mm256_exp_ps() is not working any more (x86intrin.h)

Question

I have an old implementation that uses the _mm256_exp_ps() function, and I used to be able to compile it with GCC, ICC, and Clang. Now I cannot compile the code anymore because the compiler does not find the function _mm256_exp_ps().

Here is the simplified version of my problem:

#include <stdio.h>
#include <x86intrin.h>

int main()
{
    __m256 vec1, vec2;
    vec2 = _mm256_exp_ps(vec1);

    return 0;
}

And the error is:

$ gcc -march=native  temp.c -o temp
temp.c: In function ‘main’:
temp.c:9:16: warning: implicit declaration of function ‘_mm256_exp_ps’; did you mean ‘_mm256_rcp_ps’? [-Wimplicit-function-declaration]
    9 |         vec2 = _mm256_exp_ps(vec1);
      |                ^~~~~~~~~~~~~
      |                _mm256_rcp_ps
temp.c:9:16: error: incompatible types when assigning to type ‘__m256’ from type ‘int’

This means the compiler cannot find the intrinsic.

If I use another function, for example _mm256_add_ps(), there are no errors, which means the library is accessible. The problem is specific to _mm256_exp_ps(), which might have been changed when AVX512 support was added to the compiler.

#include <stdio.h>
#include <x86intrin.h>

int main()
{
    __m256 vec1, vec2;
    vec2 = _mm256_add_ps(vec1, vec2);

    return 0;
}

Could you please help me solve the problem?

The SVML library is proprietary. Have you tried compiling with ICC? See https://stackoverflow.com/questions/36636159/where-is-clangs-mm256-pow-ps-intrinsic — Simon Goater, Feb 03 '23 at 21:27
I think [this post](https://stackoverflow.com/q/36636159/10871073) about `_mm256_pow_ps` is relevant. I guess your `_mm256_exp_ps` is (similarly) not an *actual* intrinsic but part of the SVML library. — Adrian Mole, Feb 03 '23 at 21:27

Simon Goater · Answer 1 · 2023-02-04T11:33:45.337

As a workaround that should allow you to compile and run your program, you could define a function with the same name yourself. If it is not a performance-critical part of your program, it might be an acceptable fix. Below are SSE and AVX versions of the function.

#include <stdio.h>
#include <math.h>
#include <immintrin.h>
#include <xmmintrin.h>

// gcc Junk.c -o Junk.bin -mavx -lm
// gcc Junk.c -o Junk.bin -msse4 -lm

__m128 _mm128_exp_ps(__m128 invec) {
  float *element = (float *)&invec;  
  return _mm_setr_ps(
    expf(element[0]),
    expf(element[1]),
    expf(element[2]),
    expf(element[3])
    );
}
/*
__m256 _mm256_exp_ps(__m256 invec) {
  float *element = (float *)&invec;
  return _mm256_setr_ps(
    expf(element[0]),
    expf(element[1]),
    expf(element[2]),
    expf(element[3]),
    expf(element[4]),
    expf(element[5]),
    expf(element[6]),
    expf(element[7])
    );
}
*/
int main()
{
  __m128 vec1, vec2;
  vec1 = _mm_setr_ps( 1.0, 1.1, 1.2, 1.3);
  vec2 = _mm128_exp_ps(vec1);
  float *element = (float *)&vec2;
  int i;
  for (i=0; i<4; i++) {
      printf("%f %f\n", element[i], expf(1.0f + i/10.0f));
  }

  return 0;
}

EDIT: After comments by Peter Cordes about possible undefined behaviour when pointing a float pointer at a __m128 or __m256 variable, I thought I'd add a suggestion for maximum safety and portability, taken from the links he provided. I don't know for sure that there is a problem with the above code due to alignment issues, but it appears that the more correct way to do this would be to replace the line

  float *element = (float *)&invec;

with

  float element[4];
  _mm_storeu_ps(element, invec);

and

  float element[8];
  _mm256_storeu_ps(element, invec);

for the SSE and AVX functions respectively.

`float *element = (float *)&invec;` is strict-aliasing UB, I think. Possibly ok in GNU C where `__m128` is `typedef float __m128 __attribute__((vector_size(16),may_alias))` since it's also a float pointer, but I'm not at all sure. Also, if the compiler doesn't call a vectorized `exp` function, you actually *want* it to spill to a temporary array and only reload the one float element, not tempt it into reloading the whole vector and shuffling between every call. [print a \_\_m128i variable](https://stackoverflow.com/a/46752535) shows how to access all elements of a vector portably. — Peter Cordes, Feb 04 '23 at 00:32
Anyway yes this works, but a manually vectorized `exp` function is not that hard; many implementations are floating around online with various speed vs. precision tradeoffs. (Especially if you don't need to handle NaNs, or maybe even ignoring subnormals). [Fastest Implementation of Exponential Function Using AVX](https://stackoverflow.com/q/48863719) has some good ones that are fast and e.g. wim's answer has relative error of about +-4e-8. AVX-512 `getexpps` / `getmantps` are quite useful for implementing exp/log. — Peter Cordes, Feb 04 '23 at 00:49
I did wonder about this when I wrote it, but I thought the invec variable would be aligned adequately. I don't really want to depend on alignas(). I'll add an edit with the suggestions from the links on how this should be done in the safest most portable way. — Simon Goater, Feb 04 '23 at 11:34
I said strict aliasing UB, not alignment. Those are two separate things. In your version using an array, yes, alignment becomes relevant because `alignof(float)` is less than `alignof(__m128)`. So yes, an unaligned store or `alignas(16) float element[4];` and `_mm_store_ps`. Some compilers will choose to align the array on their own to avoid possible cache-line splits for the store, not that it even matters since store-forwarding still works. — Peter Cordes, Feb 04 '23 at 16:18
