According to the reference here, the following functions should be defined in "immintrin.h":

__m128i _mm_idiv_epi32 (__m128i a, __m128i b);
__m128i _mm_idivrem_epi32 (__m128i * mem_addr, __m128i a, __m128i b);
__m128i _mm_set_epi32 (int e3, int e2, int e1, int e0);

But according to my test program, they are not:

#include "immintrin.h"

int main() {
  __m128i a = _mm_set_epi32(4,3,2,1);
  __m128i b = _mm_set_epi32(1,2,3,4);
  __m128i c = _mm_idiv_epi32(a,b);
  __m128i d;

  c = _mm_idivrem_epi32(&d, a, b);
}

This fails to compile with the following error message:

cc -g scratch.c && ./a.out
scratch.c: In function 'main':
scratch.c:11:15: warning: implicit declaration of function '_mm_idiv_epi32'; did you mean '_mm_rorv_epi32'? [-Wimplicit-function-declaration]
   __m128i c = _mm_idiv_epi32(a,b);
               ^~~~~~~~~~~~~~
               _mm_rorv_epi32
scratch.c:11:15: error: incompatible types when initializing type '__m128i {aka __vector(2) long long int}' using type 'int'
scratch.c:14:7: warning: implicit declaration of function '_mm_idivrem_epi32'; did you mean '_mm_movm_epi32'? [-Wimplicit-function-declaration]
   c = _mm_idivrem_epi32(&d, a, b);
       ^~~~~~~~~~~~~~~~~
       _mm_movm_epi32
scratch.c:14:5: error: incompatible types when assigning to type '__m128i {aka __vector(2) long long int}' from type 'int'
   c = _mm_idivrem_epi32(&d, a, b);

Apparently the functions are not defined at all. What am I doing wrong? Did I miss something?

Peter Cordes
Arne
  • You may want to check your compiler's intrinsics header, as well as the hardware you’re targeting. – Jens Mar 23 '18 at 01:02
  • For what it's worth, this works in ICC 17 and 18, so it is probably related to something gcc specific (wild guess: cc -mavx -msse or similar command line switch or different include name) – visibleman Mar 23 '18 at 01:24
  • x86 doesn't have SIMD integer division instructions, only SIMD floating-point division (and scalar integer and FP division). `_mm_idivrem_epi32` isn't an intrinsic, it's an Intel library function. Note that it's listed as an SVML function, not as part of an instruction set, and no single asm instruction is listed in the description. – Peter Cordes Mar 23 '18 at 02:46
  • To divide by a compile-time constant vector, use the multiplicative inverse trick, either manually ([like I did using GNU C native vectors to get the compiler to do it for me, in `vec_store_digit_and_space`](https://unix.stackexchange.com/questions/323845/whats-the-fastest-way-to-generate-a-1-gb-text-file-containing-random-digits/324520#324520)) or using http://libdivide.com/ (which can also work for runtime variables). – Peter Cordes Mar 23 '18 at 03:44
  • If your divisors aren't constant, and the integers are smaller than 2^24 (or rounding is ok), convert to float and use SIMD FP division. For a single constant integer, see https://stackoverflow.com/questions/16822757/sse-integer-division (although with AVX2 variable-shift instructions, you can do different shifts for different elements and maybe make the integer formula work for a constant vector of different divisors). – Peter Cordes Mar 23 '18 at 03:52
  • For 16-bit integers, there are multiply instructions that take the high half, so you can do [very fast approximate division with `mulhrs_epi16`](https://stackoverflow.com/questions/42442325/how-to-divide-a-m256i-vector-by-an-integer-variable), or exact division for all inputs with [the full multiplicative inverse trick with shifts](https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a). See also [How to let GCC compiler turn variable-division into mul(if faster)](https://stackoverflow.com/questions/36832440/how-to-let-gcc-compiler-turn-variable-division-into-mulif-faster) – Peter Cordes Mar 23 '18 at 04:00
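
Two of the workarounds suggested in the comments can be sketched with plain SSE2 intrinsics, which gcc does provide. This is only an illustration under stated assumptions, not code from any of the linked answers. The helper names `div_epi32_via_double` and `div10_epu16` are made up for this example. The first helper converts to double rather than float (a deliberate swap, so that the full 32-bit range stays exact instead of only values below 2^24), and it assumes no divisor is zero and that `INT_MIN / -1` does not occur. The second divides unsigned 16-bit lanes by the constant 10 using the multiplicative inverse 0xCCCD and a total right shift of 19 bits.

```c
#include <emmintrin.h>  /* SSE2 only; no SVML needed */

/* Hypothetical helper: truncating signed 32-bit division done in double
 * precision. Assumes nonzero divisors and no INT_MIN / -1 lane. */
static __m128i div_epi32_via_double(__m128i a, __m128i b)
{
    /* _mm_cvtepi32_pd converts only the two low lanes, so the upper
     * half is moved down and converted separately. */
    __m128d a_lo = _mm_cvtepi32_pd(a);
    __m128d b_lo = _mm_cvtepi32_pd(b);
    __m128d a_hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(a, a));
    __m128d b_hi = _mm_cvtepi32_pd(_mm_unpackhi_epi64(b, b));

    /* The "tt" conversion truncates toward zero, like C's '/' on ints. */
    __m128i q_lo = _mm_cvttpd_epi32(_mm_div_pd(a_lo, b_lo));
    __m128i q_hi = _mm_cvttpd_epi32(_mm_div_pd(a_hi, b_hi));

    /* Each result holds two quotients in its low 64 bits; merge them. */
    return _mm_unpacklo_epi64(q_lo, q_hi);
}

/* Hypothetical helper: unsigned 16-bit division by the constant 10.
 * x / 10 == (x * 0xCCCD) >> 19 for every 16-bit x; the high-half
 * multiply supplies the first 16 bits of the shift. */
static __m128i div10_epu16(__m128i x)
{
    const __m128i magic = _mm_set1_epi16((short)0xCCCD);
    return _mm_srli_epi16(_mm_mulhi_epu16(x, magic), 3);
}
```

Going through double costs two divisions and several conversions per vector, so for a divisor known at compile time the multiplicative-inverse form is usually the cheaper choice.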

1 Answer

Your code compiles fine with a recent version of Intel's ICC compiler. The function _mm_idiv_epi32 is an SVML (Short Vector Math Library) function rather than a hardware instruction, which is why gcc's "immintrin.h" does not declare it. The SVML library comes bundled with the Intel ICC compiler. If you don't have access to ICC or can't use it, one way to obtain a linkable SVML might be to install and link against OpenCL.
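
If linking against SVML is not an option, a scalar fallback with the same call shape is easy to write. The sketch below is an assumption-laden stand-in, not Intel's implementation: the names `fallback_idivrem_epi32` and `fallback_idiv_epi32` are invented here, and the behavior (quotient returned, remainder stored through the pointer, both truncating as in C) is inferred from the prototypes quoted in the question. As with scalar C, a zero divisor or `INT_MIN / -1` in any lane is undefined.

```c
#include <emmintrin.h>  /* SSE2: __m128i, unaligned loads and stores */

/* Hypothetical stand-in for _mm_idivrem_epi32: returns the four
 * quotients and stores the four remainders through rem_out. */
static __m128i fallback_idivrem_epi32(__m128i *rem_out, __m128i a, __m128i b)
{
    int x[4], y[4], q[4], r[4];

    /* Spill both vectors to memory so each lane can be divided as a
     * plain int; x86 has no SIMD integer divide instruction. */
    _mm_storeu_si128((__m128i *)x, a);
    _mm_storeu_si128((__m128i *)y, b);

    for (int i = 0; i < 4; ++i) {
        q[i] = x[i] / y[i];  /* truncates toward zero */
        r[i] = x[i] % y[i];  /* takes the sign of the dividend */
    }

    _mm_storeu_si128(rem_out, _mm_loadu_si128((const __m128i *)r));
    return _mm_loadu_si128((const __m128i *)q);
}

/* Hypothetical stand-in for _mm_idiv_epi32: quotient only. */
static __m128i fallback_idiv_epi32(__m128i a, __m128i b)
{
    __m128i unused_remainder;
    return fallback_idivrem_epi32(&unused_remainder, a, b);
}
```

With the inputs from the question, `a = _mm_set_epi32(4,3,2,1)` and `b = _mm_set_epi32(1,2,3,4)`, the quotient lanes e3..e0 come out as 4, 1, 0, 0 and the remainder lanes as 0, 1, 2, 1.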

visibleman