Poor man's alternative to _mm_cvttpd_epi64

Question

With AVX512DQ (together with AVX512VL for the 128- and 256-bit forms), there is _mm_cvttpd_epi64; for example, in the file avx512vldqintrin.h we find

static __inline__ __m128i __DEFAULT_FN_ATTRS
_mm_cvttpd_epi64 (__m128d __A) {
  return (__m128i) __builtin_ia32_cvttpd2qq128_mask ((__v2df) __A,
               (__v2di) _mm_setzero_si128(),
               (__mmask8) -1);
}

which converts two packed 64-bit floats (__m128d) to two packed 64-bit integers (__m128i), truncating toward zero. There is also _mm256_cvttpd_epi64 for converting four packed 64-bit floats (__m256d) to four packed 64-bit integers (__m256i).

However, many machines do not support AVX512DQ, so I wonder what the best poor man's alternative is.

I should say that I'd already be happy with a solution that works only for 64-bit floats that can be converted to 32-bit floats without loss.
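For a restricted input range, one possible starting point needs only SSE2. The sketch below assumes something stronger than the restriction stated above: every input, once truncated, must fit in a signed 32-bit integer. That reading is an assumption, and the helper name `cvttpd_epi64_small` is hypothetical. It uses `_mm_cvttpd_epi32` for the truncating conversion and then sign-extends the two 32-bit results to 64 bits.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper (not part of any library): truncating conversion of
 * two doubles to two int64 lanes, ASSUMING each truncated value fits in a
 * signed 32-bit integer. Out-of-range or NaN inputs come back as
 * INT32_MIN sign-extended to 64 bits. */
static inline __m128i cvttpd_epi64_small(__m128d x)
{
    /* Truncate toward zero; the two int32 results land in the low
     * 64 bits of the register, the upper 64 bits are zeroed. */
    __m128i v32 = _mm_cvttpd_epi32(x);

    /* An arithmetic shift by 31 yields 0 or -1 per lane: the sign word. */
    __m128i sign = _mm_srai_epi32(v32, 31);

    /* Interleave value and sign words: [v0, s0, v1, s1], which is
     * exactly two sign-extended 64-bit integers. */
    return _mm_unpacklo_epi32(v32, sign);
}
```

Note that `_mm_set_pd(-2.9, 3.7)` places 3.7 in the low lane, so storing the result with `_mm_storeu_si128` gives 3 in the first element and -2 in the second.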

Almost a duplicate: https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx — Mysticial, Jun 12 '17 at 17:34
I didn't close this as a dupe since you seem to want truncation rounding. And I don't think there's a way to make my answer do that without adding significant overhead. — Mysticial, Jun 12 '17 at 17:49
@Mysticial I need that for something like `_mm_exp_pd` and `_mm256_exp_pd`, which needs the computation of 2^n where n is the integer part of x/log(2). This is done using bit-wise manipulations and hence requires an integer argument. — Walter, Jun 12 '17 at 19:54
If you want to calculate exp, I would suggest rounding to the next int (e.g., using one of the linked methods) converting back to double, compute the diff which should be in [-0.5, +0.5]. Then use a min-max polynomial for exp which works on [-0.5, +0.5] — chtz, Jun 13 '17 at 20:27
Addendum: You may have problems near the borders of what can be calculated, which is less likely to happen if you truncate to zero. OTOH, if you truncate to 0, you have different remainders for positive or negative arguments — chtz, Jun 13 '17 at 20:33
Come to think of it, if you want truncation rounding, just issue a `_mm_round_pd(x, _MM_FROUND_TO_ZERO)` before applying the conversion trick. — Mysticial, Jun 19 '17 at 18:51
I looked at Agner Fog's VCL. He uses `_mm_cvttpd_epi64` if `__AVX512DQ__` is available; otherwise he stores to a buffer, uses scalar operations, and then reinserts. I guess that is what @Mysticial means by "I don't think there's a way to make my answer do that without adding significant overhead". — Z boson, Jul 03 '17 at 14:16
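Two ideas from the comments can be sketched with plain SSE2: the store/convert/reinsert fallback, and the bit-wise construction of 2^n from an integer n. This is a minimal sketch, not VCL's actual code. The function names `cvttpd_epi64_scalar` and `pow2_epi64` are hypothetical, and the exponent range -1022 <= n <= 1023 (normal doubles only) is an assumption.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical fallback: spill both lanes to memory, convert each with a
 * C cast (which truncates toward zero), and rebuild the vector.
 * ASSUMES each value is within int64_t range; otherwise the cast is
 * undefined behaviour in C. */
static inline __m128i cvttpd_epi64_scalar(__m128d x)
{
    double buf[2];
    _mm_storeu_pd(buf, x);
    /* _mm_set_epi64x takes the high lane first, then the low lane. */
    return _mm_set_epi64x((int64_t)buf[1], (int64_t)buf[0]);
}

/* Hypothetical helper: build the double 2^n from a 64-bit integer n by
 * writing the biased exponent (n + 1023) into bits 52..62 of an IEEE 754
 * double. ASSUMES -1022 <= n <= 1023, so the result is a normal number. */
static inline __m128d pow2_epi64(__m128i n)
{
    __m128i biased = _mm_add_epi64(n, _mm_set1_epi64x(1023));
    return _mm_castsi128_pd(_mm_slli_epi64(biased, 52));
}
```

Chaining the two, `pow2_epi64(cvttpd_epi64_scalar(_mm_set_pd(-2.5, 3.9)))` holds 8.0 in the low lane and 0.25 in the high lane. The round trip through memory is the overhead that the comments refer to.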
