How to convert int64_t = float * int64_t scalar to vector code and back?

Question

I'd like to convert this scalar code:

int64_t res = floatValue * int64Value;

to SSE/SIMD code (built with -march=nocona), and later convert the value back to float:

float finalRes = res;

Is it possible? I would do something like this:

__m128 res = _mm_mul_ps(floatValue4, int64Value4);
__m128i res1 = _mm_cvttps_epi64(res);
__m128i res2 = _mm_cvttps_epi64(_mm_movehl_epi64(res, res));

but I can't find either _mm_cvttps_epi64 or _mm_movehl_epi64 for the target platform.

Not sure what your expectations are. Did you try looking at what the compiler generates: https://godbolt.org/z/od7WMoYTG ? From my point of view it looks fine. — Marek R, Apr 16 '21 at 10:36
@Marek R what do you mean by "fine"? You show me "scalar" code. I need vector one :) — markzzz, Apr 16 '21 at 10:46
https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx — nemequ, Apr 16 '21 at 11:38
@nemequ int64_to_double_full needs _mm_blend_epi16, which is SSE4.1 (and so doesn't match my target arch) — markzzz, Apr 16 '21 at 13:22
`_mm_blend_epi16` is easy to emulate on SSE2; instead of `_mm_blend_epi16(x, 0x88)`, try something like `__m128i m = _mm_set_epi16(0, ~0, ~0, ~0, 0, ~0, ~0, ~0); __m128i xL = _mm_or_si128(_mm_and_si128(m, x), _mm_andnot_si128(m, _mm_castpd_si128(_mm_set1_pd(0x0010000000000000))));`. For 0x33 it's even easier since one of the vectors is all zeros: `_mm_and_si128(xH, _mm_set_epi16(~0, ~0, 0, 0, ~0, ~0, 0, 0))`. — nemequ, Apr 16 '21 at 17:30
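The mask trick from the comment above can be wrapped in a small helper. This is only a sketch: the function name `blend_epi16_0x88_sse2` is made up for illustration, and it covers just the fixed 0x88 pattern (16-bit words 3 and 7 taken from the second operand), not a general immediate.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper (name is an assumption): SSE2 stand-in for
// _mm_blend_epi16(a, b, 0x88). Words 3 and 7 come from b, all other
// 16-bit words come from a.
static inline __m128i blend_epi16_0x88_sse2(__m128i a, __m128i b) {
    // _mm_set_epi16 lists words from 7 down to 0; -1 (all ones) means "keep a".
    const __m128i keep_a = _mm_set_epi16(0, -1, -1, -1, 0, -1, -1, -1);
    return _mm_or_si128(_mm_and_si128(keep_a, a),      // a where the mask is set
                        _mm_andnot_si128(keep_a, b));  // b where the mask is clear
}
```

The AND/ANDNOT/OR sequence costs three instructions instead of one, which is usually acceptable when SSE4.1 is not available.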
Are you sure you need 64-bit integers? In your last question ([How to convert/merge two double (m128d) into one single (m128)?](https://stackoverflow.com/q/67111388)) your integer constant converted to a compile-time-constant float, which is a lot more efficient than anything you can do with int64. Until AVX-512, there's no single-instruction FP<->int64 SIMD conversion (only scalar), and no int64 SIMD multiply. — Peter Cordes, Apr 16 '21 at 22:26
What you're looking for with `_mm_movehl_epi64(v,v)` is just `_mm_unpackhi_epi64(v,v)`, or if you only want the low half of the result then the `_mm_srli_si128(v, 8)` you were already using also works. — Peter Cordes, Apr 16 '21 at 22:29
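Since there is no packed float<->int64 conversion before AVX-512, one fallback is to convert lane by lane with the scalar instructions and recombine the results. The following is a rough sketch for an x86-64 target such as nocona; the `Int64x4` struct and all three function names are invented for this example. It mirrors the scalar expression, where the int64_t operand is first converted to float, multiplied, and the product truncated.

```cpp
#include <emmintrin.h>  // SSE2; also pulls in the SSE header
#include <cstdint>

// Hypothetical container for four int64 lanes (two per XMM register).
struct Int64x4 { __m128i lo, hi; };

// float x4 -> int64 x4 with truncation, one scalar cvttss2si per lane.
static inline Int64x4 cvttps_epi64_sse2(__m128 v) {
    const __m128 v1 = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 v2 = _mm_movehl_ps(v, v);  // lane 2 moved to lane 0
    const __m128 v3 = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
    const __m128i e0 = _mm_cvtsi64_si128(_mm_cvttss_si64(v));
    const __m128i e1 = _mm_cvtsi64_si128(_mm_cvttss_si64(v1));
    const __m128i e2 = _mm_cvtsi64_si128(_mm_cvttss_si64(v2));
    const __m128i e3 = _mm_cvtsi64_si128(_mm_cvttss_si64(v3));
    return { _mm_unpacklo_epi64(e0, e1), _mm_unpacklo_epi64(e2, e3) };
}

// int64 x4 -> float x4; the casts compile to scalar cvtsi2ss.
static inline __m128 cvtepi64_ps_sse2(Int64x4 v) {
    const int64_t x0 = _mm_cvtsi128_si64(v.lo);
    const int64_t x1 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(v.lo, v.lo));
    const int64_t x2 = _mm_cvtsi128_si64(v.hi);
    const int64_t x3 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(v.hi, v.hi));
    return _mm_setr_ps(static_cast<float>(x0), static_cast<float>(x1),
                       static_cast<float>(x2), static_cast<float>(x3));
}

// Vector counterpart of: int64_t res = floatValue * int64Value;
static inline Int64x4 mul_ps_epi64_sse2(__m128 f, Int64x4 i) {
    return cvttps_epi64_sse2(_mm_mul_ps(f, cvtepi64_ps_sse2(i)));
}
```

Only the multiply is truly vectorized here; the conversions stay scalar, so any speedup over plain scalar code is likely to be small.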
@PeterCordes the original code uses int64_t. That's because the integrators need that headroom (relative to the settings you place). In the loop, both integrators and combs get sum/sub, right? — markzzz, Apr 17 '21 at 06:42
I don't really grok the overall algorithm of this code; not one I'm familiar with and you didn't describe it. Can't you use `double` instead, though? Scaling by 2^32 is fine for `double`. (Or even `float`; it has enough exponent range, but IDK if it has enough mantissa precision for you). OTOH, for that serial dependency where you update `s -= pCombs[i]`, integer is nice because it's lower latency for the loop-carried dep chain. Same for the prefix-sum with `pIntegrators`. You'd have to unroll more to hide `double` add latency. int64_t add/sub is fine with SIMD, but conversion sucks. — Peter Cordes, Apr 17 '21 at 06:57
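If `double` turns out to be acceptable, as suggested in the comment above, the whole round trip stays in packed SSE2 instructions. A minimal sketch, assuming a constant scale factor such as 2^32; the `Doublex4` struct and both function names are hypothetical and not taken from the original code.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical container for four double lanes (two per XMM register).
struct Doublex4 { __m128d lo, hi; };

// Widen four floats to double and multiply each by a scalar factor.
static inline Doublex4 widen_and_scale(__m128 v, double scale) {
    const __m128d s = _mm_set1_pd(scale);
    const __m128d lo = _mm_cvtps_pd(v);                     // lanes 0 and 1
    const __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(v, v));   // lanes 2 and 3
    return { _mm_mul_pd(lo, s), _mm_mul_pd(hi, s) };
}

// Narrow four doubles back into one float vector.
static inline __m128 narrow_to_ps(Doublex4 d) {
    // _mm_cvtpd_ps fills the low two floats; movelh joins the two halves.
    return _mm_movelh_ps(_mm_cvtpd_ps(d.lo), _mm_cvtpd_ps(d.hi));
}
```

The trade-off mentioned in the comment still applies: accumulating in `double` has higher add latency than int64_t in a loop-carried dependency chain, so more unrolling may be needed.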
@PeterCordes it's a FIR CIC filter (decimator); more about it: https://www.dsprelated.com/showarticle/1337.php — markzzz, Apr 17 '21 at 07:14

0 Answers