
I want to implement the modulo operation (e.g. 5 % 3 = 2) with AVX.

normal: int64 x = a % b

math: int64 x = a - (int64)((double)a / b) * b

avx:
__m256 tmp1 = _mm256_cvtepi32_ps(data);

I need to convert a large number from int64 to float, but then the float overflows.

How can I avoid the overflow when converting int64 to float?

Peter Cordes
Songmeng
  • `float` has a larger value-range than `int64_t`. The conversion can't actually overflow. (But you can't do it in one hardware instruction anyway except for scalar.) You can do int64_t to double ([How to efficiently perform double/int64 conversions with SSE/AVX?](https://stackoverflow.com/q/41144668)), and then double->float with an existing intrinsic. – Peter Cordes Mar 09 '21 at 03:12
  • If `b` is a compile-time constant and your types are 32-bit int, you can use a multiplicative inverse (e.g. with https://libdivide.com/) to take advantage of `_mm256_mul_epu32`. Do you actually *want* your result as a float, though? Or is that just to work around the lack of integer divide instructions? It's very unclear what incoming data format you actually have, because you talk about int64 and use double for scalar, but then use `_mm256_cvtepi32_ps` in your vector example (int32_t -> float, not int64_t -> double). – Peter Cordes Mar 09 '21 at 03:18

1 Answer


In the general case your approach is wrong. The 'double' type has a 53-bit significand (52 bits stored explicitly), which isn't enough to represent an arbitrary 64-bit integer exactly. You might get wrong results.
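The precision limit can be checked with a small scalar sketch. The function names below are hypothetical, chosen only for illustration, and the code assumes ordinary IEEE 754 double arithmetic (e.g. x86-64 with SSE2) and inputs whose magnitude stays well below 2^63, so that converting back to an integer is defined.

```c
#include <stdint.h>

/* Exact reference: plain integer remainder. */
static int64_t mod_exact(int64_t a, int64_t b) {
    return a % b;
}

/* The formula from the question, with the quotient truncated to an
   integer. Assumption: only reliable while a and the quotient can be
   represented exactly in the 53-bit significand of a double. */
static int64_t mod_via_double(int64_t a, int64_t b) {
    int64_t q = (int64_t)((double)a / (double)b);
    return a - q * b;
}

/* Returns 1 if a survives an int64 -> double -> int64 round trip.
   Assumption: |a| is well below 2^63, so the cast back is defined. */
static int roundtrips_through_double(int64_t a) {
    return (int64_t)(double)a == a;
}
```

For small inputs such as `mod_via_double(5, 3)` the two functions agree. For `a = (INT64_C(1) << 53) + 1` the conversion to double rounds `a` to 2^53, so the round trip fails and `mod_via_double(a, 1)` no longer matches `mod_exact(a, 1)`.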

The 'long double' type (the 80-bit x87 format, with a 64-bit significand) would be suitable, but it's only available with x87, not SIMD.

Akon