x86 doesn't have native support for FP<->unsigned until AVX-512, with vcvtps2udq (https://www.felixcloutier.com/x86/vcvtps2udq). For scalar you normally just convert to 64-bit signed (cvtss2si rax, xmm0) and take the low 32 bits of that (in EAX), but that's not an option with SIMD.
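A minimal C sketch of that scalar approach, assuming x86-64 and the standard SSE intrinsics headers (_mm_cvtss_si64 only exists in 64-bit mode). The helper name float_to_u32_scalar is made up for illustration. It uses the rounding cvtss2si to match the text; a plain C cast (uint32_t)f typically compiles to the same idea using the truncating cvttss2si instead.

```c
#include <stdint.h>
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_cvtss_si64 (x86-64 only) */

/* Hypothetical helper: scalar float -> uint32_t without AVX-512.
   cvtss2si converts to a 64-bit signed integer using the current
   rounding mode (round-to-nearest-even by default). Every input in
   [0, 2^32) fits in int64_t, so the low 32 bits (EAX) hold the
   correct uint32_t value. */
static uint32_t float_to_u32_scalar(float f)
{
    int64_t wide = _mm_cvtss_si64(_mm_set_ss(f)); /* cvtss2si rax, xmm0 */
    return (uint32_t)wide;                        /* keep the low 32 bits */
}
```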
Without AVX-512, ideally you can use a signed conversion (cvtps2dq) and get the same result, i.e. if your floats are non-negative and <= INT_MAX (2147483647.0).
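A sketch of picking the instruction at compile time, assuming C with <immintrin.h> on GCC or Clang; the helper name is hypothetical, and the SSE2 fallback is only correct under the range assumption above. One catch with that bound: the largest float <= INT_MAX is 2147483520.0f, because the literal 2147483647.0f rounds up to 2^31, which is already out of range for int32_t.

```c
#include <stdint.h>
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: packed float -> uint32_t for 4 lanes.
   With AVX-512F + AVX-512VL, vcvtps2udq does the unsigned conversion directly.
   Otherwise fall back to SSE2 cvtps2dq, which is only correct under the
   ASSUMPTION that every lane is in [0, 2147483520.0f] (the largest float
   <= INT_MAX): the signed result is then non-negative, so its bit pattern
   equals the unsigned result. */
static __m128i cvtps_epu32_maybe(__m128 v)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    return _mm_cvtps_epu32(v);   /* vcvtps2udq xmm, xmm */
#else
    return _mm_cvtps_epi32(v);   /* cvtps2dq: assumes 0 <= v <= INT_MAX */
#endif
}
```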
See "How to efficiently perform double/int64 conversions with SSE/AVX?" for a related double->uint64_t conversion. The full-range one should be adaptable from double->uint64_t to float->uint32_t if you need it.
Another possibility (for 32-bit float->uint32_t) is just range-shifting to signed in FP, then flipping back in integer: INT32_MIN ^ convert(x + INT32_MIN). But that introduces FP rounding for small integers, because x + INT32_MIN falls outside the -2^24 .. 2^24 range where a float can represent every integer. e.g. 5 would be rounded to the nearest multiple of 2^7 = 128 (i.e. to 0) by the FP add. So that alone isn't usable; you'd need to try both the straight conversion and the range-shifted conversion, and only use the range-shifted result if the straight conversion gave you 0x80000000. (Perhaps using the straight conversion result as a blend control for SSE4 blendvps?)
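Here is one way to combine the two conversions with plain SSE2, as a sketch assuming x86 and <emmintrin.h>; the function name is hypothetical. It substitutes an and/andnot/or blend for SSE4 blendvps so it runs on baseline x86-64, and it only aims to be correct for inputs in [0, 2^32). The range-shifted path is exact there because every float >= 2^31 is a multiple of 256, so x - 2^31 is representable.

```c
#include <stdint.h>
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: full-range packed float -> uint32_t with SSE2 only.
   straight: plain cvtps2dq, correct for inputs in [0, 2^31).
   shifted:  INT32_MIN ^ cvtps2dq(x + INT32_MIN), correct for [2^31, 2^32).
   Lanes where straight produced 0x80000000 (integer indefinite) take the
   shifted result. Negative, >= 2^32 and NaN inputs give meaningless results. */
static __m128i cvtps_epu32_sse2(__m128 x)
{
    const __m128i int_min = _mm_set1_epi32(INT32_MIN);

    __m128i straight = _mm_cvtps_epi32(x);

    /* x + INT32_MIN in FP, i.e. x - 2^31 */
    __m128 x_shifted = _mm_sub_ps(x, _mm_set1_ps(2147483648.0f));
    /* convert, then flip the top bit back to undo the shift */
    __m128i shifted = _mm_xor_si128(_mm_cvtps_epi32(x_shifted), int_min);

    /* all-ones in lanes where the straight conversion overflowed */
    __m128i use_shifted = _mm_cmpeq_epi32(straight, int_min);

    /* SSE2 blend: (mask & shifted) | (~mask & straight) */
    return _mm_or_si128(_mm_and_si128(use_shifted, shifted),
                        _mm_andnot_si128(use_shifted, straight));
}
```

With SSE4.1, the compare and the and/andnot/or blend could instead be a single _mm_blendv_ps on the bit-cast vectors, using straight itself as the control as suggested above. blendvps looks only at each lane's sign bit, so that variant also routes lanes with negative straight results to the shifted path, which doesn't matter if inputs are known to be non-negative.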
For packed conversion of float->int32_t, there is SSE2 cvtps2dq xmm, xmm/m128. Its sibling cvttps2dq converts with truncation toward 0, instead of using the current rounding mode (round-to-nearest, if you haven't changed it).
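A quick C sketch showing the difference between the two instructions, assuming x86 with SSE2 intrinsics and the default MXCSR rounding mode; the helper name is hypothetical. For inputs {2.5, 2.7, -2.7, -0.5}, cvtps2dq gives {2, 3, -3, 0} (ties round to even) while cvttps2dq gives {2, 2, -2, 0}.

```c
#include <stdint.h>
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper contrasting the two SSE2 conversions on 4 lanes.
   cvtps2dq rounds using the current MXCSR mode (round-to-nearest-even
   unless changed); cvttps2dq always truncates toward zero. */
static void round_vs_truncate(__m128 v, int32_t rounded[4], int32_t truncated[4])
{
    _mm_storeu_si128((__m128i *)rounded,   _mm_cvtps_epi32(v));  /* cvtps2dq  */
    _mm_storeu_si128((__m128i *)truncated, _mm_cvttps_epi32(v)); /* cvttps2dq */
}
```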
Any negative float less than -0.5 will convert (with the default round-to-nearest mode) to integer -1 or lower; as a uint32_t that bit-pattern represents a huge number. Floats outside the -2^31 .. 2^31-1 range (and NaN) get converted to 0x80000000, Intel's "integer indefinite" value.
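If you go the signed-conversion route, a sketch like this (assuming x86 SSE2 intrinsics; the helper name is hypothetical) can at least detect which lanes overflowed, by comparing against the integer-indefinite value and extracting a lane mask with movmskps. Note that an input of exactly -2^31 also converts to 0x80000000, so a set bit means "possibly out of range", not "definitely".

```c
#include <stdint.h>
#include <assert.h>
#include <emmintrin.h>  /* SSE2 (includes SSE for _mm_movemask_ps) */

/* Hypothetical helper: converts 4 floats with cvtps2dq, stores the raw
   results as uint32_t, and returns a 4-bit mask of lanes that came out
   as 0x80000000 ("integer indefinite"). Negative inputs below -0.5 show
   up as huge uint32_t values and are NOT flagged by this mask. */
static int cvtps_epi32_with_overflow_mask(__m128 v, uint32_t out[4])
{
    __m128i r = _mm_cvtps_epi32(v);
    _mm_storeu_si128((__m128i *)out, r);

    __m128i hit = _mm_cmpeq_epi32(r, _mm_set1_epi32(INT32_MIN));
    return _mm_movemask_ps(_mm_castsi128_ps(hit)); /* bit i set for lane i */
}
```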
If you didn't find that, and only found cvtps2pi (signed conversion into an MMX register), you need better places to search: