AVX/SSE round floats down and return vector of ints?

Question

Is there a way using AVX/SSE to take a vector of floats, round-down and produce a vector of ints? All the floor intrinsic methods seem to produce a final vector of floating point, which is odd because rounding produces an integer!

Conversion to integer must round or truncate somehow, but you can round an FP value to the nearest integer without converting. (See [`float nearbyintf(float x)`](http://en.cppreference.com/w/c/numeric/math/nearbyint)) — Peter Cordes, May 07 '16 at 18:16
Definition of `round()` - https://stackoverflow.com/questions/3597197. — Royi, Apr 06 '18 at 23:07
@Royi: can you please delete your misleading comment? There are no x86 intrinsics for the C `round()` rounding mode, which rounds +-0.5 away from zero. x86 rounding intrinsics that use round-to-nearest use the IEEE default rounding mode, banker's rounding (nearest-even as a tiebreak.) `rint` / `lrint` / `nearbyint` all give you the default rounding mode, and are faster than `round()` on x86 (especially with SSE4.1 or AVX, or any time when you're converting to integer). So linking to a definition of `round()` is off topic. — Peter Cordes, Jan 26 '19 at 23:46
@PeterCordes, I linked to the answer which defines `round()` as in Wikipedia. I didn't get your comment: does `round()` in `C` obey this definition or not? — Royi, Jan 27 '19 at 03:34
This question is about rounding in general, not the C `round()` function, and specifically about rounding *down* (floor or trunc). Nowhere does the question say anything about rounding functions that use the rounding mode of the C `round()` function. It's not the default IEEE rounding mode or anything. It's maybe interesting to mention it in an answer as an alternative to `_mm_round_ps(x, _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC)`, if you're going to talk about other rounding modes. — Peter Cordes, Jan 27 '19 at 04:03

score 5 · Answer 1 · edited Jan 27 '19 at 04:20

SSE has conversion from FP to integer with your choice of truncation (towards zero) or the current rounding mode. That is normally the IEEE default mode: round to nearest, with ties going to even. This matches nearbyint() but not round(), whose tiebreak is away from 0. If you need round()'s rounding mode on x86, you have to emulate it, perhaps with truncation as a building block.

The relevant instructions are CVTPS2DQ and CVTTPS2DQ to convert packed single-precision floats to signed doubleword integers. The version with the extra T in the mnemonic does Truncation instead of the current rounding mode.

; xmm0 is assumed to be packed float input vector
cvttps2dq xmm0, xmm0
; xmm0 now contains the (truncated) packed integer vector

Or with intrinsics, __m128i _mm_cvt[t]ps_epi32(__m128 a)
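To make the difference between the two intrinsics concrete, here is a minimal sketch using only SSE2. The helper names (`trunc4`, `nearest4`, `check_trunc_vs_nearest`) are made up for illustration, and the expected values assume MXCSR is still in its default round-to-nearest-even mode.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtps_epi32, _mm_cvttps_epi32 */
#include <stdint.h>

/* Hypothetical helper: truncate four floats toward zero (CVTTPS2DQ). */
static void trunc4(const float in[4], int32_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(v));
}

/* Hypothetical helper: convert four floats using the current MXCSR
   rounding mode (CVTPS2DQ). */
static void nearest4(const float in[4], int32_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(v));
}

/* Assumes the default rounding mode (nearest, ties to even). */
static void check_trunc_vs_nearest(void) {
    const float in[4] = {0.5f, 1.5f, 2.5f, -4.9f};
    int32_t t[4], n[4];
    trunc4(in, t);
    nearest4(in, n);
    /* Truncation drops the fraction: 0, 1, 2, -4. */
    assert(t[0] == 0 && t[1] == 1 && t[2] == 2 && t[3] == -4);
    /* Nearest with ties to even: 0, 2, 2, -5. */
    assert(n[0] == 0 && n[1] == 2 && n[2] == 2 && n[3] == -5);
}
```

Note how 0.5 and 2.5 both land on the even neighbour with `_mm_cvtps_epi32`, which is the tiebreak behaviour described above.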

For the other two rounding modes x86 provides in hardware, floor (toward -Inf) and ceil (toward +Inf), a simple way is to use the SSE4.1/AVX ROUNDPS instruction before converting to integer.

The code would look like this:

roundps  xmm0, xmm0, 1    ; nearest=0, floor=1,  ceil=2, trunc=3
cvtps2dq xmm0, xmm0       ; or cvttps2dq, doesn't matter
; xmm0 now contains the floored packed integer vector

For AVX ymm vectors prefix the instructions with 'V' and change the xmm's to ymm's.
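With intrinsics, the 256-bit variant might look like the sketch below. The helper names and the `USE_AVX` macro are invented for this example; the macro is only an assumption to let GCC or Clang build the function without passing `-mavx` globally, and the code still needs a CPU and OS that support AVX at run time.

```c
#include <assert.h>
#include <immintrin.h>  /* AVX: _mm256_floor_ps, _mm256_cvtps_epi32 */
#include <stdint.h>

/* Assumption: per-function target attribute for GCC/Clang; other
   compilers would need their own AVX switch instead. */
#if defined(__GNUC__) || defined(__clang__)
#define USE_AVX __attribute__((target("avx")))
#else
#define USE_AVX
#endif

/* Hypothetical helper: floor eight floats and convert to int32
   (VROUNDPS with the floor mode, then VCVTPS2DQ). */
USE_AVX static void floor8_to_int(const float in[8], int32_t out[8]) {
    __m256 v = _mm256_loadu_ps(in);
    __m256i r = _mm256_cvtps_epi32(_mm256_floor_ps(v));
    _mm256_storeu_si256((__m256i *)out, r);
}

static void check_floor8(void) {
    const float in[8] = {4.9f, 100.3434f, -4.9f, -0.5f,
                         0.0f, 1.0f, -1.0f, 2.5f};
    const int32_t want[8] = {4, 100, -5, -1, 0, 1, -1, 2};
    int32_t out[8];
    floor8_to_int(in, out);
    for (int i = 0; i < 8; ++i)
        assert(out[i] == want[i]);
}
```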

ROUNDPS works like this

Round packed single precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8.

The rounding mode (the immediate, i.e. the third operand) can take the following values (taken from Table 4-15, "Rounding Modes and Encoding of Rounding Control (RC) Field", of the current Intel docs):

Rounding Mode               RC Field Setting   Description
----------------------------------------------------------
Round to nearest (even)     00B                Rounded result is the closest to the infinitely precise result. If two values are equally close, the result is the even value (i.e., the integer value with the least-significant bit of zero).
Round down (toward −∞)      01B                Rounded result is closest to but no greater than the infinitely precise result.
Round up (toward +∞)        10B                Rounded result is closest to but no less than the infinitely precise result.
Round toward 0 (truncate)   11B                Rounded result is closest to but no greater in absolute value than the infinitely precise result.

A probable reason why the rounding operation returns a float vector rather than an int vector is that further operations can then stay float operations (on rounded values), and a conversion to int is trivial, as shown.

The corresponding intrinsics are found in the referenced docs. An example of transforming the above code to intrinsics (which depend on the Rounding Control (RC) Field) is:

__m128i dst = _mm_cvtps_epi32( _mm_floor_ps(src) );   // src is a __m128
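A slightly fuller sketch of the floor and ceil cases, wrapped in functions so it can be compiled and checked. The helper names and the `USE_SSE41` macro are assumptions made for this example; the macro only exists so GCC or Clang can build it without `-msse4.1` on the command line.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1: _mm_floor_ps, _mm_ceil_ps */
#include <stdint.h>

/* Assumption: per-function target attribute for GCC/Clang; other
   compilers would need their own SSE4.1 switch instead. */
#if defined(__GNUC__) || defined(__clang__)
#define USE_SSE41 __attribute__((target("sse4.1")))
#else
#define USE_SSE41
#endif

/* Hypothetical helper: round toward -Inf, then convert (ROUNDPS + CVTPS2DQ). */
USE_SSE41 static void floor4_to_int(const float in[4], int32_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_floor_ps(v)));
}

/* Hypothetical helper: round toward +Inf, then convert. */
USE_SSE41 static void ceil4_to_int(const float in[4], int32_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvtps_epi32(_mm_ceil_ps(v)));
}

static void check_floor_ceil(void) {
    const float in[4] = {4.9f, 100.3434f, -4.9f, -0.5f};
    int32_t f[4], c[4];
    floor4_to_int(in, f);
    ceil4_to_int(in, c);
    /* floor: 4, 100, -5, -1 */
    assert(f[0] == 4 && f[1] == 100 && f[2] == -5 && f[3] == -1);
    /* ceil: 5, 101, -4, 0 */
    assert(c[0] == 5 && c[1] == 101 && c[2] == -4 && c[3] == 0);
}
```

Because the input to the conversion is already an integral value, it makes no difference here whether `_mm_cvtps_epi32` or `_mm_cvttps_epi32` does the final step.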

`roundps` is only needed if you need to round towards -Inf (floor) or +Inf (ceil). `cvtps_epi32` uses the default rounding mode (normally the same as `roundps 00`), while `cvttps_epi32` truncates towards 0. — Peter Cordes, May 07 '16 at 18:28
One major reason for `roundps` not converting to `int` is that `FLT_MAX` is greater than `INT_MAX`, so the result might not be representable (you get `0x80000000` from converting out-of-range FP values, the "integer indefinite" value as Intel calls it). Also, it's just plain useful to round FP numbers sometimes even without converting them to integer. — Peter Cordes, May 07 '16 at 18:30
A better example for `roundps` would be floor or ceil mode, which aren't achievable with cvt or cvtt instructions with the default rounding mode. Truncate and then convert-to-nearest should just be done with a single cvtt instruction, so it doesn't show what `roundps` is useful for in this case. — Peter Cordes, May 07 '16 at 19:47

doug65536 · Answer 2 · 2016-05-07T17:35:37.507

1

Use the conversion instructions:

int _mm_cvt_ss2si (__m128 a)

Converts the low 32-bit floating-point component of a to an integer and returns that integer. The upper three components of a are ignored.

__m128i _mm_cvtps_epi32 (__m128 a);

Convert all four 32-bit floats to integers and return vector of 4 32-bit integers.

Those are the frequently used ones. There are additional variations to handle conversions.

edited May 07 '16 at 17:35

answered May 07 '16 at 17:29

doug65536

The OP wants a result that's rounded down. Assuming he means towards 0, rather than towards -Inf, `_mm_cvttps_epi32` is the intrinsic for the ideal instruction. For rounding towards -Inf, you'd need `roundps`, or to change the default rounding mode temporarily. – Peter Cordes May 07 '16 at 18:24
@PeterCordes I'm not sure what you mean but if I have 4.9 I want 4 and if I have 100.3434 I want 100? – intrigued_66 May 07 '16 at 18:30
@mezamorphic: If you have `-4.9`, do you want `-5` (floor) or `-4` (truncate)? – Peter Cordes May 07 '16 at 18:31
We can assume I won't have minus numbers. Speed is extremely important (if that helps). – intrigued_66 May 07 '16 at 18:40
@mezamorphic: in that case the solution provided by Peter Cordes would be the best fit: using `_mm_cvttps_epi32`/`CVTTPS2DQ`. – zx485 May 07 '16 at 18:50

score 1 · Answer 3 · answered May 07 '16 at 20:00

Single-instruction options:

truncate towards zero: __m128i _mm_cvttps_epi32(__m128 a)
round to nearest: __m128i _mm_cvtps_epi32(__m128 a)

Two instructions, using SSE4.1 ROUNDPS and then cvtps_epi32

Round towards -INF: __m128 _mm_floor_ps(__m128 s1)
Round towards +INF: __m128 _mm_ceil_ps(__m128 s1)

Only use the other truncate or nearest forms of roundps if you want to keep the data in FP format.

For positive numbers, truncation and floor are the same. For negative non-integer values they differ: cvtt(-4.9) = -4, but floor(-4.9) = -5.0. See floorf() vs. truncf().

If the FP value is outside the INT_MIN to INT_MAX range, cvttps and cvtps will give you 0x80000000 (i.e. INT_MIN, just the sign bit set), which Intel calls the "integer indefinite" value. It will also raise the FP invalid exception, but FP exceptions are masked by default.
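A small sketch of that out-of-range behaviour, using SSE2 only. The helper names are invented for the example, and it assumes FP exceptions are left masked (the default), so the invalid conversions just produce the indefinite value.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cvttps_epi32 */
#include <math.h>       /* NAN */
#include <stdint.h>

/* Hypothetical helper: truncating conversion of four floats. */
static void cvtt4(const float in[4], int32_t out[4]) {
    __m128 v = _mm_loadu_ps(in);
    _mm_storeu_si128((__m128i *)out, _mm_cvttps_epi32(v));
}

/* Assumes FP exceptions are masked (the default). */
static void check_out_of_range(void) {
    /* 2147483520.0f is 2^31 - 128, the largest float below 2^31,
       so it is still in range. */
    const float in[4] = {3.0e9f, -3.0e9f, NAN, 2147483520.0f};
    int32_t out[4];
    cvtt4(in, out);
    assert(out[0] == INT32_MIN);  /* too large: 0x80000000 */
    assert(out[1] == INT32_MIN);  /* too small: 0x80000000 */
    assert(out[2] == INT32_MIN);  /* NaN: 0x80000000 */
    assert(out[3] == 2147483520); /* in range: exact */
}
```

If inputs can exceed the int32 range, one option is to clamp or compare in the FP domain before converting, since the result alone cannot distinguish an overflow from a genuine INT_MIN.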

Peter Cordes

How does `_mm_cvtps_epi32` handle value of 0.5? – Royi Apr 06 '18 at 22:30
@Royi: the default IEEE754 rounding mode is [round-to-nearest, with even numbers as a tie-break](https://en.wikipedia.org/wiki/Rounding#Round_half_to_even). Thus, `lrint(0.5)` or `(int)nearbyint(0.5)`gives `0`, unless you've changed the MXCSR rounding mode. – Peter Cordes Apr 06 '18 at 23:14
So it is different than definition - https://stackoverflow.com/a/3597210/195787. Thank You! – Royi Apr 06 '18 at 23:16
@Royi: huh? Different rounding functions have different definitions. The standard function called `round()` in C uses its own special rounding mode. [I recommended `lrint` or `(int)nearbyint` for a reason](https://stackoverflow.com/questions/485525/round-for-float-in-c/47347224#47347224); because they use the current rounding mode, while `round` uses a fixed rounding mode that x86 doesn't support directly in hardware. – Peter Cordes Apr 06 '18 at 23:20
