AVX2 equivalent of lrintf

Question

I currently have a simple C loop that converts an array from float to int using lrintf with the default rounding mode. I would like to move this into my AVX2 routine instead. Is there a SIMD equivalent of lrintf? By the way, after the lrintf, I clamp the result to a user-specified min and max.

Thanks!

The prototype for this function is `long int lrintf (float);`. In your programming environment, is `long int` a 32-bit type or a 64-bit type? — njuffa, Sep 01 '20 at 03:05
If `long int` is a 32-bit type, `__m256i _mm256_cvtps_epi32 (__m256 a)` should give you what you need. — njuffa, Sep 01 '20 at 06:07
For packed conversion from double->int64_t without AVX512, see [How to efficiently perform double/int64 conversions with SSE/AVX?](https://stackoverflow.com/q/41144668) — Peter Cordes, Sep 01 '20 at 06:24
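If `long int` is 64 bits wide, `lrintf` produces a 64-bit result, and AVX2 has no packed float-to-int64 conversion (those instructions arrived with AVX-512DQ). One magic-number technique along the lines of the linked question widens the floats to `double` and adds 1.5 * 2^52, which rounds the value to an integer held in the low mantissa bits. The following is a sketch: the helper name is hypothetical, and the trick is only valid for |x| <= 2^51.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* 1.5 * 2^52. For any double x with |x| <= 2^51, x + MAGIC lands in
   [2^52, 2^53], so the addition rounds x to an integer (in the current MXCSR
   rounding mode) and subtracting the two bit patterns recovers that integer. */
#define MAGIC_1P5_2P52 6755399441055744.0

/* Hypothetical helper: rounds 4 floats to int64_t. The target attribute is a
   GCC/Clang extension; alternatively compile the whole file with -mavx2. */
__attribute__((target("avx2")))
static void cvt4_float_to_int64(const float *src, int64_t *dst)
{
    for (int i = 0; i < 4; i++)
        assert(src[i] >= -0x1p51f && src[i] <= 0x1p51f);   /* valid range */

    __m256d x = _mm256_cvtps_pd(_mm_loadu_ps(src));         /* exact widening */
    const __m256d magic = _mm256_set1_pd(MAGIC_1P5_2P52);
    x = _mm256_add_pd(x, magic);                            /* rounds to integer */
    __m256i r = _mm256_sub_epi64(_mm256_castpd_si256(x),    /* AVX2 */
                                 _mm256_castpd_si256(magic));
    _mm256_storeu_si256((__m256i *)dst, r);
}
```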

njuffa · Accepted Answer · 2020-09-01T23:25:48.123

While the prototype of `lrintf()` is `long int lrintf (float);`, the OP clarified in comments that they are looking for a conversion from `float` to 32-bit `int`.

The AVX intrinsic `_mm256_cvtps_epi32` is the perfect fit for this: it converts `float` to 32-bit `int` using the current rounding mode, which defaults to round-to-nearest-even in all of the software environments I am familiar with.
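Applied to the question's use case, a minimal sketch of the vectorized loop with clamping might look as follows. It assumes AVX2 is available: `_mm256_cvtps_epi32` itself only needs AVX, but the 32-bit integer `_mm256_max_epi32`/`_mm256_min_epi32` need AVX2. The function name is hypothetical, and the `target` attribute is a GCC/Clang extension that can be replaced by compiling with `-mavx2`.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical AVX2 counterpart of the scalar loop: rounds with the current
   MXCSR rounding mode (round-to-nearest-even by default), then clamps each
   result to [lo, hi]. */
__attribute__((target("avx2")))
static void float_to_int_clamped_avx2(const float *src, int32_t *dst,
                                      size_t n, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    const __m256i vlo = _mm256_set1_epi32(lo);
    const __m256i vhi = _mm256_set1_epi32(hi);
    size_t i = 0;

    /* Main loop: 8 floats per iteration, unaligned loads and stores. */
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_cvtps_epi32(_mm256_loadu_ps(src + i)); /* AVX  */
        v = _mm256_max_epi32(v, vlo);                             /* AVX2 */
        v = _mm256_min_epi32(v, vhi);                             /* AVX2 */
        _mm256_storeu_si256((__m256i *)(dst + i), v);
    }

    /* Scalar tail: cvtss2si uses the same MXCSR rounding mode as the vector
       conversion, so the remaining elements are rounded identically. */
    for (; i < n; i++) {
        int32_t r = _mm_cvtss_si32(_mm_set_ss(src[i]));
        dst[i] = r < lo ? lo : (r > hi ? hi : r);
    }
}
```

One behavioral difference from the scalar loop: out-of-range and NaN inputs convert to `INT32_MIN` (0x80000000, the "integer indefinite" value), so a large positive input ends up clamped to `lo` rather than `hi`. If such inputs can occur, they need separate handling, for example by clamping in the `float` domain first.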

The output from the little test program below should look as follows:

source vector:  1.000000  1.100000  1.500000  1.900000 -1.000000 -1.100000 -1.500000 -1.900000
round to nearest:   1  1  2  2 -1 -1 -2 -2
round down:         1  1  1  1 -1 -2 -2 -2
round up:           1  2  2  2 -1 -1 -1 -1
round toward zero:  1  1  1  1 -1 -1 -1 -1

I note, however, that with my older Intel compiler any optimization level above -O0 gives incorrect results, presumably because the compiler moves the `_MM_SET_ROUNDING_MODE()` invocations around without respecting the implicit dependencies between those and the surrounding computation. I am not sure what to do about that. I already compiled with the strictest floating-point settings and also tried adding `#include <fenv.h>` followed by `#pragma STDC FENV_ACCESS ON`, to no avail.
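One possible way to sidestep the reordering problem, which is not part of the original answer, is to leave MXCSR alone and encode the rounding direction in the instruction itself. `_mm256_round_ps` takes the mode as an immediate, converting its already-integral result with `_mm256_cvtps_epi32` is exact in any rounding mode, and `_mm256_cvttps_epi32` always truncates. The following is a sketch, with a hypothetical helper name and assuming the inputs are within `int32_t` range.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: converts 8 floats with each rounding direction encoded
   in the instruction, so MXCSR is never modified. The rounding immediate must
   be a compile-time constant. The target attribute is a GCC/Clang extension;
   alternatively compile with -mavx. */
__attribute__((target("avx")))
static void cvt8_all_modes(const float *src, int32_t *rn, int32_t *rd,
                           int32_t *ru, int32_t *rz)
{
    for (int i = 0; i < 8; i++)   /* assumed: all results fit in int32_t */
        assert(src[i] >= -0x1p31f && src[i] < 0x1p31f);

    const __m256 f = _mm256_loadu_ps(src);
    const __m256 n = _mm256_round_ps(f, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    const __m256 d = _mm256_round_ps(f, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
    const __m256 u = _mm256_round_ps(f, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);

    /* n, d and u are already integral, so these conversions are exact
       whatever the current MXCSR rounding mode is. */
    _mm256_storeu_si256((__m256i *)rn, _mm256_cvtps_epi32(n));
    _mm256_storeu_si256((__m256i *)rd, _mm256_cvtps_epi32(d));
    _mm256_storeu_si256((__m256i *)ru, _mm256_cvtps_epi32(u));

    /* The truncating conversion always rounds toward zero, ignoring MXCSR. */
    _mm256_storeu_si256((__m256i *)rz, _mm256_cvttps_epi32(f));
}
```

For the inputs of the test program, this gives the same four rows as the output table above without ever calling `_MM_SET_ROUNDING_MODE()`.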

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#include <immintrin.h>  /* AVX intrinsics: compile with -mavx or higher */

int main (void)
{
    __m256 float_vec;
    __m256i int_vec_rn;
    __m256i int_vec_rd;
    __m256i int_vec_ru;
    __m256i int_vec_rz;
    float arg[8] = {1.0f, 1.1f, 1.5f, 1.9f, -1.0f, -1.1f, -1.5f, -1.9f};
    int32_t res[8];

    /* Save the caller's MXCSR rounding mode so it can be restored at the end. */
    unsigned int old_rm = _MM_GET_ROUNDING_MODE();
    printf ("source vector: % f % f % f % f % f % f % f % f\n",
            arg[0], arg[1], arg[2], arg[3], arg[4], arg[5], arg[6], arg[7]);
    memcpy (&float_vec, arg, sizeof float_vec);

    _MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);
    int_vec_rn = _mm256_cvtps_epi32 (float_vec);
    memcpy (res, &int_vec_rn, sizeof res);
    printf ("round to nearest:  % d % d % d % d % d % d % d % d\n", 
            res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);

    _MM_SET_ROUNDING_MODE (_MM_ROUND_DOWN);
    int_vec_rd = _mm256_cvtps_epi32 (float_vec);
    memcpy (res, &int_vec_rd, sizeof res);
    printf ("round down:        % d % d % d % d % d % d % d % d\n", 
            res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);

    _MM_SET_ROUNDING_MODE (_MM_ROUND_UP);
    int_vec_ru = _mm256_cvtps_epi32 (float_vec);
    memcpy (res, &int_vec_ru, sizeof res);
    printf ("round up:          % d % d % d % d % d % d % d % d\n",
            res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);

    _MM_SET_ROUNDING_MODE (_MM_ROUND_TOWARD_ZERO);
    int_vec_rz = _mm256_cvtps_epi32 (float_vec);
    memcpy (res, &int_vec_rz, sizeof res);
    printf ("round toward zero: % d % d % d % d % d % d % d % d\n", 
            res[0], res[1], res[2], res[3], res[4], res[5], res[6], res[7]);

    _MM_SET_ROUNDING_MODE (old_rm);
    return EXIT_SUCCESS;
}

You normally need `#pragma STDC FENV_ACCESS ON` to tell the compiler that constant-propagation at compile time must respect changes to the FP rounding mode. I forget if that actually works properly in GCC and/or clang; I seem to recall it might not (e.g. [pragma STDC FENV\_ACCESS ON is not supported](https://stackoverflow.com/q/33471254)). Also remember that Intel's compiler defaults to something like `-ffast-math`. — Peter Cordes, Sep 01 '20 at 22:51
@PeterCordes I know that this applies to the facilities provided by `fenv.h`; I don't know whether there is any interaction with AVX intrinsics, though (my hunch is: no). I will run some quick experiments. I did compile with strict floating-point settings, which is my personal default. — njuffa, Sep 01 '20 at 22:59
@njuffa brilliant, thank you for such a comprehensive answer to my question — Jacko, Sep 01 '20 at 23:01
Also, I can confirm that the results are consistent with an optimized GCC build: all my unit tests pass in both debug and optimized builds — Jacko, Sep 01 '20 at 23:58
