AVX512 Vectorizing Modulo Gives Negative Result For Very Large Inputs

Question

I am trying to vectorize a modulo calculation using AVX-512. Because AVX-512 has no instruction for integer modulo or integer division (apart from the SVML functions), I am using the formula `d % p = d - int(float(d)/float(p))*p`.

However, for very large inputs, I get negative results.

#include <bits/stdc++.h>
#include <immintrin.h>

int main() {

    const auto SIZE = 1024;
    int64_t input[SIZE];
    int64_t output[SIZE] = {};

    const auto p = 1'536; // 1.5 * 1024

    std::iota(input, input + SIZE, 15596705878733779060ULL);

    __m512i _divider_512 = _mm512_set1_epi64(p);

    for (size_t idx = 0; idx < SIZE; idx += 8) {
        __m512i _inputs = _mm512_loadu_si512(&input[idx]);
        __m512i _e = _mm512_cvt_roundpd_epi64(_mm512_div_pd(_mm512_cvtepi64_pd(_inputs), _mm512_cvtepi64_pd(_divider_512)), _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
        __m512i _mult = _mm512_mullo_epi64(_e, _divider_512);
        __m512i _modulo_result = _mm512_sub_epi64(_inputs, _mult);
        _mm512_storeu_si512(&output[idx], _modulo_result);
    }

    for (auto i = 0; i < SIZE; ++i) {
        std::cout << output[i] << std::endl;
    }

}

Is this a rounding error between ints and doubles? Or am I missing something else?

And how could I work around it?

It looks like your input is already too large to convert to `double` without rounding errors (`double` only has 53 bits of precision). — chtz, Sep 02 '22 at 07:57
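
To illustrate the precision limit described in this comment, here is a minimal scalar sketch. The helper name `survives_double_round_trip` is made up for this example and is not part of the question's code. It checks whether a 64-bit value converts to `double` and back unchanged. Between 2^63 and 2^64 consecutive doubles are 2048 apart, so most inputs in that range, including the question's starting value, should not survive the round trip.

```cpp
#include <cstdint>

// Hypothetical helper: true if x converts to double and back unchanged.
bool survives_double_round_trip(std::uint64_t x) {
    const double as_double = static_cast<double>(x);
    // Values close to UINT64_MAX round up to 2^64, which no longer fits in
    // uint64_t, so reject that case before converting back.
    if (as_double >= 18446744073709551616.0) {
        return false;
    }
    return static_cast<std::uint64_t>(as_double) == x;
}
```

For example, `survives_double_round_trip(1ULL << 53)` holds, while the same call with `(1ULL << 53) + 1` does not.
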
I see, thanks. Is there any way to work around this limitation in the above-mentioned case? — InvisibleShadowGhost, Sep 02 '22 at 09:48
You can apply another mod-operation on the result. If `p` is sufficiently large, you may only need to check once if your result is smaller than 0 or larger/equal to `p`. If `p` is known at compile-time or you use the same `p` for many mod-operations, you may also use the multiplicative inverse (compilers do this automatically: https://godbolt.org/z/Mq6ejTW6j) — chtz, Sep 02 '22 at 10:43
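
The two suggestions in this comment can be sketched in scalar C++ as follows. Both function names are hypothetical, and both sketches assume the input is meant as an unsigned 64-bit value. `mod_via_double` applies the question's formula twice: the first pass may leave a small, possibly negative residual, and the second pass should be exact because that residual is far below 2^53. It additionally assumes `2 <= p < 2^32`. `mod_via_reciprocal` models the multiplicative-inverse idea with a precomputed `floor((2^64 - 1) / p)` and a single conditional subtraction. It relies on `unsigned __int128`, a GCC/Clang extension; a vector version would have to assemble the high 64 bits of the product from narrower multiplies, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: the float-based estimate followed by a second reduction.
// Assumes d is unsigned and 2 <= p < 2^32.
std::uint64_t mod_via_double(std::uint64_t d, std::uint64_t p) {
    assert(p >= 2 && p < (1ULL << 32));
    const double pd = static_cast<double>(p);
    const auto ps = static_cast<std::int64_t>(p);

    // First pass: the quotient estimate may be off because double(d) is rounded.
    const auto q1 = static_cast<std::uint64_t>(std::floor(static_cast<double>(d) / pd));
    // Wrapping subtraction; the true difference is small, so viewing the bits
    // as signed recovers it (two's complement, guaranteed since C++20).
    auto r = static_cast<std::int64_t>(d - q1 * p);

    // Second pass: r is now far below 2^53, so the same formula is exact and
    // floor() leaves a remainder in [0, p).
    const auto q2 = static_cast<std::int64_t>(std::floor(static_cast<double>(r) / pd));
    r -= q2 * ps;
    return static_cast<std::uint64_t>(r);
}

// Hypothetical helper: reduction with a precomputed reciprocal.
// Assumes p >= 1 and a compiler that provides unsigned __int128.
std::uint64_t mod_via_reciprocal(std::uint64_t d, std::uint64_t p) {
    assert(p >= 1);
    // Never larger than 2^64 / p; would be computed once when p is reused.
    const std::uint64_t m = UINT64_MAX / p;
    // High 64 bits of d * m: either the true quotient or one less.
    const auto q = static_cast<std::uint64_t>((static_cast<unsigned __int128>(d) * m) >> 64);
    std::uint64_t r = d - q * p;  // lies in [0, 2p)
    if (r >= p) {
        r -= p;                   // the single check mentioned in the comment
    }
    return r;
}
```

Under these assumptions, both functions should agree with the built-in `%` on unsigned operands, for example for `d = 15596705878733779060ULL` and `p = 1536`.
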
Yeah, this solves the problem with negative results but not with incorrect ones, right? As can be seen in the output, for the first input (15596705878733779060 % 1536) it returns 1140, but the correct result is 628. — InvisibleShadowGhost, Sep 02 '22 at 11:22
[Why should I not `#include <bits/stdc++.h>`?](https://stackoverflow.com/q/31816095/995714) — phuclv, Sep 02 '22 at 12:06
Your input number is too large for a signed `int64_t`. You need `_mm512_cvtepu64_pd` if your input is to be interpreted as unsigned. Technically, assigning too large numbers to signed integer types already induces undefined behavior. — chtz, Sep 02 '22 at 15:48
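
The signed-versus-unsigned point in this comment can be made concrete with a small scalar sketch. The helper names are hypothetical; `lane_cvtepi64_pd` and `lane_cvtepu64_pd` only model what the corresponding intrinsics do to a single 64-bit lane. The sketch uses `std::memcpy` for the reinterpretation so that it does not depend on how the language converts out-of-range values to `int64_t`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: view the same 64 bits as a signed integer.
std::int64_t reinterpret_as_signed(std::uint64_t bits) {
    std::int64_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Scalar model of one lane of _mm512_cvtepi64_pd (signed interpretation).
double lane_cvtepi64_pd(std::uint64_t bits) {
    return static_cast<double>(reinterpret_as_signed(bits));
}

// Scalar model of one lane of _mm512_cvtepu64_pd (unsigned interpretation).
double lane_cvtepu64_pd(std::uint64_t bits) {
    return static_cast<double>(bits);
}
```

For the question's starting value 15596705878733779060, the signed view is -2850038194975772556, so the signed conversion produces a negative `double`, while the unsigned conversion stays positive. If the data is meant to be unsigned, the arrays and the printed output would presumably need to be `uint64_t` as well.
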

