AVX2/AVX512 for modulo operations?

Question

I'm experimenting AVX instructions for ECDSA. I'm wondering if AVX2/AVX512 can be leveraged to perform modulo operations? If so, how?

Thanks

there's [libdivide](https://libdivide.com/) that targets fast SIMD division, but if you have lots of non-duplicating divisors then probably it can't help much — phuclv, Feb 10 '23 at 00:48
IDK if there's a good way to use floating point for exact integer division. The obvious thing is to limit things to a 52-bit range so every integer is exactly representable, and convert to `double` and back, with the conversion back using truncation. (Or truncate to an integer-valued `double` with `vroundpd` or the AVX-512 equivalent, so you can do `vfmsub...pd` to get the remainder.) But that would be for many 64-bit modulos in parallel, not one bigint modulo. Since you mentioned ECDSA, I think you want one big one? Multi-precision divide usually doesn't use hardware div insns. — Peter Cordes, Feb 10 '23 at 01:37
For SIMD bigint stuff, state of the art is to use partial-word arithmetic anyway even for add/sub/mul so you can defer normalization / carry propagation, and do it with shifts and shuffles. [Can long integer routines benefit from SSE?](https://stackoverflow.com/q/8866973) — Peter Cordes, Feb 10 '23 at 01:39
Popular ECDSA curves are chosen to make reduction fast (ie faster than Barrett reduction, which is the serious alternative, not so much "plain old modulo"), do you not have one of those? — harold, Feb 10 '23 at 04:40
Do you need integer modulo (with fixed or variable divisor?) or `fmod`? For `fmod`, do you need full precision on the full range? — chtz, Feb 10 '23 at 08:24

AVX2/AVX512 for modulo operations?

0 Answers0