Vectorization of multiplying two unsigned 64-bit integers with AVX2?

Question

Is there a well-known and efficient method for vectorizing multiplications of (two arrays of) unsigned 64-bit integers yielding 128-bit integers?

I found this thread, but it only discusses using a single instruction.

Signed or unsigned 64-bit integers? And for which architecture, any one particular? There are the GMP `mpn` [functions](https://gmplib.org/manual/Low_002dlevel-Functions) which do that very efficiently for unsigned ints. — Arc, Jan 12 '22 at 07:03
Do you have two arrays of 64-bit integers and want an array of 128-bit integers? I doubt there is an efficient AVX2 solution for this. If you had AVX-512IFMA52, you could do 52x52-bit multiplications. You can also try to emulate that using FMA with a lot of bit-twiddling. Whether this is worth the effort really depends on what you are trying to do overall. — chtz, Jan 12 '22 at 09:39
Related: [Fastest way to multiply an array of int64\_t?](https://stackoverflow.com/q/37296289) for vectorizing 64x64 => 64-bit multiply over arrays. There's a small win without AVX-512 (which brings native 64x64 => 64-bit). But where you need the full 128-bit result, very likely scalar is the fastest way. Building a 128-bit result out of `vpmuludq` chunks (and `vpmulld` for the high x high part) would be hard and probably require an add-with-carry, which AVX2 doesn't have. — Peter Cordes, Jan 12 '22 at 13:05
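
To make the carry problem from the comments concrete, here is a minimal scalar sketch in C of the 32-bit-chunk decomposition. The helper names `mul64x64_128` and `mul_arrays` are made up for illustration and do not come from any library. Each of the four 32x32 => 64-bit partial products is the kind of multiply a single `vpmuludq` lane performs. The part that does not map cleanly onto AVX2 is summing the middle column and propagating its carry into the high half, which is done here with plain 64-bit adds and shifts instead of add-with-carry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library function): full 64x64 => 128-bit
 * unsigned product, built from four 32x32 => 64-bit partial products. */
static void mul64x64_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* One 32x32 => 64-bit multiply each, like one vpmuludq lane. */
    uint64_t ll = a_lo * b_lo;   /* weight 2^0  */
    uint64_t lh = a_lo * b_hi;   /* weight 2^32 */
    uint64_t hl = a_hi * b_lo;   /* weight 2^32 */
    uint64_t hh = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column: three values below 2^32, so the sum fits in 64 bits.
     * Its upper 32 bits are the carry into the high half. */
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

    *lo = (mid << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Hypothetical array wrapper: hi[i]:lo[i] = a[i] * b[i] for i in [0, n). */
static void mul_arrays(const uint64_t *a, const uint64_t *b,
                       uint64_t *hi, uint64_t *lo, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        mul64x64_128(a[i], b[i], &hi[i], &lo[i]);
}
```

This is only meant to show where the extra work comes from, not to suggest it is fast. On x86-64 with GCC or Clang, multiplying via `unsigned __int128` typically compiles to a single widening `mul`, which is why the comments expect scalar code to win when the full 128-bit result is needed.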

0 Answers