
Is there a well-known, efficient method for vectorizing the multiplication of (two arrays of) unsigned 64-bit integers, yielding full 128-bit products?

I found this thread, but it only discusses using a single instruction.
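For reference, here is a minimal scalar sketch of the operation being asked about. It assumes GCC or Clang on a 64-bit target where the non-standard `unsigned __int128` type is available. The function name `mul_u64_to_u128` and the split of each result into `lo[]`/`hi[]` arrays are illustrative choices, not an established API. On x86-64 the inner product typically compiles to one `mul` (or `mulx` with BMI2) per element, which is the baseline a vectorized version has to beat.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element-wise 64x64 => 128-bit products.
 * Assumes the GCC/Clang extension type unsigned __int128.
 * Each 128-bit product is stored as two 64-bit halves. */
void mul_u64_to_u128(const uint64_t *a, const uint64_t *b,
                     uint64_t *lo, uint64_t *hi, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 p = (unsigned __int128)a[i] * b[i];
        lo[i] = (uint64_t)p;         /* bits 0..63   */
        hi[i] = (uint64_t)(p >> 64); /* bits 64..127 */
    }
}
```

For example, `UINT64_MAX * UINT64_MAX` yields `lo == 1` and `hi == UINT64_MAX - 1`.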

aqjune
  • Signed or unsigned 64-bit integers? And for which architecture, any particular one? The GMP `mpn` [functions](https://gmplib.org/manual/Low_002dlevel-Functions) do that very efficiently for unsigned ints. – Arc Jan 12 '22 at 07:03
  • Do you have two arrays of 64-bit integers and want an array of 128-bit integers? I doubt there is an efficient AVX2 solution for this. If you had AVX-512 IFMA52, you could do 52x52-bit multiplications. You could also try to emulate that using FMA with a lot of bit-twiddling. Whether this is worth the effort really depends on what you want to do overall. – chtz Jan 12 '22 at 09:39
  • Related: [Fastest way to multiply an array of int64\_t?](https://stackoverflow.com/q/37296289) covers vectorizing 64x64 => 64-bit multiplication over arrays. There's a small win even without AVX-512 (AVX-512DQ brings a native 64x64 => 64-bit multiply, `vpmullq`). But where you need the full 128-bit result, scalar code is very likely the fastest way. Building a 128-bit result out of `vpmuludq` 32x32 => 64-bit chunks (one full product for each pair of 32-bit halves, including high x high) would be hard and would probably require an add-with-carry, which AVX2 doesn't have. – Peter Cordes Jan 12 '22 at 13:05
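To make the "52x52-bit multiplications" mentioned in the comments concrete, here is a minimal scalar model of what one 64-bit lane of the AVX-512 IFMA instructions (`vpmadd52luq` / `vpmadd52huq`) computes. It is plain C, not the intrinsics themselves, and assumes GCC or Clang for `unsigned __int128`. The helper names `madd52lo` and `madd52hi` are hypothetical.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Hypothetical scalar model of one lane of vpmadd52luq:
 * only the low 52 bits of x and y are used; the low 52 bits of the
 * 104-bit product are added to the 64-bit accumulator (mod 2^64). */
static uint64_t madd52lo(uint64_t acc, uint64_t x, uint64_t y)
{
    unsigned __int128 p = (unsigned __int128)(x & MASK52) * (y & MASK52);
    return acc + ((uint64_t)p & MASK52);
}

/* Hypothetical scalar model of one lane of vpmadd52huq:
 * same product, but its high 52 bits (bits 52..103) are added instead. */
static uint64_t madd52hi(uint64_t acc, uint64_t x, uint64_t y)
{
    unsigned __int128 p = (unsigned __int128)(x & MASK52) * (y & MASK52);
    return acc + (uint64_t)(p >> 52);
}
```

For inputs below 2^52, `madd52lo(0, x, y)` and `madd52hi(0, x, y)` return the low and high 52 bits of the product, so the full product is `hi * 2^52 + lo`. Full 64-bit inputs would first have to be split into 52-bit limbs, with the partial products recombined afterwards.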
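A sketch of the `vpmuludq`-based decomposition discussed in the comments: a 64x64 => 128-bit product built only from 32x32 => 64-bit partial products (what `vpmuludq` computes in each 64-bit lane), plus 64-bit adds, shifts and masks. It is written as portable scalar C with a hypothetical name, `mul64x64_halves`. Every step has an AVX2 lane-wise counterpart (`vpmuludq`, `vpsrlq`, `vpsllq`, `vpand`, `vpor`, `vpaddq`), but the number of operations per product means it is not obviously faster than a scalar `mul`.

```c
#include <stdint.h>

#define LOW32 UINT64_C(0xFFFFFFFF)

/* Hypothetical helper: a*b as a 128-bit value (hi:lo) using only
 * 32x32 => 64-bit multiplies, 64-bit adds, shifts and masks. */
static void mul64x64_halves(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a_lo = a & LOW32, a_hi = a >> 32;
    uint64_t b_lo = b & LOW32, b_hi = b >> 32;

    /* Four full partial products; each fits exactly in 64 bits. */
    uint64_t ll = a_lo * b_lo; /* weight 2^0  */
    uint64_t lh = a_lo * b_hi; /* weight 2^32 */
    uint64_t hl = a_hi * b_lo; /* weight 2^32 */
    uint64_t hh = a_hi * b_hi; /* weight 2^64 */

    /* Middle column (weight 2^32): three 32-bit terms, sum < 3 * 2^32,
     * so the carry into the high half survives in bits 32..33 of mid. */
    uint64_t mid = (ll >> 32) + (lh & LOW32) + (hl & LOW32);

    *lo = (mid << 32) | (ll & LOW32);
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```

Keeping the middle-column sum in a 64-bit accumulator is what replaces the add-with-carry: the carry is read back out with `mid >> 32` instead of from a flag.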
