What is the best way to multiply each 32-bit element of one __m256i register by the corresponding element of another?

_mm256_mul_epu32 is not what I'm looking for because it produces 64-bit outputs. I want a 32-bit result for every 32-bit input element.

Moreover, I'm sure that the product of any two of these 32-bit values will not overflow 32 bits.

Thanks!

Peter Cordes
user1829358
  • Possible duplicate of [fastest way to multiply two vectors in c++](http://stackoverflow.com/questions/17264399/fastest-way-to-multiply-two-vectors-in-c) – Peter Cordes Jun 09 '16 at 06:18

1 Answer

You want the _mm256_mullo_epi32() intrinsic. From Intel's excellent online intrinsics guide:

Synopsis

__m256i _mm256_mullo_epi32 (__m256i a, __m256i b)
#include "immintrin.h" 
Instruction: vpmulld ymm, ymm, ymm
CPUID Flags: AVX2

Description

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
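
As a rough illustration, here is a minimal sketch of how the intrinsic could be used on two arrays. The helper names mul32_avx2 and mul32 are hypothetical, not part of any library. The sketch assumes GCC or Clang on x86: the target("avx2") attribute and __builtin_cpu_supports are compiler extensions, and compiling the whole file with -mavx2 would be an alternative to the attribute. Because only the low 32 bits of each product are kept, the same instruction gives the right answer for both signed and unsigned inputs.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = low 32 bits of a[i] * b[i], 8 lanes at a time.
   The target attribute (GCC/Clang extension) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static void mul32_avx2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads, so the arrays need no special alignment. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i prod = _mm256_mullo_epi32(va, vb);   /* vpmulld */
        _mm256_storeu_si256((__m256i *)(out + i), prod);
    }
    /* Scalar tail for the remaining n % 8 elements; unsigned math avoids
       signed-overflow undefined behavior and yields the same low 32 bits. */
    for (; i < n; ++i)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}

/* Hypothetical dispatcher: use the AVX2 path only if the CPU reports support. */
static void mul32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    if (__builtin_cpu_supports("avx2")) {
        mul32_avx2(a, b, out, n);
        return;
    }
    for (size_t i = 0; i < n; ++i)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```

Calling mul32(a, b, out, n) fills out with the element-wise products. If the values are already in __m256i registers, the single call _mm256_mullo_epi32(va, vb) is all that is needed.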

Jason R