Multiply two vectors of 32bit integers, producing a vector of 32bit result elements

Question

What is the best way to multiply each 32bit entry of two __m256i registers with each other?

_mm256_mul_epu32 is not what I'm looking for because it produces 64bit outputs. I want a 32bit result for every 32bit input element.

Moreover, I'm sure that the multiplication of two 32bit values will not overflow.

Thanks!

Possible duplicate of [fastest way to multiply two vectors in c++](http://stackoverflow.com/questions/17264399/fastest-way-to-multiply-two-vectors-in-c) — Peter Cordes, Jun 09 '16 at 06:18

score 7 · Accepted Answer · answered Feb 12 '15 at 14:19

You want the _mm256_mullo_epi32() intrinsic. From Intel's excellent online intrinsics guide:

Synopsis
__m256i _mm256_mullo_epi32 (__m256i a, __m256i b)
#include "immintrin.h" 
Instruction: vpmulld ymm, ymm, ymm
CPUID Flags: AVX2
Description

Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
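
As a rough illustration, here is a minimal sketch of how the intrinsic might be applied to two arrays. The helper name `mul32_avx2` and the array-based interface are assumptions for the example, not part of the question or the intrinsics guide. The `target("avx2")` attribute is a GCC/Clang extension used so the snippet builds without extra flags; compiling with `-mavx2` and dropping the attribute should work as well. The caller is assumed to have checked that the CPU supports AVX2.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = low 32 bits of a[i] * b[i], for n elements. */
__attribute__((target("avx2")))
void mul32_avx2(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: eight 32-bit lanes per iteration, unaligned loads/stores. */
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i prod = _mm256_mullo_epi32(va, vb); /* vpmulld */
        _mm256_storeu_si256((__m256i *)(out + i), prod);
    }

    /* Scalar tail; unsigned arithmetic keeps the low 32 bits without
       relying on signed overflow. */
    for (; i < n; ++i)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```

Since the low 32 bits of a product are the same for signed and unsigned inputs, the same call should serve for `uint32_t` data too.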

Jason R

Do you happen to know the AVX equivalent? – Bram May 29 '16 at 02:43
@Bram There isn't one. AVX has no 256-bit integer instructions; they were added in AVX2. You'll have to use the 128-bit SSE equivalents. – Jason R May 29 '16 at 15:22
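
For reference, a sketch of what the SSE route could look like, assuming SSE4.1 is available: `_mm_mullo_epi32` (`pmulld`) is the 128-bit counterpart and handles four elements at a time. The helper name `mul32_sse41` is made up for the example, and the `target("sse4.1")` attribute is a GCC/Clang extension standing in for `-msse4.1`.

```c
#include <smmintrin.h> /* SSE4.1 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = low 32 bits of a[i] * b[i], for n elements. */
__attribute__((target("sse4.1")))
void mul32_sse41(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: four 32-bit lanes per iteration, unaligned loads/stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_mullo_epi32(va, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
```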
