AVX2 intrinsics replacements for hadd

Question

I'm using C++ AVX2 intrinsics to horizontally add up values.

I have a vector (`__m256i`) with 3 values in it. I can use two `_mm256_hadd_epi32` calls to add them together, but I have been asked to find a way to do the horizontal add without it.

My thought is to somehow split them into 3 vectors, each containing 1 value, and then `_mm256_add_epi32` them.

Any suggestions on which function to use?

The aim is to increase efficiency. I'd really appreciate any suggestions! Thank you.

Can you elaborate on the underlying reason why you want to avoid this particular function? Otherwise people might waste time suggesting substitutes that you can't use due to the same underlying reason. — Nate Eldredge, May 22 '20 at 19:53
Where are your 3 values located within that 8x 32-bit element `__m256i`? Is it actually 3x 64-bit elements? In general you never want `hadd` for horizontal sums within a single vector. See [Fastest way to do horizontal SSE vector sum (or other reduction)](https://stackoverflow.com/q/6996764) for how to shuffle and add. It will take at worst 2 shuffles and 2 adds to reduce 3 arbitrarily-placed elements to one sum, but if you're lucky you won't need a vector constant as a shuffle-control. — Peter Cordes, May 22 '20 at 20:17
Without more details on where the elements you care about are, the general case answer on [Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2](https://stackoverflow.com/q/60108658) should be straightforward to adapt. It shows how to shuffle elements between additions with `__m256i` vectors. — Peter Cordes, May 22 '20 at 23:35
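
As a rough illustration of the shuffle-and-add approach described in the comments above, here is a minimal sketch. The question does not say where the 3 values live, so the first function assumes they are the low three 32-bit elements (indices 0 to 2); any other layout needs different shuffles. The second function is the general all-8-elements reduction in the style of the linked answers. All function names and the `HSUM_AVX2` macro are hypothetical, and the macro is only there so the code builds with GCC or Clang without `-mavx2`.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: let GCC/Clang compile these functions for AVX2 even when the
// translation unit is not built with -mavx2. Other compilers need their own flag.
#if defined(__GNUC__) || defined(__clang__)
#define HSUM_AVX2 __attribute__((target("avx2")))
#else
#define HSUM_AVX2
#endif

// Sum of 32-bit elements 0, 1 and 2 of v (assumed layout); elements 3..7 are ignored.
// Two shuffles and two adds, with no vector constant as shuffle control.
HSUM_AVX2 inline int32_t hsum_low3_epi32(__m256i v) {
    __m128i lo  = _mm256_castsi256_si128(v);                    // low lane: [e0 e1 e2 e3]
    __m128i e1  = _mm_shuffle_epi32(lo, _MM_SHUFFLE(1, 1, 1, 1)); // e1 into element 0
    __m128i e2  = _mm_unpackhi_epi64(lo, lo);                   // e2 into element 0
    __m128i sum = _mm_add_epi32(lo, e1);                        // element 0 = e0 + e1
    sum = _mm_add_epi32(sum, e2);                               // element 0 = e0 + e1 + e2
    return _mm_cvtsi128_si32(sum);                              // extract element 0
}

// Sum of all eight 32-bit elements: narrow 256 -> 128 -> 64 -> 32 bits,
// adding the upper half to the lower half at each step.
HSUM_AVX2 inline int32_t hsum_all8_epi32(__m256i v) {
    __m128i lo = _mm256_castsi256_si128(v);
    __m128i hi = _mm256_extracti128_si256(v, 1);
    __m128i s  = _mm_add_epi32(lo, hi);                         // 4 partial sums
    s = _mm_add_epi32(s, _mm_unpackhi_epi64(s, s));             // 2 partial sums
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1))); // total in element 0
    return _mm_cvtsi128_si32(s);
}

// Convenience wrappers that load 8 ints from (possibly unaligned) memory.
HSUM_AVX2 inline int32_t hsum_low3_from_array(const int32_t* p) {
    return hsum_low3_epi32(_mm256_loadu_si256(reinterpret_cast<const __m256i*>(p)));
}

HSUM_AVX2 inline int32_t hsum_all8_from_array(const int32_t* p) {
    return hsum_all8_epi32(_mm256_loadu_si256(reinterpret_cast<const __m256i*>(p)));
}
```

If the unused elements are known to be zero, `hsum_all8_epi32` also gives the 3-value sum regardless of where the values sit, at the cost of one extra shuffle and add. Both functions require a CPU with AVX2 at run time.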


0 Answers