I'm using C++ AVX2 intrinsics to horizontally add values.
I have a vector (__m256i) with 3 values in it. I can add them together with two _mm256_hadd_epi32 calls, but I have been asked to find a way to do the horizontal add without using that intrinsic.
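For reference, here is a minimal sketch of the two-hadd version. It assumes the three values sit in 32-bit elements 0 to 2 and that element 3 is zero; that layout, the AVX2_TARGET macro, and the helper names hsum_low4_hadd and demo_hadd are made up for illustration.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: lets GCC/Clang compile these functions without -mavx2.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// Hypothetical helper: sums elements 0..3 of the low 128-bit lane.
AVX2_TARGET static inline std::int32_t hsum_low4_hadd(__m256i v) {
    __m256i t = _mm256_hadd_epi32(v, v);  // element 0 = v0+v1, element 1 = v2+v3
    t = _mm256_hadd_epi32(t, t);          // element 0 = v0+v1+v2+v3
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(t));  // read element 0
}

// Hypothetical driver: builds the assumed layout {a, b, c, 0, 0, 0, 0, 0}.
AVX2_TARGET std::int32_t demo_hadd(std::int32_t a, std::int32_t b, std::int32_t c) {
    __m256i v = _mm256_setr_epi32(a, b, c, 0, 0, 0, 0, 0);
    return hsum_low4_hadd(v);
}
```

The result is only correct under the assumption that element 3 is zero, because the second hadd folds it into the sum.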
My thought is to somehow split them into 3 vectors, each containing 1 value, and then add those with _mm256_add_epi32.
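A rough sketch of that idea, again assuming the three values are in 32-bit elements 0 to 2. It uses _mm256_shuffle_epi32 to move each value into element 0 of its own vector; the AVX2_TARGET macro and the names hsum_low3_shuffle and demo_shuffle_add are hypothetical, and this is only one possible way to do the split.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: lets GCC/Clang compile these functions without -mavx2.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// Hypothetical helper: sums elements 0, 1 and 2 without any hadd.
AVX2_TARGET static inline std::int32_t hsum_low3_shuffle(__m256i v) {
    // Copy element 1 (then element 2) into every slot of each 128-bit lane,
    // so each vector carries one of the values in element 0.
    __m256i e1 = _mm256_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1));
    __m256i e2 = _mm256_shuffle_epi32(v, _MM_SHUFFLE(2, 2, 2, 2));
    // Two vertical adds: element 0 = v0 + v1 + v2.
    __m256i sum = _mm256_add_epi32(_mm256_add_epi32(v, e1), e2);
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(sum));  // read element 0
}

// Hypothetical driver: builds the assumed layout {a, b, c, 0, 0, 0, 0, 0}.
AVX2_TARGET std::int32_t demo_shuffle_add(std::int32_t a, std::int32_t b, std::int32_t c) {
    __m256i v = _mm256_setr_epi32(a, b, c, 0, 0, 0, 0, 0);
    return hsum_low3_shuffle(v);
}
```

Only element 0 of the result is meaningful here, and element 3 of the input is never read. Whether this is actually faster than the two hadds would need measuring on the target CPU.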
Any suggestions on which intrinsics to use?
The aim is to increase efficiency. I'd really appreciate any suggestions! Thank you.