I recently followed a blog post on finding the minimum element of an array using 128-bit vector instructions. That version ran fine, but when I wrote the same thing for the 256-bit instruction set, it crashed.
The code is as follows:
#include <iostream>
#include <random>
//#include <Eigen/Dense>
#include <immintrin.h>
#include <cstdlib>
#include <vector>

float min128_sse(float *a, int n) {
    float res;
    __m128 *simdVector = (__m128*) a;
    __m128 maxval = _mm_set1_ps(UINT32_MAX);
    for (int i = 0; i < n / 4; i++) {
        maxval = _mm_min_ps(maxval, simdVector[i]);
    }
    maxval = _mm_min_ps(maxval, _mm_shuffle_ps(maxval, maxval, 0x93));
    _mm_store_ss(&res, maxval);
    return res;
}

float min256_sse(float *a, int n) {
    float res;
    __m256* simdVector = (__m256*) a;
    __m256 minVal = _mm256_set1_ps(UINT32_MAX);
    for (int i = 0; i < n / 8; i++) {
        minVal = _mm256_min_ps(minVal, simdVector[i]);
    }
    minVal = _mm256_min_ps(minVal, _mm256_shuffle_ps(minVal, minVal, 0x93));
    res = minVal[0];
    std::cout << res << std::endl;
    return res;
}

int main()
{
    std::vector<float> givenVector{1.0, 2.0, 3.0, 4.0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, -1, -2, -3, -4, -5, -6, -7, -8};
    std::cout << min128_sse(givenVector.data(), givenVector.size()) << std::endl;
    std::cout << min256_sse(givenVector.data(), givenVector.size()) << std::endl;
}
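For comparison, the value I expect from both functions on this input is whatever a plain scalar scan returns. A minimal sketch of such a reference (min_scalar is a placeholder name of mine, not something from the blog post, and it assumes n > 0):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Scalar reference: returns the smallest of the n floats starting at a.
// Assumption: n > 0, otherwise dereferencing the result would be invalid.
float min_scalar(const float* a, int n) {
    assert(n > 0);
    return *std::min_element(a, a + n);
}
```

For the vector in main this scan gives -8, so that is the number I would expect the SIMD versions to print once they work.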
The code segfaults on the following line:
minVal = _mm256_min_ps(minVal, simdVector[i]);
From my basic understanding of SIMD instructions, _mm256_min_ps operates on 256 bits at once, as opposed to 128 bits for _mm_min_ps. Since the 128-bit version does not segfault, I expected the 256-bit version not to either; the only change I thought was required was the range of the for loop.
However, the segmentation fault occurs even at i = 0. I suppose there is a gap in my understanding of SIMD instructions. Can someone please point it out and explain why the segmentation fault is being thrown?
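One thing I have not ruled out is alignment. As far as I understand, dereferencing a __m256* lets the compiler emit an aligned 32-byte load, while a __m128* only assumes 16-byte alignment, and std::vector<float> makes no promise of 32-byte alignment. A minimal sketch for checking the pointer that gets passed in (is_aligned is a hypothetical helper name of mine, not a library function):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the address p is a multiple of `alignment` bytes.
// Assumption: converting the pointer to uintptr_t yields its numeric address,
// which holds on mainstream platforms.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling is_aligned(givenVector.data(), 32) before the loop would show whether the 256-bit version is reading from an address it is allowed to assume. If that returns false, an unaligned load such as _mm256_loadu_ps may be the safer choice.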
TIA