I have the following function:
void dxy_SIMD(uchar* img_ptr, uchar* dxy_ptr, size_t M, size_t N){
    for (size_t i = 1; i < M - 1; i++)
    {
        for (size_t j = 1; j < N; j += 32)
        {
            auto idx_before = (i - 1) * N + j;
            auto idx = i * N + j;
            auto idx_after = (i + 1) * N + j;
            _mm_prefetch(img_ptr + idx - 1, _MM_HINT_T0);
            _mm_prefetch(img_ptr + idx_before, _MM_HINT_T0);
            _mm_prefetch(img_ptr + idx_after, _MM_HINT_T0);
            auto after = _mm256_loadu_epi8(img_ptr + idx_after);
            auto current = _mm256_loadu_epi8(img_ptr + idx);
            auto before = _mm256_loadu_epi8(img_ptr + idx_before);
            auto dx = _mm256_add_epi8(_mm256_loadu_epi8(img_ptr + idx + 1), _mm256_loadu_epi8(img_ptr + idx - 1));
            auto dy = _mm256_add_epi8(after, before);
            auto negative_middle = _mm256_sub_epi8(current, _mm256_add_epi8(current, current)); // convert x to -x by doing x - 2x
            auto negative_middle_times_three = _mm256_add_epi8(negative_middle, _mm256_add_epi8(negative_middle, negative_middle)); // Get -3x here
            auto negative_middle_times_four = _mm256_add_epi8(negative_middle, negative_middle_times_three); // Getting -4x here. -> Illegal Instruction error thrown
            _mm256_storeu_epi8(dxy_ptr + idx, _mm256_add_epi8(dx, _mm256_add_epi8(negative_middle_times_four, dy)));
        }
    }
}
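For reference, this is a scalar sketch of what the loop above is meant to compute for each interior pixel: the sum of the four neighbours minus four times the centre, with 8-bit wraparound as in _mm256_add_epi8. The name dxy_scalar is hypothetical and not part of my code. The sketch assumes uchar is an unsigned 8-bit type and visits only columns 1 to N - 2, whereas the SIMD loop steps through each row in blocks of 32.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for dxy_SIMD (assumption: uchar == uint8_t).
// Computes left + right + above + below - 4 * centre, modulo 256.
void dxy_scalar(const std::uint8_t* img, std::uint8_t* dxy,
                std::size_t M, std::size_t N)
{
    if (M < 3 || N < 3) return;  // no interior pixels to process

    for (std::size_t i = 1; i + 1 < M; ++i) {
        for (std::size_t j = 1; j + 1 < N; ++j) {
            const std::size_t idx = i * N + j;
            // Unsigned arithmetic wraps, so the cast below yields the
            // same value modulo 256 as the chain of 8-bit SIMD adds.
            const unsigned neighbours = img[idx - 1] + img[idx + 1]
                                      + img[idx - N] + img[idx + N];
            dxy[idx] = static_cast<std::uint8_t>(neighbours - 4u * img[idx]);
        }
    }
}
```

Border pixels are left untouched, so the output buffer should be initialised by the caller.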
However, when the function runs it crashes with "Illegal instruction (core dumped)". According to gdb, the faulting line is:
auto negative_middle_times_four = _mm256_add_epi8(negative_middle, negative_middle_times_three);
I am using Intel's dpcpp compiler with the following compile options:
-g -O3 -fsycl -fsycl-targets=spir64 -mavx512vl -mavx512bw -mavx512f
My CPU is an i7-9750H, which as far as I know only supports up to AVX2. I added the -mavx512* flags because the compiler explicitly asked for them: when I ran make, it threw an error and would not compile some other functions in my code without them. However, those functions do almost the same operations as dxy_SIMD and run perfectly.
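To double-check what the CPU actually reports at run time, something like the following sketch could be used. It assumes a GCC- or Clang-compatible compiler on x86, where the __builtin_cpu_supports builtin is available; on any other target it simply reports no support. SimdSupport and query_simd_support are hypothetical names used only for illustration.

```cpp
// Hypothetical helper: reports which SIMD extensions the running CPU has.
struct SimdSupport {
    bool avx2;
    bool avx512f;
    bool avx512bw;
    bool avx512vl;
};

SimdSupport query_simd_support()
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // Assumption: GCC/Clang x86 builtins are available.
    __builtin_cpu_init();
    return SimdSupport{
        __builtin_cpu_supports("avx2") != 0,
        __builtin_cpu_supports("avx512f") != 0,
        __builtin_cpu_supports("avx512bw") != 0,
        __builtin_cpu_supports("avx512vl") != 0,
    };
#else
    // Unknown compiler or non-x86 target: report nothing as supported.
    return SimdSupport{false, false, false, false};
#endif
}
```

If avx512bw or avx512vl comes back false, any instruction the compiler emits from those extensions would be expected to fault on this machine, regardless of which source line gdb attributes it to.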
I don't understand why this line raises an illegal instruction error, given that I use the same intrinsic elsewhere in the same code, and even earlier in the same function.
TIA