I have the following function:

void dxy_SIMD(uchar* img_ptr, uchar* dxy_ptr, size_t M, size_t N){
    
    for (size_t i = 1; i < M - 1; i++)
    {
        for (size_t j = 1; j < N; j += 32)
        {
            auto idx_before = (i - 1)*N + j;
            auto idx = i * N + j;
            auto idx_after = (i + 1) * N + j;

            _mm_prefetch(img_ptr + idx - 1, _MM_HINT_T0);
            _mm_prefetch(img_ptr + idx_before, _MM_HINT_T0);
            _mm_prefetch(img_ptr + idx_after, _MM_HINT_T0);

            auto after = _mm256_loadu_epi8(img_ptr + idx_after);
            auto current = _mm256_loadu_epi8(img_ptr + idx);
            auto before = _mm256_loadu_epi8(img_ptr + idx_before);

            auto dx = _mm256_add_epi8(_mm256_loadu_epi8(img_ptr + idx + 1), _mm256_loadu_epi8(img_ptr +idx - 1));
            auto dy = _mm256_add_epi8(after, before);

            auto negative_middle = _mm256_sub_epi8(current, _mm256_add_epi8(current, current)); // convert x to -x by doing x - 2x
            auto negative_middle_times_three = _mm256_add_epi8(negative_middle, _mm256_add_epi8(negative_middle, negative_middle)); // Get -3x here
            auto negative_middle_times_four = _mm256_add_epi8(negative_middle, negative_middle_times_three); // Getting -4x here. -> Illegal Instruction error thrown

            _mm256_storeu_epi8(dxy_ptr + idx, _mm256_add_epi8(dx, _mm256_add_epi8(negative_middle_times_four, dy)));
        }
        
    }
}

However, I get an illegal instruction error (`Illegal instruction (core dumped)`) when the function runs. According to gdb, it is raised at this line:

 auto negative_middle_times_four = _mm256_add_epi8(negative_middle, negative_middle_times_three);

I am using Intel's dpcpp compiler with the following compile options: -g -O3 -fsycl -fsycl-targets=spir64 -mavx512vl -mavx512bw -mavx512f

My CPU is an i7-9750H, which I know supports only up to AVX2. I added the -mavx512 flags because the compiler specifically asked for them, throwing an error when I ran make on some other functions in my code. However, those functions perform almost the same operations as dxy_SIMD and run perfectly.

I cannot understand why this line raises an illegal instruction error, given that I use the same intrinsic elsewhere in the same code, and even earlier in the same function.

TIA

Peter Cordes
Atharva Dubey
  • You used compiler options other than `-march=native` to compile intrinsics for instructions your CPU doesn't support into asm instructions your CPU doesn't support. Seems obvious why you'd get a SIGILL when running the resulting binary on that CPU. – Peter Cordes Dec 05 '21 at 07:57
  • Although actually the compiler could have used only AVX2 instructions; `_mm256_loadu_epi8` is the same thing as `_mm256_loadu_si256`, but was only introduced with AVX512. – Peter Cordes Dec 05 '21 at 07:59
  • Thank you for replying @PeterCordes. I did think about the generated assembly being something the CPU could not execute, but I let it slide on the assumption that the compiler knows best. But then, in my previous functions, where I did not add the `-march=native` flag and made heavy use of `_mm256_loadu_epi8`, `add_epi8`, etc., why did those run fine? Also, shouldn't the SIGILL be raised at the start of the function rather than towards the end? – Atharva Dubey Dec 05 '21 at 09:06
  • No, it gets thrown on the first AVX-512 instruction the compiler happened to use. See the linked duplicates (above the question). It's never safe to tell the compiler it can use instructions not supported by the CPU you want to run on. If it complains about target options for an intrinsic, you can't use that intrinsic with those options. – Peter Cordes Dec 05 '21 at 10:26
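
Following the comments above, here is a minimal sketch of an AVX2-only version that swaps `_mm256_loadu_epi8` / `_mm256_storeu_epi8` for `_mm256_loadu_si256` / `_mm256_storeu_si256`, so it needs no -mavx512 flags. The names `dxy_pixel`, `dxy_scalar`, `load32`, `dxy_avx2` and `dxy` are hypothetical and not from the question. The `target("avx2")` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, assumed here to be available in the clang-based dpcpp compiler. The loop bounds are also tightened relative to the question so that no load or store runs past the last interior column; that is an assumption about the intended behaviour.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One output pixel: left + right + up + down - 4*centre, wrapping modulo 256
// (the same wrap-around behaviour as _mm256_add_epi8).
static inline std::uint8_t dxy_pixel(const std::uint8_t* img, std::size_t N, std::size_t idx) {
    unsigned sum = img[idx - 1] + img[idx + 1] + img[idx - N] + img[idx + N];
    return static_cast<std::uint8_t>(sum - 4u * img[idx]);
}

// Portable fallback for CPUs without AVX2; only interior pixels are written.
void dxy_scalar(const std::uint8_t* img, std::uint8_t* out, std::size_t M, std::size_t N) {
    for (std::size_t i = 1; i + 1 < M; ++i)
        for (std::size_t j = 1; j + 1 < N; ++j)
            out[i * N + j] = dxy_pixel(img, N, i * N + j);
}

// Unaligned 32-byte load using the AVX (not AVX-512) intrinsic.
__attribute__((target("avx2")))
static inline __m256i load32(const std::uint8_t* p) {
    return _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
}

// AVX2-only kernel: 32 pixels per iteration, scalar tail for the rest of the row.
__attribute__((target("avx2")))
void dxy_avx2(const std::uint8_t* img, std::uint8_t* out, std::size_t M, std::size_t N) {
    for (std::size_t i = 1; i + 1 < M; ++i) {
        std::size_t j = 1;
        // j + 33 <= N keeps the load at idx + 1 (columns j+1 .. j+32) inside the row.
        for (; j + 33 <= N; j += 32) {
            const std::size_t idx = i * N + j;
            const __m256i dx = _mm256_add_epi8(load32(img + idx + 1), load32(img + idx - 1));
            const __m256i dy = _mm256_add_epi8(load32(img + idx + N), load32(img + idx - N));
            const __m256i c  = load32(img + idx);
            const __m256i c2 = _mm256_add_epi8(c, c);    // 2x
            const __m256i c4 = _mm256_add_epi8(c2, c2);  // 4x
            const __m256i r  = _mm256_sub_epi8(_mm256_add_epi8(dx, dy), c4);
            _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + idx), r);
        }
        for (; j + 1 < N; ++j)  // remaining interior columns
            out[i * N + j] = dxy_pixel(img, N, i * N + j);
    }
}

// Runtime dispatch: only call the AVX2 kernel if the running CPU reports AVX2.
void dxy(const std::uint8_t* img, std::uint8_t* out, std::size_t M, std::size_t N) {
    if (__builtin_cpu_supports("avx2"))
        dxy_avx2(img, out, M, N);
    else
        dxy_scalar(img, out, M, N);
}
```

Calling `dxy(img, out, M, N)` on an M x N row-major image should then run on any x86-64 CPU: the vector path is taken only when the CPU reports AVX2, and no AVX-512 encoding can be emitted because the compiler is never told it may use one.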
