In my previous post I explained that I am starting to use AVX to speed up my code. (Please note that although the two posts have parts in common, this one is about AVX-512 and the previous one about AVX2, which, as far as I know, are slightly different and need different compiler flags.) After experimenting with AVX2, I decided to try AVX-512 and changed my AVX2 function:

void getDataAVX2(u_char* data, size_t cols, std::vector<double>& info)
{
  __m256d dividend = _mm256_set_pd(1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0);
  info.resize(cols);
  __m256d result;
  for (size_t i = 0; i < cols / 4; i++)
  {
    __m256d divisor = _mm256_set_pd((double(data[4 * i + 3 + cols] << 8) + double(data[4 * i + 2 * cols + 3])),
                                    (double(data[4 * i + 2 + cols] << 8) + double(data[4 * i + 2 * cols + 2])),
                                    (double(data[4 * i + 1 + cols] << 8) + double(data[4 * i + 2 * cols + 1])),
                                    (double(data[4 * i + cols] << 8) + double(data[4 * i + 2 * cols])));
    result = _mm256_sqrt_pd(_mm256_mul_pd(divisor, dividend));
    info[size_t(4 * i)] = result[0];
    info[size_t(4 * i + 1)] = result[1];
    info[size_t(4 * i + 2)] = result[2];
    info[size_t(4 * i + 3)] = result[3];
  }
}

into what I think should be its AVX-512 equivalent:

void getDataAVX512(u_char* data, size_t cols, std::vector<double>& info)
{
  __m512d dividend = _mm512_set_pd(1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0, 1 / 64.0);
  info.resize(cols);
  __m512d result;
  for (size_t i = 0; i < cols / 8; i++)
  {
    // Each iteration handles 8 columns, so the base index is 8 * i (not 4 * i).
    __m512d divisor = _mm512_set_pd((double(data[8 * i + 7 + cols] << 8) + double(data[8 * i + 2 * cols + 7])),
                                    (double(data[8 * i + 6 + cols] << 8) + double(data[8 * i + 2 * cols + 6])),
                                    (double(data[8 * i + 5 + cols] << 8) + double(data[8 * i + 2 * cols + 5])),
                                    (double(data[8 * i + 4 + cols] << 8) + double(data[8 * i + 2 * cols + 4])),
                                    (double(data[8 * i + 3 + cols] << 8) + double(data[8 * i + 2 * cols + 3])),
                                    (double(data[8 * i + 2 + cols] << 8) + double(data[8 * i + 2 * cols + 2])),
                                    (double(data[8 * i + 1 + cols] << 8) + double(data[8 * i + 2 * cols + 1])),
                                    (double(data[8 * i + cols] << 8) + double(data[8 * i + 2 * cols])));
    result = _mm512_sqrt_pd(_mm512_mul_pd(divisor, dividend));
    info[size_t(8 * i)] = result[0];
    info[size_t(8 * i + 1)] = result[1];
    info[size_t(8 * i + 2)] = result[2];
    info[size_t(8 * i + 3)] = result[3];
    info[size_t(8 * i + 4)] = result[4];
    info[size_t(8 * i + 5)] = result[5];
    info[size_t(8 * i + 6)] = result[6];
    info[size_t(8 * i + 7)] = result[7];
  }
}

which in a non AVX form is:

void getData(u_char* data, size_t cols, std::vector<double>& info)
{
  info.resize(cols);
  for (size_t i = 0; i < cols; i++)
  {
    info[i] = sqrt((double(data[cols + i] << 8) + double(data[2 * cols + i])) / 64.0);
  }
}
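Since the scalar function defines the expected result, it can double as a reference for checking a vectorized version. The following is only a sketch under a few assumptions: the names `getDataRef`, `getDataAVX2Store` and `combine16` are made up for illustration, `unsigned char` stands in for `u_char`, and the GCC/Clang `target` attribute is used so that the file does not need to be built with `-mavx2`. It writes results with `_mm256_storeu_pd` instead of indexing the vector register, and finishes leftover columns with scalar code when `cols` is not a multiple of 4.

```cpp
#include <immintrin.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: combine the high byte (row 1) and low byte (row 2) of column i.
static inline double combine16(const unsigned char* data, std::size_t cols, std::size_t i)
{
  return double((data[cols + i] << 8) + data[2 * cols + i]);
}

// Scalar reference with the same arithmetic as getData above.
void getDataRef(const unsigned char* data, std::size_t cols, std::vector<double>& info)
{
  info.resize(cols);
  for (std::size_t i = 0; i < cols; i++)
    info[i] = std::sqrt(combine16(data, cols, i) / 64.0);
}

// 256-bit variant; only this function is compiled with AVX2 enabled.
// The caller must make sure the CPU supports AVX2 before calling it.
__attribute__((target("avx2")))
void getDataAVX2Store(const unsigned char* data, std::size_t cols, std::vector<double>& info)
{
  info.resize(cols);
  const __m256d scale = _mm256_set1_pd(1.0 / 64.0);  // broadcast 1/64 to all 4 lanes
  std::size_t i = 0;
  for (; i + 4 <= cols; i += 4)
  {
    // _mm256_set_pd takes its arguments from the highest lane down to lane 0.
    __m256d v = _mm256_set_pd(combine16(data, cols, i + 3), combine16(data, cols, i + 2),
                              combine16(data, cols, i + 1), combine16(data, cols, i));
    // Unaligned store of 4 doubles straight into the output vector.
    _mm256_storeu_pd(&info[i], _mm256_sqrt_pd(_mm256_mul_pd(v, scale)));
  }
  // Scalar tail for the remaining cols % 4 columns.
  for (; i < cols; i++)
    info[i] = std::sqrt(combine16(data, cols, i) / 64.0);
}
```

A caller would typically guard the vectorized call with `__builtin_cpu_supports("avx2")` and compare its output against `getDataRef` on the same buffer. Because 1/64 is a power of two, multiplying by it gives the same value as dividing by 64, so the two outputs should match exactly.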

The code compiles, but when I run it I get the following error:

Illegal instruction (core dumped)

To my surprise, the error occurs at the call to sqrt in the getData function. If I remove the sqrt call, the error appears later instead, at the line __m512d divisor = _mm512_set_pd((d..... Any ideas on what is happening?

Here is the full example.

Thank you very much.

I am compiling with c++ (7.3.0) with the following options: -std=c++17 -Wall -Wextra -O3 -fno-tree-vectorize -mavx512f. I have checked as explained here, and my CPU (Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz) supports AVX2. Should that list also show AVX-512 if the CPU supported it?

apalomer
  • If you'd compiled with `-march=native`, your compiler would tell you that you can't use AVX512 intrinsics without AVX512 enabled. (Because it would detect that your CPU doesn't have AVX512.) Clearly you're not compiling with `g++ -O3 -mavx2`, because only MSVC allows using intrinsics without enabling the relevant instruction set. (For use with runtime dispatching). I assume this was an editing mistake because you mentioned using `-mavx512f` in your previous question. – Peter Cordes Aug 14 '18 at 07:04

1 Answer


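I don't think AVX-512 instructions are supported on your CPU. The official documentation for the i7-4710HQ mentions only AVX2 under its "Instruction Set Extensions" section; a newer CPU that supports AVX-512 would list it in that same section. Because -mavx512f allows the compiler to use AVX-512 instructions anywhere in the code it generates, not just where you call the intrinsics, even the scalar getData function can end up containing an instruction your CPU cannot execute, which would explain the "Illegal instruction" at the sqrt call.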
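Besides reading the specification page, the running program can ask the CPU directly. The sketch below assumes GCC or Clang on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; the function name `bestSimdLevel` and the strings it returns are made up for illustration.

```cpp
#include <string>

// Hypothetical helper: report the widest instruction set, among the ones
// this sketch checks for, that the CPU running the program supports.
std::string bestSimdLevel()
{
  __builtin_cpu_init();  // initialise the compiler's CPU feature data before querying it
  if (__builtin_cpu_supports("avx512f"))
    return "avx512f";
  if (__builtin_cpu_supports("avx2"))
    return "avx2";
  return "scalar";
}
```

On a CPU like the i7-4710HQ this would be expected to return "avx2". A program could use such a check to choose between an AVX-512, an AVX2, and a scalar implementation at run time, as long as the AVX-512 code is kept in a function or source file compiled separately from the rest, so that the flag does not affect code that must run everywhere.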

Caramiriel