
I have code like the following that I want to vectorize. I analyzed it with Intel Advisor, which says the loop can't be vectorized because it contains math function calls. It points to the `sin` and `cos` calls inside the loop.

How can I vectorize this loop without using the Intel Short Vector Math Library (SVML)?

Code:

```cpp
// sfs, p_data, qx, qy, qz, NA, esf, x, y, z, p, Ar and Ai are declared elsewhere.
for (size_t j = 0; j < NA; ++j) {
    esf = sfs[j];
    x = p_data[3 * j];
    y = p_data[3 * j + 1];
    z = p_data[3 * j + 2];

    p = x * qx + y * qy + z * qz;

    Ar += esf * cos(p);
    Ai += esf * sin(p);
}
```
Mansoor
  • Do you need the full precision in your trig computations? – Unlikus Sep 17 '20 at 07:58
  • Do you get any message from Intel Advisor? Please include it in the question. Why do you think `sin` and `cos` prevent vectorization? – 463035818_is_not_an_ai Sep 17 '20 at 08:37
  • `p_data` is also not arranged ideally. Calculating each `p` requires transposing `3 x N` blocks or horizontal addition. – chtz Sep 17 '20 at 10:41
  • Intel Advisor says that the loop is not vectorized. Its Recommendations tab says: "Scalar math function call(s) present. Math functions in the loop body are preventing the compiler from effectively vectorizing the loop. Improve performance by enabling vectorized math call(s)." Its suggested solutions are: 1) use the Intel Short Vector Math Library for vector intrinsics; 2) use a Glibc library with vectorized SVML functions. I am interested to know if there is any other possibility. – Arnab Majumdar Sep 17 '20 at 20:31
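chtz's remark about the layout of `p_data` can be illustrated with a structure-of-arrays (SoA) layout. This is a sketch that assumes you can either produce the positions as three separate arrays or afford a one-time conversion; `Positions`, `to_soa` and `compute_phases` are hypothetical names, not part of the original code. With separate `x`, `y` and `z` arrays, consecutive iterations read consecutive addresses, so the compiler can form several values of `p` from ordinary vector loads instead of a 3 x N transpose or horizontal additions (whether it actually does so still depends on the compiler and flags).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays container for positions that are
// stored interleaved as x0 y0 z0 x1 y1 z1 ... in p_data.
struct Positions {
    std::vector<double> x, y, z;
};

// One-time conversion from the interleaved (array-of-structures) layout.
Positions to_soa(const double* p_data, std::size_t NA) {
    Positions pos;
    pos.x.resize(NA);
    pos.y.resize(NA);
    pos.z.resize(NA);
    for (std::size_t j = 0; j < NA; ++j) {
        pos.x[j] = p_data[3 * j];
        pos.y[j] = p_data[3 * j + 1];
        pos.z[j] = p_data[3 * j + 2];
    }
    return pos;
}

// With separate arrays, element j of each input sits next to element j + 1,
// so this loop only needs contiguous loads.
void compute_phases(const Positions& pos, double qx, double qy, double qz,
                    double* p_out) {
    const std::size_t n = pos.x.size();
    for (std::size_t j = 0; j < n; ++j) {
        p_out[j] = pos.x[j] * qx + pos.y[j] * qy + pos.z[j] * qz;
    }
}
```

If the positions are reused for many `(qx, qy, qz)` vectors, the conversion cost is paid once and amortised over all of them.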

1 Answer


> It says I can't vectorize this because it has math functions in it.

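Actually, it's the `Ar +=` and `Ai +=` terms that prevent vectorisation: they create a loop-carried dependency, so the value of `Ar` after iteration j = 2 depends on its value after iteration j = 1. The compiler will not split such a sum into independent partial sums on its own, because floating-point addition is not associative and reordering it can change the result. If `Ar` and `Ai` are only outputs, you can make them arrays instead and sum over them after the loop.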

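```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Per-element results, initialised to 0. std::vector avoids a stack
// variable-length array (not standard C++) when NA is not a constant.
std::vector<double> Ar_elem(NA, 0.0);
std::vector<double> Ai_elem(NA, 0.0);

for (std::size_t j = 0; j < NA; ++j) {
    const double esf = sfs[j];
    const double x = p_data[3 * j];
    const double y = p_data[3 * j + 1];
    const double z = p_data[3 * j + 2];

    const double p = x * qx + y * qy + z * qz;

    // Each iteration writes only its own element: no loop-carried dependency.
    Ar_elem[j] = esf * std::cos(p);
    Ai_elem[j] = esf * std::sin(p);
}

// Sum afterwards. The initial value must be 0.0: with an int 0,
// std::accumulate keeps an int running total and truncates at every step.
double Ar = std::accumulate(Ar_elem.begin(), Ar_elem.end(), 0.0);
double Ai = std::accumulate(Ai_elem.begin(), Ai_elem.end(), 0.0);
```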
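The same dependency can also be handled without temporary arrays by telling the compiler that the sums may be reordered, which is what an OpenMP `simd` reduction does. This is a minimal sketch assuming an OpenMP 4.0+ compiler with SIMD support enabled (for example GCC's `-fopenmp-simd` or `-fopenmp`); the function `structure_factor` and its signature are hypothetical. Without those flags the pragma is ignored and the loop simply runs in its original scalar order.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical wrapper around the question's loop. reduction(+:Ar, Ai)
// gives each SIMD lane private partial sums and combines them after the
// loop, so the order of the floating-point additions may change.
void structure_factor(const double* sfs, const double* p_data, std::size_t NA,
                      double qx, double qy, double qz,
                      double& Ar_out, double& Ai_out) {
    double Ar = 0.0;
    double Ai = 0.0;
    #pragma omp simd reduction(+:Ar, Ai)
    for (std::size_t j = 0; j < NA; ++j) {
        const double esf = sfs[j];
        const double p = p_data[3 * j] * qx + p_data[3 * j + 1] * qy
                       + p_data[3 * j + 2] * qz;
        Ar += esf * std::cos(p);
        Ai += esf * std::sin(p);
    }
    Ar_out = Ar;
    Ai_out = Ai;
}
```

The reduction only removes the dependency. The `std::cos` and `std::sin` calls still need a vector math library (or an implementation the compiler can inline) before the loop body itself can be vectorized, which is the part Intel Advisor is pointing at. Because the addition order changes, results may differ from the sequential sum in the last few bits.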
Roy2511
  • Thank you for your solution. I know there is a dependency there; I was thinking of `#pragma omp simd reduction`. However, the math functions are also preventing vectorization. Do you have any solution for that? – Arnab Majumdar Sep 17 '20 at 08:23
  • You could look at [this answer](https://stackoverflow.com/a/16110087/3089908). But I am more interested in why you think sin and cos would prevent simd reduction. How do you know that this code isn't being vectorized? – Roy2511 Sep 17 '20 at 10:23
  • Intel Advisor says that the loop is not vectorized. Its Recommendations tab says: "Scalar math function call(s) present. Math functions in the loop body are preventing the compiler from effectively vectorizing the loop. Improve performance by enabling vectorized math call(s)." Its suggested solutions are: 1) use the Intel Short Vector Math Library for vector intrinsics; 2) use a Glibc library with vectorized SVML functions. I am interested to know if there is any other possibility. – Arnab Majumdar Sep 17 '20 at 20:36
  • You can "vectorise" the loop by using `#pragma omp parallel for`, but I do not think that's SIMD vectorisation, just threading. Your `#pragma omp simd reduction` approach, however, will be vectorised. By the way, have you enabled OpenMP? I think you need to pass a compiler switch explicitly (`/openmp` in VS). – Roy2511 Sep 18 '20 at 02:17
  • I think the link you provided in your second comment helps and answers my question. I have not yet turned on the `-fopenmp` option; that could be why Intel Advisor shows the recommendation. In any case, thank you for your solution. – Arnab Majumdar Sep 18 '20 at 10:20
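On the question of avoiding SVML altogether: if full double precision is not required (the point of Unlikus's comment), one option is an inline sin/cos approximation built only from arithmetic, `std::floor` and selects, so that no scalar library call remains in the loop body and the compiler may be able to inline and vectorize it. This is a sketch under stated assumptions, not a tuned implementation: `approx_sincos` and `structure_factor_approx` are hypothetical names, the range reduction uses a single π/2 constant and is only meant for moderate |p|, and the polynomials are plain truncated Taylor series rather than minimax fits. Whether the loop really vectorizes depends on the compiler, target and flags, so confirm it with the compiler's vectorization report or Intel Advisor.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: approximate sin(p) and cos(p) together.
// Assumption: |p| is moderate; the one-constant range reduction
// loses accuracy for very large arguments.
inline void approx_sincos(double p, double& s, double& c) {
    const double two_over_pi = 0.63661977236758134308;
    const double pi_over_2   = 1.57079632679489661923;

    // q = nearest multiple of pi/2, so r = p - q*pi/2 lies in about [-pi/4, pi/4].
    const double q  = std::floor(p * two_over_pi + 0.5);
    const double r  = p - q * pi_over_2;
    const double r2 = r * r;

    // Truncated Taylor series in Horner form; on [-pi/4, pi/4] the
    // truncation error is below about 1e-11.
    const double sr = r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120
                    + r2 * (-1.0 / 5040 + r2 * (1.0 / 362880
                    + r2 * (-1.0 / 39916800))))));
    const double cr = 1.0 + r2 * (-0.5 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
                    + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800
                    + r2 * (1.0 / 479001600))))));

    // Quadrant k = q mod 4 in {0, 1, 2, 3}, kept in floating point.
    const double k = q - 4.0 * std::floor(q * 0.25);
    // sin/cos of (r + k*pi/2): k=0 (sr, cr), 1 (cr, -sr), 2 (-sr, -cr), 3 (-cr, sr)
    const bool odd = (k == 1.0) || (k == 3.0);
    const double s0 = odd ? cr : sr;
    const double c0 = odd ? sr : cr;
    s = (k >= 2.0) ? -s0 : s0;
    c = (k == 1.0 || k == 2.0) ? -c0 : c0;
}

// The question's loop using the approximation plus an OpenMP SIMD reduction
// (the pragma is ignored unless OpenMP SIMD support is enabled).
void structure_factor_approx(const double* sfs, const double* p_data,
                             std::size_t NA, double qx, double qy, double qz,
                             double& Ar_out, double& Ai_out) {
    double Ar = 0.0;
    double Ai = 0.0;
    #pragma omp simd reduction(+:Ar, Ai)
    for (std::size_t j = 0; j < NA; ++j) {
        const double p = p_data[3 * j] * qx + p_data[3 * j + 1] * qy
                       + p_data[3 * j + 2] * qz;
        double s, c;
        approx_sincos(p, s, c);
        Ar += sfs[j] * c;
        Ai += sfs[j] * s;
    }
    Ar_out = Ar;
    Ai_out = Ai;
}
```

Computing sine and cosine in one call shares the range reduction, which suits this loop since it needs both for the same `p`.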