I have a big function, which for the MWE below is called modify_vec
. In it, there are for loops where an operation is performed many times (e.g., for each element of a matrix).
I would like the function to be capable of applying different operations, depending on different given inputs, without using if-else statements within the loop.
From my (limited) understanding of compiler optimizations using SIMD, I would have to: either manually write all possible combinations of the operations for the function, so that the compiler knows at compile-time which operation to optimize for; or using templates somehow.
But I do not know how to perform this using templates. Instead, I've tried with std::map<const int, const std::function>
.
Below are some code snippets and a MWE, which can also be seen in Compiler Explorer: https://godbolt.org/z/z777aq5ab.
Bonus question: since none of my actual vectors are defined at compile-time, does "constexpr" even help?
Possible solution based on jwezorek's answer below
A version of modify_vec
that only works for a specific order of operations, where p
is an argument common to all possible operations.
void modify_vec(std::vector<double> &inp_vec,
const std::vector<double> &p) {
const auto n = inp_vec.size();
for (auto i=0; i<n; ++i) {
inp_vec[i] = inp_vec[i]*p[0]+3.; // this is operation 1 in the MWE
}
for (auto i=0; i<n; ++i) {
inp_vec[i] = inp_vec[i]/p[1]-4.; // this is operation 2 in the MWE
}
}
My attempt at generalizing it, with only two distinct operations, although more may be used in my actual problem:
// Are these functions inlined with GCC+11?
constexpr double first_op(const double input, const double p) {
return input*p+3.;
}
constexpr double second_op(const double input, const double p) {
return input/p-4.;
}
std::map<const int,
const std::function<double(const double, const double)>> op {
{0, first_op},
{1, second_op}
};
void modify_vec_gen(std::vector<double> &inp_vec,
const std::vector<double> &p,
const int op1, const int op2) {
const auto n = inp_vec.size();
for (auto i=0; i<n; ++i) {
inp_vec[i] = op[op1](inp_vec[i], p[0]); // Can SIMD be used here?
}
for (auto i=0; i<n; ++i) {
inp_vec[i] = op[op2](inp_vec[i], p[1]);
}
}
My main file, main.cpp
, contains the above code snippets and the following, compiled with g++ -std=c++11 -O3 -ffast-math -mavx2 main.cpp -o main
.
#include <iostream>
#include <vector>
#include <functional>
#include <map>
/* Copy and paste the above snippets here */
template <typename T>
constexpr void print_vec(const std::vector<T> &vec) {
for (const auto v : vec) {
std::cout << v << " ";
}
std::cout<<std::endl;
}
int main(int argc, char** argv) {
int op1 = std::stoi(argv[1]);
int op2 = std::stoi(argv[2]);
auto vec1 = std::vector<double>{2., 5., -3., 10.};
const auto p = std::vector<double>{3., 2.};
auto vec2{vec1};
auto vec3{vec1};
std::cout << "Initial vector:" << std::endl;
print_vec(vec1);
std::cout << "After operations 0 and 1:" << std::endl;
modify_vec(vec1, p);
print_vec(vec1);
std::cout << "After operations " << op1 << " and " << op2 << ":" << std::endl;
modify_vec_gen(vec2, p, op1, op2);
print_vec(vec2);
std::cout << "After operations " << op1 << " and " << op1 << ":" << std::endl;
modify_vec_gen(vec3, p, op1, op1);
print_vec(vec3);
return 0;
}
Running with ./main 0 1
yields:
Initial vector:
2 5 -3 10
After operations 0 and 1:
0.5 5 -7 12.5
After operations 0 and 1:
0.5 5 -7 12.5
After operations 0 and 0:
21 39 -9 69
Possible olution
A possible, based on jwezorek's answer is shown below. In Compiler Explorer: https://godbolt.org/z/K3KefzTYW.
Compiled using -O3 -ffast-math -mavx2 -std=c++11 -fopt-info-vec-all -Wall -pedantic -Werror
in gcc, -O3 -ffast-math -mavx2 -std=c++11 -Rpass-analysis=loop-vectorize -Wall -pedantic -Werror
in clang, and /O2 /FA /arch:AVX2 /std:c11 /Qvec-report:2
in MSVC.
But MSVC is not able to vectorize the code, only GCC is.
The helper function call_modify_vec
is pretty ugly, but, since I will only use two distinct operations, the first having 2 possibilities and the other 4, I guess it's good enough.
#include <iostream>
#include <vector>
#ifdef _WIN32
#define MYRESTRICT __restrict
#else
#define MYRESTRICT __restrict__
#endif
constexpr double first_op(const double input, const double p) {
return input*p+3.;
}
constexpr double second_op(const double input, const double p) {
return input/p-4.;
}
using op_fn = double(*)(const double input, const double p);
const op_fn op_vec[] = {first_op, second_op};
template<int OP1, int OP2>
void modify_vec_template(double * MYRESTRICT inp_vec,
const std::size_t vec_size,
const std::vector<double> &p) {
std::size_t i;
for (i=0; i<vec_size; ++i) {
inp_vec[i] = op_vec[OP1](inp_vec[i], p[0]);
}
for (i=0; i<vec_size; ++i) {
inp_vec[i] = op_vec[OP2](inp_vec[i], p[1]);
}
}
void call_modify_vec(std::vector<double> &in_vec,
const std::vector<double> &p,
const int op1, const int op2) {
if ( (op1 == 0) && (op2 == 0)) {
modify_vec_template<0, 0>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 0) && (op2 == 1)) {
modify_vec_template<0, 1>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 1) && (op2 == 0)) {
modify_vec_template<1, 0>(in_vec.data(), in_vec.size(), p);
} else if ( (op1 == 1) && (op2 == 1)) {
modify_vec_template<1, 1>(in_vec.data(), in_vec.size(), p);
}
}