Given this piece of code:
    #include <cstddef>
    #include <vector>

    struct T
    {
        void f(int const);
    };

    void f(std::vector<T> &u, std::vector<int> const &v)
    {
        for (std::size_t i = 0; i < u.size(); ++i)
            u[i].f(v[i]);
    }
Is there a standard way to parallelize the body of void f(std::vector<T> &u, std::vector<int> const &v)?
This happens to work by chance (https://godbolt.org/z/gRv9Ze):
    void f(std::vector<T> &u, std::vector<int> const &v)
    {
        auto const indices = std::views::iota(0u, u.size()) | std::views::common;
        std::for_each(std::execution::par_unseq, std::begin(indices), std::end(indices),
                      [&](std::size_t const i) { u[i].f(v[i]); });
    }
but it is reportedly wrong to rely on such behavior (see this bug report and this answer). Indeed, this doesn't run in parallel (https://godbolt.org/z/MPGdHF):
    void f(std::vector<T> &u, std::vector<int> const &v)
    {
        std::ranges::iota_view<std::size_t, std::size_t> const indices(0u, u.size());
        std::for_each(std::execution::par_unseq, std::begin(indices), std::end(indices),
                      [&](std::size_t const i) { u[i].f(v[i]); });
    }
I'm pretty sure there should be a standard way to make a function like that run in parallel. I'm probably missing an obvious algorithm, but std::transform does not seem to be appropriate here, and the others even less so.