2

Given this piece of code:

struct T
{
  void f(int const);
};

void f(std::vector<T> &u, std::vector<int> const &v)
{
  for (std::size_t i = 0; i < u.size(); ++i)
    u[i].f(v[i]);
}

Is there a standard way to parallelize the body of void f(std::vector<T> &u, std::vector<int> const &v)?

This happens to work by chance (https://godbolt.org/z/gRv9Ze):

void f(std::vector<T> &u, std::vector<int> const &v)
{
  auto const indices = std::views::iota(0u, u.size()) | std::views::common;

  std::for_each(std::execution::par_unseq, std::begin(indices), std::end(indices),
                [&](std::size_t const i) { u[i].f(v[i]); });
}

but it is reportedly wrong to rely on such behavior (see this bug report and this answer). Indeed, this doesn't run in parallel (https://godbolt.org/z/MPGdHF):

void f(std::vector<T> &u, std::vector<int> const &v)
{
  std::ranges::iota_view<std::size_t, std::size_t> const indices(0u, u.size());

  std::for_each(std::execution::par_unseq, std::begin(indices), std::end(indices),
                [&](std::size_t const i) { u[i].f(v[i]); });
}

I'm pretty sure there should be a standard way to make a function like that run in parallel. I'm probably missing an obvious algorithm, but std::transform does not seem to be appropriate here, and the others even less so.

Pilar Latiesa
  • 675
  • 4
  • 10
  • 1st example "This happens to work by chance" --> as per [cppreference](https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t): "If the implementation cannot parallelize or vectorize (e.g. due to lack of resources), all standard execution policies can fall back to sequential execution." 2nd example "Indeed, this doesn't run parallel" --> as in the bug report you linked, changing the iota_view template parameters to a signed integer type to meet the requirements works just fine. – gkhaos Jun 09 '20 at 08:18
  • 2
    @gkhaos I know that I can make it work in GCC. The problem is that `iota_view::iterator` does not model `Cpp17ForwardIterator` and therefore it may perfectly happen that implementations silently fall back to the sequential version. In fact, for example, the MSVC team claim that no attempt is made in their implementation to support such iterators: https://github.com/oneapi-src/oneDPL/issues/22#issuecomment-442180158 – Pilar Latiesa Jun 09 '20 at 08:36

1 Answers1

2

Staying within std, your best bet is std::transform with an output iterator that ignores what is given to it

struct unit_iterator {
    using difference_type = std::ptrdiff_t;
    using value_type = std::tuple<>;
    using pointer = std::tuple<> *;
    using const_pointer = const std::tuple<> *;
    using reference = std::tuple<> &;
    using const reference = const std::tuple<> &;
    using iterator_category = std::random_access_iterator_tag;

    reference operator*() { return value; }
    const_reference operator*() const { return value; }
    reference operator[](difference_type) { return value; }
    const_reference operator[](difference_type) const { return value; }
    pointer operator->() { return &value; }
    const_pointer operator->() const { return &value; }

    unit_iterator& operator++() { return *this; }
    unit_iterator operator++(int) { return *this; }
    unit_iterator& operator+=(difference_type) { return *this; }
    unit_iterator operator+(difference_type) const { return *this; }

    unit_iterator& operator--() { return *this; }
    unit_iterator operator--(int) { return *this; }
    unit_iterator& operator-=(difference_type) { return *this; }
    unit_iterator operator-(difference_type) const { return *this; }

    difference_type operator-(unit_iterator) const { return 0; }

    bool operator==(unit_iterator) const { return true; }
    bool operator!=(unit_iterator) const { return false; }
    bool operator<(unit_iterator) const { return false; }
    bool operator<=(unit_iterator) const { return true; }
    bool operator>(unit_iterator) const { return false; }
    bool operator>=(unit_iterator) const { return true; }
private:
    static value_type value;
};


void f(std::vector<T> &u, std::vector<int> const &v)
{
  std::transform(std::execution::par_unseq, begin(u), end(u), begin(v), unit_iterator{},
                 [](T & u, int v) { u.f(v); return std::tuple<>{}; });
}
Caleth
  • 52,200
  • 2
  • 44
  • 75
  • Thanks so much for the answer. Here's a version that demonstrate that it works (after fixing some typos): https://godbolt.org/z/C9mqSM. However, I fail to see the improvement over `iota_view::iterator`. Isn't `unit_iterator` as stashing as `iota_view::iterator` is? `vector::iterator` claims to be random access, yet it fails misserably in many standard algorithms. Isn't `unit_iterator` affected by the concerns raised by Billy O'Neal in this comment: https://github.com/oneapi-src/oneDPL/issues/22#issuecomment-442180158 – Pilar Latiesa Jun 09 '20 at 10:20
  • @PilarLatiesa I think that's fixable. You only ever need one `value` for all unit_iterators – Caleth Jun 09 '20 at 10:29
  • Good point! But I think it should be declared `thread_local` to prevent race conditions, shouldn't it? – Pilar Latiesa Jun 09 '20 at 10:32
  • yeah, I'm not sure if assigning an empty tuple "writes to a memory location". An empty struct that explicitly ignores it's assignment parameter would be for sure safe. – Caleth Jun 09 '20 at 10:37
  • I have no idea either... Anyway I assume that it doesn't harm to annotate it `thread_local`. Overall, I really like the idea. As a side note: `operator-(difference_type)` and `operator+(difference_type)` must be `const`, and the lamda must not take an `int &`, but a `int const &`, `auto &` or simply `int` – Pilar Latiesa Jun 09 '20 at 10:43