Forwarding in multi-threaded code

Question

I am working on abstractions for a family of optimization algorithms. These algorithms can run serially or multi-threaded, either using locking mechanisms or atomic operations.

I have a question regarding perfect-forwarding when it comes to the multi-threaded version of the algorithms. Say, for instance, I have some functor which I am not willing to copy because it is expensive. I can make sure the functors are static, in that, the calls to their operator()(...) will not change the state of the object. One such dummy functor is below:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

template <class value_t> struct WeightedNorm {
  WeightedNorm() = default;
  WeightedNorm(std::vector<value_t> w) : w{std::move(w)} {}

  template <class Container> value_t operator()(Container &&c) const & {
    std::cout << "lvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

  template <class Container> value_t operator()(Container &&c) const && {
    std::cout << "rvalue version with w: " << w[0] << ',' << w[1] << '\n';
    value_t result{0};
    std::size_t idx{0};
    auto begin = std::begin(c);
    auto end = std::end(c);
    while (begin != end) {
      result += w[idx++] * *begin * *begin;
      *begin++ /* += 1 */; // <-- we can also modify
    }
    return result; /* well, return std::sqrt(result), to be precise */
  }

private:
  std::vector<value_t> w;
};

This functor might also have the reference qualifiers for some of its member functions, as seen above (although, above, they are not different from each other). Moreover, the function objects are allowed to modify their input c. To perfect-forward this functor properly to the worker threads in the algorithm, I have thought of the following:

template <class value_t> struct algorithm {
  algorithm() = default;
  algorithm(const unsigned int nthreads) : nthreads{nthreads} {}

  template <class InputIt> void initialize(InputIt begin, InputIt end) {
    x = std::vector<value_t>(begin, end);
  }

  template <class Func> void solve_ref_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(&algorithm::kernel<decltype((f)), decltype(x)>, this,
                           std::ref(f), x);
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_ref_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread([&, xlocal]() mutable { kernel(f, xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_1(Func &&f) {
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          &algorithm::kernel<decltype(std::forward<Func>(f)), decltype(x)>,
          this, std::ref(f), x); /* this is compilation error */
    for (auto &worker : workers)
      worker.join();
  }

  template <class Func> void solve_forward_2(Func &&f) {
    auto &xlocal = x;
    std::vector<std::thread> workers(nthreads);
    for (auto &worker : workers)
      worker = std::thread(
          [&, xlocal]() mutable { kernel(std::forward<Func>(f), xlocal); });
    for (auto &worker : workers)
      worker.join();
  }

private:
  template <class Func, class Container> void kernel(Func &&f, Container &&c) {
    std::forward<Func>(f)(std::forward<Container>(c));
  }

  std::vector<value_t> x;
  unsigned int nthreads{std::thread::hardware_concurrency()};
};

Basically, what I had in mind when writing the above was that algorithm::solve_ref_1 and algorithm::solve_ref_2 differ from each other only in the use of the lambda function. In the end, both of them call kernel with an lvalue reference to f and an lvalue reference to x, where x is copied in each of the threads either due to how std::thread works or the capture of xlocal by copy in the lambda. Is this correct? Should I be careful in prefering one to the other?

So far, I was not able to do what I wanted to achieve. I have not made an unnecessary copy of f, but I have not respected its reference qualifier, either. Then, I thought of forwarding f to kernel. Above, I couldn't find a way of making algorithm::solve_forward_1 compile due to the deleted constructor of std::ref for rvalue references. However, algorithm::solve_forward_2, which uses the lambda function approach, seems to be working. By "seems to be working," I mean that the following main program

int main(int argc, char *argv[]) {
  std::vector<double> x{1, 2};
  algorithm<double> alg(2);
  alg.initialize(std::begin(x), std::end(x));

  alg.solve_ref_1(WeightedNorm<double>{{1, 2}});
  alg.solve_ref_2(WeightedNorm<double>{{1, 2}});
  // alg.solve_forward_1(WeightedNorm<double>{{1, 2}});
  alg.solve_forward_2(WeightedNorm<double>{{1, 2}});

  return 0;
}

compiles and prints the following:

./main.out
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
lvalue version with w: 1,2
rvalue version with w: 1,2
rvalue version with w: 1,2

In short, I have two major questions:

Is there any reason why I should prefer lambda function version to the other (or, vice versa), and,
Is perfect-forwarding the functor f more than once in my situation allowed/OK?

I am asking 2. above, because in the answer to a different question, the author says:

You cannot forward something more than once, though, because that makes no sense. Forwarding means that you're potentially moving the argument all the way through to the final caller, and once it's moved it's gone, so you cannot then use it again.

I assume that, in my case, I am not moving anything, but rather trying to respect the reference qualifier. In the output of my main program, I can see that w has the proper values in the rvalue version, i.e., 1,2, but that does not mean that I am doing some undefined behavior such as trying to access an already moved vector's values.

I would appreciate if you helped me understand this better. I am also open to any other feedback about the way I am trying to solve my problem.

StenSoft · Accepted Answer · 2018-03-12T16:34:55.903

1

There is no reason to prefer either
Forwarding inside a for cycle is not ok. You can't forward the same variable twice:

template <typename T> void func(T && param) { func1(std::forward<T>(param)); func2(std::forward<T>(param)); // UB }

Chain forwarding (std::forward(std::forward(…))) on the other hand is fine.

edited Mar 12 '18 at 16:34

answered Mar 12 '18 at 16:12

StenSoft

9,369
25
30

I am not chain forwarding when I create `std::thread` objects, am I? I forward the same functor to the threads, which makes it get forwarded more than once. Yet, I was thinking that `kernel` accepts rvalue reference thanks to the forwarding `Func &&`, and that's why I was "safe" in this case. – Arda Aytekin Mar 12 '18 at 16:18
Oh, I didn't notice it in the `for` cycle. That is not ok. – StenSoft Mar 12 '18 at 16:21
There is a slight typo in your example --- it should read `T&& param`. But then, you say that I am observing just an UB when I run the code, or? I am just lucky to see `w: 1,2` but not some memory violation error, is that true? My best bet is to keep the forwarded `f` inside `solve(...){...}` and pass its lvalue reference to the kernel? – Arda Aytekin Mar 12 '18 at 16:26
Thanks, I will fix it. The parameters and function pointer are primitive types that are copied when forwarded. If you tried it with e.g. `std::string`, you should see it not working. – StenSoft Mar 12 '18 at 16:32
I am sorry, but I cannot understand it. Sorry for keeping asking more questions as you comment. I already tried with `std::vector` inside `WeightedNorm`. Either `std::vector w` is _not_ moved, and my use of `std::forward` is correct, or I am just lucky not to hit the point where the memory location is given to some other process. Where should I try `std::string` to observe the behavior you are mentioning? – Arda Aytekin Mar 12 '18 at 16:36
I was more like thinking of using `Func&& f` in the "transforming wrapper" sense as given in [cppreference](http://en.cppreference.com/w/cpp/utility/forward). So is there not any possibility for me to transform the call appropriately in the spawned threads? – Arda Aytekin Mar 12 '18 at 16:49
Oh, I see why it works. You don't move the rvalue reference in `operator ()` that uses it with appropriate move constructor so it behaves like an lvalue reference. If you don't want to actually move it, why not use lvalue references so it's obvious you don't want to move it, just share it? – StenSoft Mar 13 '18 at 04:53
Uhm, it is a library code. I don't know if the user would pass an rvalue _and_ if that rvalue object has two overloads with different reference qualifiers. In the `solve_ref_{1,2}` versions, I do use the lvalue inside the function call. However, the calls, of course, do not resolve to the one with rvalue ref qualifier. That's why I thought it should be fine to capture `f` by reference in the lambda, convert it to its correct value category using `std::forward` and pass it to the kernel which is defined with forwarding reference template parameter. How would you do this otherwise? – Arda Aytekin Mar 13 '18 at 06:36
I will not use `std::forward` in the `for` loop. If the lambda functor internally uses lvalue references, the copied functor will copy lvalue references (basically pointers) which is cheap. Rvalue references shouldn't work (multiple forwards) and values will be fully copied. I don't see any other way that will be safe, and the user of the library still can make it UB if he's not careful (e.g. by modifying the value passed by lvalue reference from multiple threads without synchronization). – StenSoft Mar 13 '18 at 08:54
Thank you for the discussion. I will not keep this any more extended. But still, I am having trouble understanding the behavior. The lambda functors _are_ already using lvalue references to `f` in the code above (`solve_forward_2`). I just use `std::forward` to cast the lvalue reference inside the lambda functor to its actual reference category. The user ideally does _not_ have access to `kernel` above. But as I said, I will not continue the discussion. Most likely, I am missing something then, as you are still insisting that it is dangerous. Thank you for your time :) – Arda Aytekin Mar 13 '18 at 09:15

Forwarding in multi-threaded code

1 Answers1