
I am using `std::async` with `std::launch::async` to start several threads and parallelize calls to a rendering function. The function takes two integers (the x and y coordinates of a pixel in an image), does a lot of computation, and eventually returns some values (the pixel values). For more context, the function contains many variable declarations and calls to other functions.

I noticed that when I call that function many times through `std::async`, the results are no longer accurate. In addition, my program runs faster without `std::async`. I'm new to threading in C++, but it looks like, even though the threads are supposed to run asynchronously and independently, they might sometimes access each other's resources (e.g. memory and CPU cores). As a result, they might overwrite some data in memory, leading to inaccurate results.
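If several threads call `render()` on the same object and each call assigns to the same member (as `m_pixelValue` is assigned in the code below), those concurrent writes are a data race, which is undefined behavior in C++. One way to make such a shared write safe is to serialize it with a `std::mutex`. This is a minimal sketch, not the original code: the pixel values are placeholders for the real work, and `lastPixelValue()` is a hypothetical accessor added for illustration.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <vector>

// Sketch: render() still stores its result in a shared member, but the
// write is protected by a mutex so only one thread assigns it at a time.
class Render {
public:
    std::vector<float> render(int x, int y) {
        assert(x >= 0 && y >= 0);
        // Locals live on the calling thread's stack; other tasks never see them.
        // Placeholder values stand in for the real per-pixel computation.
        std::vector<float> currentPixelData{static_cast<float>(x),
                                            static_cast<float>(y), 0.0f};
        {
            std::lock_guard<std::mutex> lock(m_mutex);  // one writer at a time
            m_pixelValue = currentPixelData;
        }
        // Return the local copy: after the lock is released, another task
        // may already have overwritten m_pixelValue.
        return currentPixelData;
    }

    // Hypothetical accessor; reads also need the lock.
    std::vector<float> lastPixelValue() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_pixelValue;
    }

private:
    std::vector<float> m_pixelValue;
    std::mutex m_mutex;
};
```

The lock keeps the shared write correct but serializes it, so it does not make the program any faster. If the member is not actually needed, returning the local vector and dropping the member avoids the lock entirely.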

How can I guarantee that I always get accurate results when calling a function via `std::async`? While searching for an answer I came across two related posts and learned that I should probably be using atomic variables to make sure my threads do not use each other's resources while still running in parallel (instead of sequentially). However, I could not find a clear example that shows how people use atomic variables to achieve this. Should the atomic variables be used inside my function, or when I create the threads (e.g. for the parameters I pass to the function)? I would appreciate it if someone could provide an example.
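For reference, `std::atomic` makes individual operations on one variable (loads, stores, and read-modify-write operations such as `fetch_add`) indivisible. It does not make a whole function thread-safe, and it requires a trivially copyable type, so `std::atomic<std::vector<float>>` is not an option. It belongs on the specific variable that several threads share, not on the parameters passed to the function. A minimal sketch, where `countPixelsInParallel` is a hypothetical shared counter updated by several `std::async` tasks:

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <vector>

// Hypothetical example: several tasks increment one shared counter.
// fetch_add is an indivisible read-modify-write, so no increment is lost.
// With a plain int instead of std::atomic<int>, this would be a data race.
int countPixelsInParallel(int numTasks, int pixelsPerTask) {
    assert(numTasks > 0 && pixelsPerTask >= 0);
    std::atomic<int> processed{0};
    std::vector<std::future<void>> tasks;
    for (int t = 0; t < numTasks; ++t) {
        tasks.push_back(std::async(std::launch::async, [&processed, pixelsPerTask] {
            for (int i = 0; i < pixelsPerTask; ++i) {
                processed.fetch_add(1, std::memory_order_relaxed);
            }
        }));
    }
    for (auto& task : tasks) task.get();  // wait for every task before reading
    return processed.load();
}
```

Calling `countPixelsInParallel(4, 10000)` always returns 40000, regardless of how the tasks interleave.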

Below is a simplified version of my code. The way it calls a member function of a class instance is very similar to this answer.

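```cpp
#include <future>
#include <iostream>
#include <memory>
#include <vector>

class Render {

  public:
    std::vector<float> render (int x,int y);
    
    std::vector<float> m_pixelValue;
    int m_width=800;
    int m_height=600;
};

std::vector<float> Render::render (int x, int y) {
  std::vector<float> currentPixelData;
  // do lots of work here and update currentPixelData

  // Every task calls render() on the same Render instance, so this
  // assignment writes to one shared member from several threads at once.
  m_pixelValue = currentPixelData;
  return m_pixelValue;
}

// Placeholder for the canvas type, which is not shown in the question.
struct Canvas {
  void update(const std::vector<float>& /*pixel*/) {}
};

int main()
{
  Canvas canvasData;
  std::vector<float> pixelResult;
  // render() returns std::vector<float>, so the futures must carry that type.
  std::vector<std::future<std::vector<float>>> threads;
  threads.reserve(5);
  std::unique_ptr<Render> renderInstance(new Render());
  for (int x=0; x<renderInstance->m_width;x++){
      for (int y=0;y<renderInstance->m_height;y++){
          // Pass the raw Render* (not the address of the unique_ptr) as the object.
          threads.push_back(std::async(std::launch::async, &Render::render, renderInstance.get(), x, y));

          if(threads.size() == 5){
              for (auto &th: threads){
                  pixelResult = th.get();
                  canvasData.update(pixelResult);
              }
              threads.clear();
          }
       }
    }
}
```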
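In the code above, every task calls `render()` on the same `Render` instance and each call assigns to `m_pixelValue`, so several threads write that member at the same time. A minimal sketch of the alternative, assuming the real `render()` can keep everything it computes in local variables: mark it `const` so it can only read members, and return the result by value. The per-pixel values are hypothetical placeholders.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Sketch: render() keeps all intermediate results in locals and only reads
// members, so concurrent calls on one instance share nothing mutable.
class Render {
public:
    // const: the compiler rejects writes to (non-mutable) members here,
    // so concurrent calls can only read m_width and m_height.
    std::vector<float> render(int x, int y) const {
        assert(x >= 0 && x < m_width && y >= 0 && y < m_height);
        std::vector<float> currentPixelData(3);
        // Hypothetical placeholder for the real per-pixel work.
        currentPixelData[0] = static_cast<float>(x) / m_width;
        currentPixelData[1] = static_cast<float>(y) / m_height;
        currentPixelData[2] = 0.0f;
        return currentPixelData;  // returned by value; no shared copy is kept
    }

    int m_width = 800;
    int m_height = 600;
};
```

The `const` qualifier is a cheap way to let the compiler point out every place where the existing 500 lines still write to member state.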

Here's what I get when I run the function sequentially, without async: [image: sequential render result]

And here's what I get when I use async: [image: render result with std::async]

Amir
  • [What is the difference between concurrent programming and parallel programming?](https://stackoverflow.com/questions/1897993/what-is-the-difference-between-concurrent-programming-and-parallel-programming) – user7860670 Nov 27 '20 at 07:18
  • `std::async` doesn't give you much control over threading; you might be better off creating your own `std::thread`s and dividing the work evenly between them. Atomic variables might solve your problem, or you might need to use mutexes; it's difficult to tell without a minimal reproducible example – Alan Birtles Nov 27 '20 at 07:27
  • I would recommend reading [C++ Concurrency in Action](https://www.amazon.com/C-Concurrency-Action-Anthony-Williams/dp/1617294691/). – Daniel Langr Nov 27 '20 at 07:30
  • `std::async` opens a new thread of execution behind the scenes. `std::async` and `std::future` are terrible for many reasons. You can try my own concurrency library (which also makes writing parallel algorithms easy), [concurrencpp](https://github.com/David-Haim/concurrencpp) – David Haim Nov 27 '20 at 13:40
  • @DavidHaim Thank you. Does your library work with C++11? Also, how can I install it? Can I simply copy/paste some files to install it? – Amir Nov 27 '20 at 13:42
  • 1. It needs C++20; you probably want C++20 in your project as well. 2. You clone it, build it with CMake, add `include/concurrencpp` as an include directory in your project, and link your project with `libconcurrencpp.so`/`concurrencpp.lib`. 3. No. Research how CMake-based projects are laid out. – David Haim Nov 27 '20 at 13:50
  • @AlanBirtles As I said, my implementation looks very similar to the link I provided above. I'll update my question with it anyway. – Amir Nov 27 '20 at 13:50
  • @DavidHaim Ah, I thought it would be much simpler than that. I cannot do this due to some limitations on my end. Thanks for the pointer, though. – Amir Nov 27 '20 at 13:52
  • @AlanBirtles There you go – Amir Nov 27 '20 at 14:01
  • `do lots of work here` is the pertinent part of the code, we can't tell you how to fix code we can't see – Alan Birtles Nov 27 '20 at 14:03
  • launching a `std::async` task per pixel is likely to be extremely inefficient but shouldn't cause any more problems than any other threading method – Alan Birtles Nov 27 '20 at 14:04
  • @AlanBirtles Well, that part contains about 500 lines of code. I am declaring lots of other variables in render(), solving a quadratic equation, finding intersections and so on. Does that matter? I would really appreciate it if you could provide an example that shows how to parallelize my code using `atomic`. I'm not even sure whether I should be using atomic variables inside `render()` or in my `main()` method. – Amir Nov 27 '20 at 14:05
  • @AlanBirtles I also wanted to use Boost threads but for some reason I could not call my function using the library. It looks like Boost has issues calling a member function of an instance of a class and I could not figure out how to do it in practice. I tried what people had suggested (e.g. [here](https://stackoverflow.com/questions/4581476/using-boost-thread-and-a-non-static-class-function)) but got lots of errors during compilation. Even an example with Boost threads – Amir Nov 27 '20 at 14:12
  • You don't need to share all your code, but you are asking how to update shared state from multiple threads, and we can't see any of your existing code that updates the shared state, or even what the shared state is, so it's impossible for us to find the error or suggest improvements – Alan Birtles Nov 27 '20 at 14:16
  • @AlanBirtles I think I have a slightly better idea of what you're talking about. In my `render()` function I am updating a vector that contains the RGB data for each pixel. That variable is a member variable of my class. Maybe that is causing the issue? I will update my code and include that member variable now. – Amir Nov 27 '20 at 14:22
  • @AlanBirtles There are some dependencies in my code on class member variables that go back to when I was writing the code without parallelization in mind. Do you think that if I make sure I don't use any class member variables in my render function, things would be fine with `std::async`? By fine, I mean that not only do I stop getting weird rendering results, but my code also becomes truly parallel. Maybe the operating system now has to pause the threads to let one of them access the memory block that belongs to that member variable, causing the slowness. Do you agree with this? – Amir Nov 27 '20 at 14:41
  • No, the slowness comes from calling async for every pixel; the overhead of async is probably more than the time required to execute your code. Process a chunk of pixels (probably at least a line) in each async call so that the overhead is smaller. You probably still won't get the same performance as if you'd used `std::thread`, though, due to `std::async`'s limitations – Alan Birtles Nov 27 '20 at 16:29
  • @AlanBirtles Would using a thread pool change things in any way? Sorry, I'm quite a novice at these sorts of things, especially in C++ – Amir Nov 27 '20 at 18:00
  • Whatever method you use, processing one pixel at a time is unlikely to be optimal; e.g. if you want to run on 4 threads, splitting the work into more than something like 16 units is unlikely to be efficient or beneficial – Alan Birtles Nov 27 '20 at 20:30
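Following the suggestion to process a chunk of pixels per task rather than one pixel per task, here is a minimal sketch that launches one `std::async` task per band of rows. `shadePixel`, `renderRows` and `renderWithAsync` are hypothetical names; `shadePixel` stands in for the real per-pixel work and is assumed not to write any shared state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for the per-pixel work; it reads only its
// arguments, so calling it from several threads at once is safe.
float shadePixel(int x, int y) { return static_cast<float>(x + y); }

// Renders rows [yBegin, yEnd) of a width-wide image into a local buffer.
std::vector<float> renderRows(int width, int yBegin, int yEnd) {
    std::vector<float> rows;
    rows.reserve(static_cast<std::size_t>(width) * (yEnd - yBegin));
    for (int y = yBegin; y < yEnd; ++y)
        for (int x = 0; x < width; ++x)
            rows.push_back(shadePixel(x, y));
    return rows;
}

// One std::async task per band of rows instead of one per pixel, so the
// cost of starting a task is spread over many pixels.
std::vector<float> renderWithAsync(int width, int height, int numChunks) {
    assert(width > 0 && height > 0 && numChunks > 0);
    const int rowsPerChunk = (height + numChunks - 1) / numChunks;
    std::vector<std::future<std::vector<float>>> chunks;
    for (int yBegin = 0; yBegin < height; yBegin += rowsPerChunk) {
        const int yEnd = std::min(height, yBegin + rowsPerChunk);
        chunks.push_back(std::async(std::launch::async, renderRows, width, yBegin, yEnd));
    }
    // Futures are read in launch order, so appending the bands one after
    // another yields the image in row-major order.
    std::vector<float> image;
    image.reserve(static_cast<std::size_t>(width) * height);
    for (auto& chunk : chunks) {
        std::vector<float> band = chunk.get();
        image.insert(image.end(), band.begin(), band.end());
    }
    return image;
}
```

For example, `renderWithAsync(800, 600, 16)` creates 16 tasks of at most 38 rows each instead of 480,000 single-pixel tasks.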

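On the thread-pool question, a minimal sketch of a fixed set of `std::thread`s that pull work from a shared `std::atomic<int>` row counter. This is one place where an atomic variable fits this problem: `fetch_add` hands each band of rows to exactly one thread, and because every thread writes only its own rows of a pre-sized buffer, the image itself needs no mutex. `renderWithWorkers` and `shadePixel` are hypothetical names, and `shadePixel` again stands in for the real per-pixel work.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-pixel work (reads only its arguments).
float shadePixel(int x, int y) { return static_cast<float>(x * 10 + y); }

// A fixed number of worker threads. Each repeatedly claims the next band of
// rows by atomically advancing nextRow, so every band goes to exactly one
// thread and every pixel is written once, by one thread.
std::vector<float> renderWithWorkers(int width, int height, int numThreads, int rowsPerBand) {
    assert(width > 0 && height > 0 && numThreads > 0 && rowsPerBand > 0);
    std::vector<float> image(static_cast<std::size_t>(width) * height);
    std::atomic<int> nextRow{0};

    auto worker = [&] {
        for (;;) {
            const int yBegin = nextRow.fetch_add(rowsPerBand);
            if (yBegin >= height) return;  // no rows left
            const int yEnd = std::min(height, yBegin + rowsPerBand);
            for (int y = yBegin; y < yEnd; ++y)
                for (int x = 0; x < width; ++x)
                    image[static_cast<std::size_t>(y) * width + x] = shadePixel(x, y);
        }
    };

    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) workers.emplace_back(worker);
    // join() waits for each thread and makes its writes to image visible here.
    for (auto& w : workers) w.join();
    return image;
}
```

`std::thread::hardware_concurrency()` is a reasonable starting value for `numThreads`, but it may return 0 when the value is not known, so guard against that.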