3

I have a task which is very well parallelizable so I would like to use multiple threads to speed up my program. However, it is not as simple as creating the threads and letting them run. The threads have to execute a certain task repeatedly with breaks inbetween, i.e. pseudo code would look like this:

loop
  wake threads up
  calculate x using the threads
  pause threads
  calculate something else without the threads

This happens very frequently, 60 times per second to be exact. That is why creating new threads every time would be by far too slow. I tried solving this with a state variable for each thread (Running, Paused, Stopped) and either an event-like construct with condition variables, or a polling mechanism.

Both of these only gave me about two times the speed which is not as much as I imagine to be possible, considering only about 5% of the time are spend within a critical section. (and my CPU offers 4 cores * 2 = 8 hyper threads)

I'd imagine the issue with the condition variable is that the wake up is not immediate but has some delay to it which means runtime wasted. The polling approach is slightly slower because, I guess, the code executed while the threads are paused will be slower because the threads are still using the CPU.

What would be the best way to implement what I have in mind?

Andreas T
  • 733
  • 7
  • 22
  • 6
    This problem sounds ideal for thread pooling and task-based multithreading. Particularly if the units of multithreaded work have a large variance in computation time. – Sneftel Aug 11 '14 at 10:38
  • The question remains the same: How would the thread pool be implemented? – Andreas T Aug 11 '14 at 10:43
  • 2
    You should look at [thread pooling in c++11](http://stackoverflow.com/questions/15752659/thread-pooling-in-c11) – quantdev Aug 11 '14 at 10:44
  • 3
    Did you time the entire program or just the threaded part? You need to take into account [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law) in your expectations. – molbdnilo Aug 11 '14 at 11:04
  • You just have to implement *thread pool*; For efficient implmentation, you'll need some OS-dependent features, such as IOCP. (or just use Boost.Asio or `std::future` >o<) – ikh Aug 11 '14 at 11:06
  • 2
    @AndreasT Intel TBB is a good way to go. – Sneftel Aug 11 '14 at 11:10
  • @ molbdnilo Only the threaded part. Even with Amdahl's law in consideration, I would expect at least a speed up to 300%. – Andreas T Aug 12 '14 at 13:45

2 Answers2

4

If you wish to have threads block until something happens, you can pair a std::mutex (with its lock held by a std::unique_lock<std::mutex>) with a std::condition_variable.

You use the wait member of the std::condition_variable, pass it the std::unique_lock<std::mutex>, and a functor which returns true when the condition for the thread to awaken is met. As the thread waits on the std::condition_variable, it surrenders the lock. When the condition is met, and notify_one (assuming the thread-in-question is the thread selected for notification) or notify_all is called on the std::condition_variable, it wakes up, reacquires the lock, checks the condition, and returns (with the lock still held) when/if it's true.

  • My question was if there is a better (read "faster") way to implement this other than the one you described. – Andreas T Aug 12 '14 at 13:44
2

You could use Intel TBB. If your Task is simple enough you can maybe use one of the simple algorithms like parallel_for. (parallel_for is easier with lambas: https://software.intel.com/en-us/blogs/2009/08/03/parallel_for-is-easier-with-lambdas-intel-threading-building-blocks)

palfi
  • 288
  • 2
  • 6
  • I tried parallel_for_each and it is slightly faster than my own two implementations but still not as fast I would expect it to be. I might look into the code again, maybe there is a weird bottleneck I oversaw. – Andreas T Aug 12 '14 at 13:46