Thread Queue C++

Question

'''The original post has been edited'''

How can I make a thread pool for two nested for loops in C++? I need to run the start_thread function 22 times for each number from 0 to 5. The number of available threads varies with the machine I am using. How can I create a pool that assigns each free thread to the next iteration of the nested loop?

for (int t=0; t <22; t++){
    for(int p=0; p<6; p++){
        thread th1(start_thread, p);
        thread th2(start_thread, p);
        th1.join();
        th2.join();
     }
}

There are some ideas here for thread pools in c++11 and greater: [https://stackoverflow.com/questions/15752659/thread-pooling-in-c11](https://stackoverflow.com/questions/15752659/thread-pooling-in-c11) — drescherjm, Aug 17 '20 at 22:01
Does this answer your question? [Thread pooling in C++11](https://stackoverflow.com/questions/15752659/thread-pooling-in-c11) — asmmo, Aug 17 '20 at 22:28
[https://github.com/Tyler-Hardin/thread_pool](https://github.com/Tyler-Hardin/thread_pool) may be what you want however I have not tested the code. — drescherjm, Aug 17 '20 at 22:30
Boost has a thread pool as well: [https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/thread_pool.html](https://www.boost.org/doc/libs/1_74_0/doc/html/boost_asio/reference/thread_pool.html) — drescherjm, Aug 17 '20 at 22:33
Opening a lot of threads will not make your program run faster because creating a new thread is expensive. Better to open a small number of concurrent threads and use a queue to distribute work between them. — jignatius, Aug 18 '20 at 04:42
Are there 22*6 unique work packages or do you really want to work on the same 6 work packages 22 times (requiring synchronization each time they are ready)? Starting two threads on the same work package looks strange. Is there any reason why you don't want to go over the same package twice in the same thread? — Ted Lyngmo, Aug 18 '20 at 22:00

prog-fh · Accepted Answer · 2020-08-17T22:30:33.783

Not really certain about what you want, but maybe it's something like this.

for (int t=0; t <22; t++){
        std::vector<std::thread> th;
        for(int p=0; p<6; p++){
                th.emplace_back(std::thread(start_thread, p));
        }
        for(int p=0; p<6; p++){
                th[p].join();
        }
}

(or maybe permute the two loops)

Edit: if you want to control the number of threads

#include <iostream>
#include <thread>
#include <vector>

void
start_thread(int t, int p)
{
  std::cout << "th " << t << ' ' << p << '\n';
}

void
join_all(std::vector<std::thread> &th)
{
  for(auto &e: th)
  {
    e.join();
  }
  th.clear();
}

int
main()
{
  std::size_t max_threads=std::thread::hardware_concurrency();
  std::vector<std::thread> th;
  for(int t=0; t <22; ++t)
  {
    for(int p=0; p<6; ++p)
    {
      th.emplace_back(std::thread(start_thread, t, p));
      if(th.size()==max_threads)
      {
        join_all(th);
      }
    }
  } 
  join_all(th);
  return 0;
}

score 1 · Answer 2 · answered Aug 17 '20 at 23:05

If you don't want a dependency on a third-party library, this is pretty simple.

Just create as many threads as you like and let them pick a "job" from some queue.

For example:

#include <iostream>
#include <mutex>
#include <chrono>
#include <vector>
#include <thread>
#include <queue>

void work(int p)
{
  // do the "work"
  std::this_thread::sleep_for(std::chrono::milliseconds(200));
  std::cout << p << std::endl;
}

std::mutex m;
std::queue<int> jobs;
void worker()
{
  while (true)
  {
    int job(0);
    // sync access to the jobs queue
    {
      std::lock_guard<std::mutex> l(m);
      if (jobs.empty())
        return;
      job = jobs.front();
      jobs.pop();
    }
    work(job);
  }
}

int main()
{
  // queue all jobs
  for (int t = 0; t < 22; t++) {
    for (int p = 0; p < 6; p++) {
      jobs.push(p);
    }
  }

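  // create a reasonable number of threads; hardware_concurrency() may
  // return 0 when the value cannot be determined, so fall back to 1
  const unsigned hw = std::thread::hardware_concurrency();
  const int n = hw > 0 ? static_cast<int>(hw) : 1;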
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i)
    threads.emplace_back(std::thread(worker));
  // wait for all of them to finish
  for (int i = 0; i < n; ++i)
    threads[i].join();
}

[ADDED] Obviously, you don't want global variables in your production code; this is simply a demo solution.

score 0 · Answer 3 · answered Aug 17 '20 at 22:39

Stop trying to code for a moment and first draw out what you need to do and the pieces you need to have in order to do it.

You need one queue to hold the jobs, one mutex to protect the queue so the threads don't smurf it up with simultaneous accesses, and N threads.

Each thread function is a loop that

1. grabs the mutex,
2. gets a job from the queue,
3. releases the mutex, and
4. processes the job.

In this case I'd keep things simple by exiting the loop and the thread when there are no more jobs in the queue in step 2. In production you'd have the thread block and wait on the queue so it's still available to service jobs added later.
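The "block and wait on the queue" variant could look roughly like the sketch below. It is not from the answer: blocking_queue, close() and run_blocking_demo are hypothetical names chosen for illustration, and std::condition_variable is just one common way to make a worker sleep until a job arrives. The workers are started before any job exists, and they only exit once the queue has been closed and drained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical helper: pop() blocks until a job arrives or the queue is closed.
class blocking_queue {
public:
  void push(int job) {
    {
      std::lock_guard<std::mutex> lock(m_);
      jobs_.push(job);
    }
    cv_.notify_one();  // wake one waiting worker
  }

  // After close(), workers drain the remaining jobs and then stop.
  void close() {
    {
      std::lock_guard<std::mutex> lock(m_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Returns false only when the queue is closed and empty.
  bool pop(int &job) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
    if (jobs_.empty())
      return false;
    job = jobs_.front();
    jobs_.pop();
    return true;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<int> jobs_;
  bool closed_ = false;
};

// Starts the workers first, then feeds in 22*6 jobs.
// Returns the number of jobs that were processed.
int run_blocking_demo(std::size_t n_threads) {
  assert(n_threads > 0);
  blocking_queue q;
  std::mutex count_mutex;
  int processed = 0;

  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n_threads; ++i) {
    workers.emplace_back([&] {
      int job = 0;
      while (q.pop(job)) {
        // the real work on `job` would go here
        std::lock_guard<std::mutex> lock(count_mutex);
        ++processed;
      }
    });
  }

  for (int t = 0; t < 22; ++t)
    for (int p = 0; p < 6; ++p)
      q.push(p);
  q.close();

  for (auto &w : workers)
    w.join();
  return processed;
}
```

Calling run_blocking_demo(4) from main would run the 132 jobs on four workers. The explicit close() is a design choice of this sketch: without some shutdown signal, blocked workers would wait forever and join would never return.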

Wrap that up in a class with a function that allows you to add jobs to the queue, a function to start N threads, and a function to join on all of the running threads.

main defines an instance of the class, feeds in the jobs, starts the thread pool and then blocks on join until everyone's done.
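A minimal sketch of such a class, under stated assumptions: job_pool, add_job, start, join and run_pool_demo are names made up here, jobs are plain ints as in the question, and the workers use the simple "exit when the queue is empty" rule, so every job has to be queued before start() is called.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical class: one queue, one mutex, N threads.
class job_pool {
public:
  explicit job_pool(std::function<void(int)> work) : work_(std::move(work)) {}

  // Add a job to the queue (call before start() in this simple version).
  void add_job(int job) {
    std::lock_guard<std::mutex> lock(m_);
    jobs_.push(job);
  }

  // Start n_threads workers.
  void start(std::size_t n_threads) {
    assert(n_threads > 0);
    for (std::size_t i = 0; i < n_threads; ++i)
      threads_.emplace_back([this] { run(); });
  }

  // Block until every worker has finished. Must be called before the
  // pool is destroyed, since destroying a joinable std::thread terminates.
  void join() {
    for (auto &t : threads_)
      t.join();
    threads_.clear();
  }

private:
  void run() {
    while (true) {
      int job = 0;
      {
        std::lock_guard<std::mutex> lock(m_);  // grab the mutex
        if (jobs_.empty())
          return;                              // no more jobs: exit the thread
        job = jobs_.front();                   // get a job from the queue
        jobs_.pop();
      }                                        // mutex released here
      work_(job);                              // process the job
    }
  }

  std::function<void(int)> work_;
  std::mutex m_;
  std::queue<int> jobs_;
  std::vector<std::thread> threads_;
};

// Feeds in the 22*6 jobs from the question and returns the sum of all p values.
int run_pool_demo(std::size_t n_threads) {
  std::atomic<int> sum{0};
  job_pool pool([&sum](int p) { sum += p; });
  for (int t = 0; t < 22; ++t)
    for (int p = 0; p < 6; ++p)
      pool.add_job(p);
  pool.start(n_threads);
  pool.join();
  return sum.load();
}
```

In main you would call something like run_pool_demo(std::thread::hardware_concurrency()), after checking that the value is not 0, since hardware_concurrency() is allowed to return 0 when it cannot determine the count.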

Once you've beaten the design into something you have high confidence does what you need it to do, then you start writing code. Write code, especially multi-threaded code, without a plan and you're in for a lot of debugging and re-writing that usually exceeds the time spent on design by a significant margin.

Apparently, while you were typing your design, I was typing its implementation :) — Vlad Feinstein, Aug 17 '20 at 23:06

score 0 · Answer 4 · answered Aug 18 '20 at 23:03

Since C++17 you can use one of the execution policies for many of the algorithms in the standard library. This can greatly simplify going over a number of work packages. What goes on behind the curtains is usually that it picks threads from a built-in thread pool and distributes work to them efficiently. It usually uses just enough™ threads on both Linux and Windows and it'll use all the CPU you've got left (0% idle on all cores once the CPUs have started spinning at max frequency), strangely without making either Linux or Windows "sluggish".

Here I've used the execution policy std::execution::parallel_policy (indicated by the std::execution::par constant). If you can prepare the work that needs to be done and put it in a container, like a std::vector, it'll be really easy.

#include <algorithm>
#include <chrono>
#include <execution>  // std::execution::par
#include <iostream>
// #include <thread>  // not needed to run with execution policies
#include <vector>

struct work_package {
    work_package() : payload(co) { ++co; }
    int payload;
    static int co;
};
int work_package::co = 10;

int main() {
    std::vector<work_package> wps(22*6);                         // 132 work packages
    for(const auto& wp : wps) std::cout << wp.payload << '\n';   // prints 10 to 141

    // work on the work packages
    std::for_each(std::execution::par, wps.begin(), wps.end(), [](auto& wp) {
        // Probably in a thread - As long as you do not write to the same work package
        // from different threads, you don't need synchronization here.

        // do some work with the work package
        ++wp.payload;
    });

    for(const auto& wp : wps) std::cout << wp.payload << '\n';   // prints 11 to 142
}

With g++ you may need to install tbb (The Threading Building Blocks) that you also need to link with: -ltbb.

apt install libtbb-dev on Ubuntu.
dnf install tbb-devel.x86_64 on Fedora.

Other distributions may call it something different.

Visual Studio (2017 and later) links with the proper library automatically (also tbb if I'm not mistaken).
