
Using this Boost.Asio-based thread pool (the class is named ThreadPool), I want to parallelize populating a std::vector<boost::shared_ptr<T>>, where T is a struct containing a std::vector<int> whose contents and size are determined dynamically after the struct is initialized.

Unfortunately, I am a newb at both C++ and multithreading, so my attempts at solving this problem have failed spectacularly. Here's an oversimplified sample program that times the non-threaded and threaded versions of the task. The threaded version's performance is horrendous...

#include "thread_pool.hpp"
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <ctime>
#include <iostream>
#include <string>
#include <vector>


using namespace boost;
using namespace std;


struct T {
  vector<int> nums = {};
};


typedef boost::shared_ptr<T> Tptr;
typedef vector<Tptr> TptrVector;


void create_T(const int i, TptrVector& v) {
  v[i] = Tptr(new T());
  T& t = *v[i];
  // use a distinct loop variable so it does not shadow the index parameter i
  for (int n = 0; n < 100; n++) {
    t.nums.push_back(n);
  }
}


int main(int argc, char* argv[]) {
  clock_t begin, end;
  double elapsed;

  // define and parse program options

  if (argc != 3) {
    cout << argv[0] << " <num iterations> <num threads>" << endl;
    return 1;
  }
  int iterations = stoi(argv[1]),
      threads    = stoi(argv[2]);

  // create thread pool
  ThreadPool tp(threads);

  // non-threaded
  cout << "non-thread" << endl;
  begin = clock();

  TptrVector v(iterations);
  for (int i = 0; i < iterations; i++) {
    create_T(i, v);
  }

  end = clock();
  elapsed = double(end - begin) / CLOCKS_PER_SEC;
  cout << elapsed << " seconds" << endl;

  // threaded
  cout << "threaded" << endl;
  begin = clock();

  TptrVector v2(iterations);
  for (int i = 0; i < iterations; i++) {
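    // boost::bind stores copies of its arguments; boost::ref makes each task
    // fill v2 itself instead of a private copy of the whole vector
    tp.submit(boost::bind(create_T, i, boost::ref(v2)));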
  }
  tp.stop();

  end = clock();
  elapsed = double(end - begin) / CLOCKS_PER_SEC;
  cout << elapsed << " seconds" << endl;

  return 0;
}

After doing some digging, I think the poor performance may be due to the threads vying for memory access, but my newb status is keeping me from exploiting this insight. Can you efficiently populate the pointer vector using multiple threads, ideally in a thread pool?

alan

1 Answer


You haven't provided enough details or a Minimal, Complete, and Verifiable example, so expect a lot of guessing.

create_T is a "cheap" function. Scheduling a task and the overhead of executing it cost much more than the work itself, which is why your performance is bad. To benefit from parallelism you need both the right work granularity and enough total work. Granularity means that each task (in your case, one call to create_T) should be big enough to pay for the multithreading overhead. The simplest approach is to group create_T calls into bigger tasks.

Andriy Tylychko
  • I agree with gruffalo, also since you are only simulating work, you could just use a sleep. But I did want to make a couple of points. `new T()` does not do what you think it does. Instead of allocating a default constructed T, it actually creates a temporary default constructed T, then uses the *copy constructor* for the newly allocated object, and lastly destroys the temp object. Also it is more clear to include types for all variable declarations, especially if they are on their own lines. – troycurtisjr Nov 05 '17 at 01:31
  • Thanks for the help. It turns out multithreading was improving the program's run time, but clock() from <ctime> did not reveal this because it measures CPU time rather than wall-clock time. Switching to <chrono> made the speedup apparent. Regardless, batching jobs is good advice and further improved the run-time performance, so I will mark this as the correct answer. – alan Dec 19 '17 at 20:29