(g++ 4.6.3, cygwin, Windows 10)

I'm not sure if there is some way to speed up the following program using multithreading mechanisms (with which I'm quite unfamiliar):

// ab.h
class A {
    // Member variables
    ...
    // Member functions
    A();
    ~A();
    int foo_1();
    void foo_2(std::vector<int>);
    ...
};

class B {
    ...
    void schedule(std::vector<A>& va);
    ...
};

// b.cc
...
void B::schedule(std::vector<A>& va) {
    std::vector<int> vc;
    vc.resize(va.size());
    for (... /* i from 0 to vc.size() */) {
        ...
        vc[i] = va[i].foo_1();
        ...
    }
    for (... /* i from 0 to va.size() */) {
        ...
        va[i].foo_2(vc);
        ...
    }
    // 5 more pairs of "for" loops like the above block
    ...
}

// main.cc
int main() {
    ...
    std::vector<A> va;
    // va.size() can be large, e.g. 1000000
    ...
    B b;
    int simTime = 1000000000; // some large number of iterations
    for (int clock = 0; clock != simTime; ++clock) {
        b.schedule(va);
    }
    ...
    return 0;
}

So basically, I have a bunch of objects of type A, which "advance" as clock grows and meanwhile communicate with each other. My concerns are:

  1. I've just started rewriting each of my for loop pairs using std::async and std::future::get(). Is this efficient? I've heard that std::async is most suitable for functions that take a long time to run (like I/O), since the overhead of constructing/destructing a thread is not negligible (?). However, my foo_1 and foo_2 functions are not that "big".
  2. If constructing/destructing a thread is expensive, then it should be better to create all the threads needed once, at the beginning. But in my case, that would mean multiple "threads of objects" (which I guess is impossible) instead of "threads of member functions" (?). Is it possible to create a thread only once to serve one object, and later "attach" different member functions to it without the constructing/destructing overhead? If so, how?

My code runs long (even after some optimization by myself), while there is a powerful 8-core server...

vincentvangaogh
  • Just looking at the above, you seem to somewhat needlessly copy a vector `vc` `va.size()` times. Your problem might be amenable to a thread pool (either hand-written, or something like Microsoft's PPL). See [this question](https://stackoverflow.com/questions/31072279/implementing-a-simple-generic-thread-pool-in-c11) for incomplete guidance on writing a thread pool. – Yakk - Adam Nevraumont Aug 09 '15 at 19:42
  • This is not a valid question. You state that something is wrong "because it takes long to execute", yet you don't know what the *exact* problem is (or even if there is a problem *at all*). You also think that it may be solved *"using multi-threading mechanisms, with which you're quite unfamiliar"*. What are we supposed to do? Investigate your bad design and write a new API from scratch? Come on. Profile your code and find the **real** problem, then come back. This is not a debugging/profiling service. – Mateusz Grzejek Aug 09 '15 at 19:47
  • Using multiple threads doesn't necessarily improve performance, especially if there's actually no more than a single CPU. – πάντα ῥεῖ Aug 09 '15 at 19:53
  • @Mateusz Grzejek Yes, this is not a debugging/profiling service (What makes you think it is?). I'm just asking for some advice on the possibility of speeding up my program using multithreading, regardless of my own bugs which, of course, are at my discretion. I'll not get your advice, test it, and get back to you to blame if it doesn't work possibly because of my own bugs. I'll appreciate any help on using multithreading on my given code. – vincentvangaogh Aug 09 '15 at 20:00

1 Answer

Having 8 cores on your CPU and using only one could seem a waste of resources, so your question is perfectly justified. As you give little information about your performance issue, I can only offer some general thoughts.

Multithreading is not necessarily the best answer to all performance issues

If you create threads, you'll need to synchronize access to shared data in order to avoid data races. If many such synchronizations are needed, you risk contention between the threads (i.e. time wasted waiting for resources held by other threads). This can easily make you lose the benefits of multithreading.

In your particular case, both loops access the same elements of the vector va. Unfortunately, neither foo_1() nor foo_2() is declared const, which means that either could modify the vector's elements. So you have to check this carefully.
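One way to make that check partly mechanical is to let the compiler enforce it. The sketch below assumes that foo_1() only reads its own object and that foo_2() only reads the shared vector; the member `state`, the accessor `value()`, the function bodies, and the free function `step()` are all hypothetical stand-ins, not the asker's real code.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for class A; `state` and both bodies are invented.
class A {
public:
    explicit A(int s) : state(s) {}

    // const: the compiler rejects any write to A's (non-mutable) members
    // here, so calls on different objects cannot race with each other.
    int foo_1() const { return state + 1; }

    // const reference: no copy of vc per call, and vc cannot be modified
    // through this parameter. Only this object's own state is written.
    void foo_2(const std::vector<int>& vc) {
        state = std::accumulate(vc.begin(), vc.end(), 0);
    }

    int value() const { return state; }

private:
    int state;
};

// Hypothetical mirror of one loop pair from B::schedule.
void step(std::vector<A>& va) {
    std::vector<int> vc(va.size());
    assert(vc.size() == va.size());

    // Phase 1: iteration i reads va[i] and writes only vc[i].
    for (std::size_t i = 0; i < va.size(); ++i) vc[i] = va[i].foo_1();

    // Phase 2: iteration i reads all of vc and writes only va[i].
    for (std::size_t i = 0; i < va.size(); ++i) va[i].foo_2(vc);
}
```

If the real functions satisfy these constraints, the iterations inside each loop are independent of one another, and the only ordering requirement is that phase 1 finishes completely before phase 2 starts.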

Multithreading improves throughput, not execution time

If you make use of all 8 cores, you may notice that each thread runs a little slower than the same code run non-threaded. Fortunately, if the threads are independent of each other and there is no contention, the overall throughput will be higher (bluntly put: if 2 threads each perform at 80% of the non-threaded speed, together they are still at 160% of the non-threaded speed).

Caution: if you create more threads than your CPU can handle, the additional threads will have to wait and will add thread-management overhead. A useful guide here is std::thread::hardware_concurrency() (keeping in mind that your application is not the only one that might create threads).
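As a rough illustration of that guidance, a thread count could be derived as follows. The helper name `pick_thread_count` is hypothetical; the only standard call used is std::thread::hardware_concurrency(), which is just a hint and is allowed to return 0 when the value cannot be determined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: how many worker threads to use for n_items elements.
std::size_t pick_thread_count(std::size_t n_items) {
    // Only a hint; 0 means "unknown", so fall back to a single thread.
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;

    // Never use more threads than items, and always at least one.
    std::size_t n = std::min(hw, n_items);
    if (n == 0) n = 1;

    assert(n >= 1 && (n_items == 0 || n <= n_items));
    return n;
}
```

On a shared server it may be worth using fewer threads than this returns, since other processes compete for the same cores.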

Choice of weapons: thread pools vs. std::async

If you create a thread pool at the beginning, the advantage is that you have a number of threads ready to fire. This can avoid thread creation overhead in the middle of a time-critical situation. Keep in mind that the benefit is only real if you stay within the number of hardware-supported threads (unless you have a lot of waiting or I/O latency).
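To make the idea concrete, here is a deliberately minimal pool, offered as a sketch rather than a production design. The class name `SimplePool` and its interface are hypothetical. The worker threads are created once in the constructor and are not tied to any particular object: each one simply pulls the next queued task, so different member functions of different objects can be run on the same threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size pool (hypothetical class, not from the question).
class SimplePool {
public:
    explicit SimplePool(std::size_t n_threads) : stopping_(false) {
        if (n_threads == 0) n_threads = 1;  // a pool with no workers would hang
        for (std::size_t t = 0; t < n_threads; ++t)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes the jobs still queued, then joins every worker.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Queue a job; the returned future becomes ready once the job has run.
    std::future<void> submit(std::function<void()> job) {
        assert(static_cast<bool>(job));  // an empty std::function cannot be called
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(job));
        std::future<void> done = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return done;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // only reachable when stopping
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so workers do not serialize
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_;
    std::vector<std::thread> workers_;  // declared last: started after the rest exists
};
```

In the asker's setting, each submitted job would process a contiguous range of indices rather than a single element, because every submission pays for a lock, a queue operation, and a future. Waiting on all returned futures acts as the barrier between the foo_1 loop and the foo_2 loop.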

On the other hand, with std::async you are in the hands of your C++ implementation. For instance, some experimentation on MSVC showed that Microsoft's std::async apparently reused threads it had created, in order to avoid the creation overhead. This approach avoided having too many idle threads and, in some cases, outperformed the thread pool.
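Whatever the implementation does internally, std::async is usually better applied per chunk than per element when the per-element work is small. The following sketch assumes that fn(i) is safe to call concurrently for distinct values of i; the helper name `parallel_for` is hypothetical and is not a standard library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: call fn(i) for every i in [0, n), split into at most
// n_chunks contiguous ranges, each handled by one std::async task.
template <typename Fn>
void parallel_for(std::size_t n, std::size_t n_chunks, Fn fn) {
    if (n == 0) return;
    n_chunks = std::max<std::size_t>(1, std::min(n_chunks, n));
    const std::size_t chunk = (n + n_chunks - 1) / n_chunks;  // ceiling division
    assert(chunk > 0);  // otherwise the loop below would never advance

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        // std::launch::async forces a real asynchronous task instead of
        // allowing deferred execution on the calling thread.
        tasks.push_back(std::async(std::launch::async, [begin, end, &fn] {
            for (std::size_t i = begin; i < end; ++i) fn(i);
        }));
    }

    // get() blocks until the chunk is done and rethrows any exception, so
    // the caller's next loop starts only after this whole phase is complete.
    for (auto& t : tasks) t.get();
}
```

With this helper, the first loop in schedule could hypothetically become `parallel_for(va.size(), 8, [&](std::size_t i) { vc[i] = va[i].foo_1(); });`, followed by a second call for foo_2. Whether this beats the single-threaded loops still has to be measured.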

Conclusions:

As the performance depends on your algorithms (primarily), on the CPU threading, on the OS, on the compiler+standard library, I'd strongly recommend that you do some benchmarks/profiling for choosing the best approach.

Christophe
  • There are already implementations of simple C++11 thread pools, like [this](https://github.com/Youka/ThreadPool/blob/master/ThreadPool.hpp). – Youka Aug 09 '15 at 20:55
  • @Youka *Very* simple, in fact. I would not encourage using this code, except in simple home-made projects. A thread pool is not just an object with a bunch of threads and one method; it is something far more complicated. What about inter-thread communication? Deadlock detection? Task interruption? The linked thread pool is not even *simple*. – Mateusz Grzejek Aug 09 '15 at 22:46