I'm trying to speed up a for loop by using std::thread. The loop iterates over a list consisting of several million items. I give each iteration to a different thread.

After 4047 iterations it stops running and aborts with the message `terminate called without an active exception`, followed by `Aborted (core dumped)`.

I believe this error is usually caused by threads not being properly joined (as stated in other questions on this site). However, I do have a function that joins all threads at the end of my for loop. Because the join function is never reached, I suspect the real problem is that too many threads are being created. This is my first foray into lambdas and multithreading, and I'm not sure how to limit the number of threads created at a time within a for loop.

My code is as follows:

std::mutex m;
std::vector<std::thread> workers;
for ( ot.GoToBegin(); !ot.IsAtEnd(); ++ot )  // ot is the iterator
{
    workers.push_back(std::thread([test1, test2, ot, &points, &m, this]() 
    {
        // conditions depending on the current ot are checked
        if ( test1 == true ) return 0;  // exit function
        if ( test2 == true ) return 0;
        // ...etc, lots of different checks are performed..

        // if conditions are passed save the current ot
        m.lock();
        points.push_back( ot.GetIndex() );
        m.unlock();
    }));
} // end of iteration
std::for_each(workers.begin(), workers.end(), [](std::thread &t) 
{
    t.join();  // join all threads
});

Any help would be much appreciated

jla
  • Possible duplicate of [C++ terminate called without an active exception](http://stackoverflow.com/questions/7381757/c-terminate-called-without-an-active-exception) – Alex Zywicki Feb 09 '17 at 01:29
  • Creating several million threads is not going to be pretty - I would recommend looking at thread-pooling. – Ken Y-N Feb 09 '17 at 01:31
  • The solution to that question was to have a function that joined all threads. As stated above I have already included a join function to join all threads on completion of the for loop – jla Feb 09 '17 at 01:33

1 Answer

Since you get the error at the same iteration every time, the cause is not the "join" itself. Most likely, the number of threads per process on your system is limited to 4096 or a similar number; see "Maximum number of threads per process in Linux?".

When you try to create thread number 4047 or so, the constructor of std::thread throws an exception, and you never reach the "join" statement.

I would suggest you keep a vector not of std::thread objects but of std::future objects. The code could look roughly like this:

typedef std::future<int> Future;
std::vector<Future> results;
for (...) {
    results.emplace_back( std::async(std::launch::async, 
     [...](){ /* roughly same code as in your thread function */ }) );
}
for ( Future& result : results) {
    auto value = result.get(); //waits for the task to finish 
    // process your values ...
}

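The intent is that the futures draw on an internal thread pool, so that tasks run asynchronously as threads become available. Note, however, that the standard does not require std::async to use a pool; with std::launch::async an implementation may start a new thread for every task, in which case the same per-process limit applies.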

Michael Simbirsky
  • Thanks for the suggestion. It does sound like it's being limited by the system. I implemented a vector of futures as you suggested; however, that also ran for 4047 iterations before throwing the system error `Resource temporarily unavailable`. I then set up a counter that allocated futures in blocks of 4000 at a time, waited until they completed, cleared the vector and allocated another block of 4000 futures. This worked, but the overhead is enormous: over 50 times slower than the non-parallelised version. Back to the drawing board I guess. – jla Feb 09 '17 at 07:40
  • Good job! Most likely you need a real library with thread pooling, e.g. Intel TBB. Alternatively, you can search for an ad-hoc thread-pool module and use that. It would certainly require some redesign of the data structures. For example, the global mutex should go. Instead, each thread should keep its own local vector of points until it is ready to merge them, as in the TBB template "parallel_reduce". – Michael Simbirsky Feb 09 '17 at 18:46
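The design described in the last comment (a fixed number of threads, a local result vector per thread, one merge at the end, no global mutex) could be sketched with the standard library alone, roughly as follows. This is not TBB and not the asker's code: `parallel_filter` and `keep` are hypothetical names, and the sketch assumes the items can be addressed by an integer index in [0, count) and that the predicate is safe to call from several threads at once.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: returns every index in [0, count) for which
// keep(index) is true, evaluating the predicate on a small, fixed
// number of threads instead of one thread per item.
template <typename Predicate>
std::vector<std::size_t> parallel_filter(std::size_t count, Predicate keep)
{
    // hardware_concurrency() may return 0 when the value is unknown;
    // falling back to 2 is an arbitrary choice for this sketch.
    std::size_t hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2;
    const std::size_t num_threads =
        std::max<std::size_t>(1, std::min<std::size_t>(hw, count));
    const std::size_t chunk = (count + num_threads - 1) / num_threads;

    // One result vector per thread, so no mutex is needed while filtering.
    std::vector<std::vector<std::size_t>> local(num_threads);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([&local, &keep, t, begin, end]() {
            for (std::size_t i = begin; i < end; ++i) {
                if (keep(i)) local[t].push_back(i);
            }
        });
    }
    for (std::thread& w : workers) w.join();

    // Merge the per-thread results in chunk order, so the output
    // stays in ascending index order.
    std::vector<std::size_t> points;
    for (const auto& part : local) {
        points.insert(points.end(), part.begin(), part.end());
    }
    return points;
}
```

A call might look like `auto points = parallel_filter(n, [&](std::size_t i) { return passes_checks(i); });`, where `passes_checks` stands in for the checks from the question. Each thread handles one contiguous chunk, so thread creation cost is paid only a handful of times rather than once per item.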