I have a large number (>>100K) of tasks with very high latency (minutes) and very little resource consumption. Potentially they could all be executed in parallel, and I was considering using std::async to generate one future for each task.
My question is: what is the maximum number of threads that std::async will create and execute asynchronously? (using g++ 6.x on Ubuntu 16.xx or CentOS 7.x, x86_64)
It is important for me to get that number right because if I do not have enough tasks actually running (waiting) in parallel the cumulative cost of latency will be very high.
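To make the intent concrete, here is a minimal sketch of the pattern I have in mind (Input and run_task are placeholders for my real task data and work; I would probably pass std::launch::async explicitly so that tasks are not silently deferred):

// Sketch only: one std::async call and one future per task.
// "Input" and "run_task" stand in for the real task description and work.
#include <future>
#include <vector>

struct Input { /* description of one task */ };

int run_task(const Input &)   // high-latency, low-CPU work in the real code
{
    return 0;
}

std::vector<std::future<int>> launch_all(const std::vector<Input> &inputs)
{
    std::vector<std::future<int>> futures;
    futures.reserve(inputs.size());
    for (const auto &in : inputs)
        futures.push_back(std::async(std::launch::async, run_task, in));
    return futures;
}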
To get to an answer, I started by checking the capabilities of the system:
bob@vb:~/programming/cxx/async$ ulimit -u
43735
bob@vb:~/programming/cxx/async$ cat /proc/sys/kernel/threads-max
87470
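For reference, the same limits can also be read from a program; this is just a small Linux-specific sketch, separate from the test program below:

#include <fstream>
#include <iostream>
#include <sys/resource.h>

int main()
{
    // Per-user process/thread limit, i.e. what `ulimit -u` reports (soft limit).
    rlimit rl{};
    if (getrlimit(RLIMIT_NPROC, &rl) == 0)
        std::cout << "RLIMIT_NPROC (soft): " << rl.rlim_cur << '\n';

    // System-wide thread limit from /proc/sys/kernel/threads-max.
    std::ifstream f("/proc/sys/kernel/threads-max");
    long threads_max = 0;
    if (f >> threads_max)
        std::cout << "threads-max: " << threads_max << '\n';
}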
From these numbers, I was expecting to be able to get on the order of 43K threads running (mostly waiting) in parallel. To verify that, I wrote the program below, which counts the distinct thread ids obtained and measures the time required to issue 100K std::async calls with an empty task:
#include <thread>
#include <future>
#include <iostream>
#include <vector>
#include <algorithm>
#include <chrono>
#include <string>
#include <cstdlib>

// Returns the id of the thread that actually ran the task.
std::thread::id foo()
{
    using namespace std::chrono_literals;
    //std::this_thread::sleep_for(2s);
    return std::this_thread::get_id();
}

int main(int argc, char **argv)
{
    if (2 != argc) exit(1);
    const size_t COUNT = std::stoi(argv[1]);

    // Launch COUNT asynchronous tasks, keeping one future per task.
    std::vector<decltype(std::async(foo))> futures;
    futures.reserve(COUNT);
    while (futures.capacity() != futures.size())
    {
        futures.push_back(std::async(foo));
    }

    // Collect the thread id reported by each task.
    std::vector<std::thread::id> ids;
    ids.reserve(futures.size());
    for (auto &f: futures)
    {
        ids.push_back(f.get());
    }

    // Count the distinct thread ids.
    std::sort(ids.begin(), ids.end());
    const auto end = std::unique(ids.begin(), ids.end());
    ids.erase(end, ids.end());
    std::cerr << "COUNT: " << COUNT << ": ids.size(): " << ids.size() << std::endl;
}
The run time was fine, but the number of distinct thread ids was much lower than expected (32748 instead of 43735):
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 100000
COUNT: 100000: ids.size(): 32748
0:03.29
Then I uncommented the sleep line in foo to add a 2-second sleep. The resulting timings are consistent with 2 s up to roughly 10K tasks, but at some point beyond that, some tasks end up sharing the same thread id and the elapsed time increases by 2 s for each additional task.
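With the sleep line enabled, foo is simply:

std::thread::id foo()
{
    using namespace std::chrono_literals;
    std::this_thread::sleep_for(2s);   // simulate a high-latency, low-CPU task
    return std::this_thread::get_id();
}

The measured timings: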
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10056
COUNT: 10056: ids.size(): 10056
0:02.24
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10057
COUNT: 10057: ids.size(): 10057
0:04.27
bob@vb:~/programming/cxx/async$ /usr/bin/time -f "%E" ./testAsync 10058
COUNT: 10058: ids.size(): 10057
0:06.28
bob@vb:~/programming/cxx/async$ ps -eT | wc -l
277
So it seems that, for my problem, on this system, the limit is on the order of 10K. I checked on another system and the limit was on the order of 4K.
I can't figure out:
- why these values are so small
- how to predict these values from the specs of the system