Intel TBB tasks for serving network connections - good model?

Question

I'm developing a backend for a networking product, that serves a dozen of clients (N = 10-100). Each connection requires 2 periodic tasks, the heartbeat, and downloading of telemetry via SSH, each at H Hz. There are also extra events of different kind coming from the frontend. By nature of every of the tasks, there is a solid part of waiting in select call on each connection's socket, which allows OS to switch between threads often to serve other clients while waiting for response.

In my initial implementation, I create 3 threads per connection (heartbeat, telemetry, extra), each waiting on a single condition variable, which is poked every time there is something to do in a workqueue. The workqueue is filled with the above-mentioned periodic events using a timer and commands from the frontend.

I have a few questions here.

Would it be a good idea to switch a worker thread pool approach to Intel TBB tasks? If so, to which value of threads do I need to initialize tbb::task_scheduler_init?
In the current approach with 300 threads waiting on a conditional variable, which is signaled N * H * 3 times per second, it is likely to become a bottleneck for scalability (especially on the side which calls signal). Are there any better approaches for waking up just one worker per task?
How is waking of a worker thread implemented in TBB?

Thanks for your suggestions!

did you mean N=10000? the 'select' function can't select more than FD_SETSIZE connections, FD_SETSIZE is usually 1024 — jondinham, Nov 05 '12 at 09:44
3 threads/connection? i'm afraid that creating 30 thousand threads is kinda nuts — jondinham, Nov 05 '12 at 09:45
I meant 10 to 100. The select call is in libssh2 pipeline, it is responsible for just one socket. — Tosha, Nov 05 '12 at 09:46

score 1 · Answer 1 · edited May 23 '17 at 12:27

Its difficult to say if switching to TBB would be a good approach or not. What are your performance requirements, and what are the performance numbers for the current implementation? If the current solution is good enough, than its probably not worth-while to switch.

If you want to compare the both (current impl vs TBB) to know which gives better performance, then you could do what is called a "Tracer bullet" (from the book The Pragmatic Programmer) for each implementation and compare the results. In simpler terms, do a reduced prototype of each and compare the results.

As mentioned in this answer, its typically not a good idea to try to do performance improvements without having concrete evidence that what you're going to change will improve.

Besides all of that, you could consider making a thread pool with the number of threads being some function of the number of CPU cores (maybe a factor of 1 or 1.5 threads per core) The threads would take off tasks from a common work-queue. There would be 3 types of tasks: heartbeat, telemetry, extra. This should reduce the negative impacts caused by context switching when using large numbers of threads.

Intel TBB tasks for serving network connections - good model?

1 Answers1