5

Environment: Ubuntu 16.04 (Linux), compiling C++11 with GCC. The software doesn't need to be cross-platform, but it should be efficient at its task and run well as a daemon.

I have a fairly simple application at the moment, which basically acts as an intermediary between a third-party service and a WebSocket connection. So users connect to my service via WebSocket to talk to said third-party service.

| End-User |  <-C1-> | My Application | <-C2-> | 3rd Party Service |

My application currently has 2 main threads:

  1. Thread 1 - Listens to the WebSocket connection; whenever it gets a message, it pushes onto a FIFO task queue an object containing the message and the WebSocket connection that requested it.

  2. Thread 2 - Loops over the queue, popping out messages and processing them.

The issue is that Thread 1 is pretty fast and can easily handle hundreds of WebSocket connections, while Thread 2 performs blocking tasks at times and can be slow, since processing certain queue items takes a while on the third-party side. This means that if User A makes request 1, which takes 5 seconds to complete, then User B, who comes afterwards and makes request 2, has to wait for User A's request 1 to finish, even if request 2 takes less than 1 ms.

My proposed solution for this is to have:

  • Thread 1 - WebSocket connection listener
  • Thread 2 - Task delegator
  • Threads 3-100 - Task workers

Thread 1 can easily handle hundreds of WebSocket connections, and each connection can make task requests that take anywhere between 1 ms and 1 minute. Threads 3-100 are all sleeping. The reason for this many threads is that if there are 50-60 connections all making different long-running requests, each of these time-consuming calls blocks only a single thread; the other threads remain free to work on the queue and do other tasks.

I know switching threads is an expensive operation, but I am not aware of any approach other than multithreading here.

I solved the same issue with a single thread, but the problem was that the server stopped processing any WebSocket messages while it was blocked waiting for the third-party service. So I bumped it up to two threads: one for the WebSocket, the other for handling the task queue. But now the problem is that a single worker on the task queue is slow, since it handles blocking I/O operations sequentially.

Does this sound like a terrible design idea? Any thoughts on best practices?

mur
    You can run out of handles, yes. Also the number of (active) threads should be balanced with the number of available CPU cores. –  Feb 12 '18 at 13:32
    You might want to look up thread pools. – UKMonkey Feb 12 '18 at 13:34
  • The number of threads won't be the problem: https://stackoverflow.com/questions/481900/whats-the-maximum-number-of-threads-in-windows-server-2003 – rollstuhlfahrer Feb 12 '18 at 13:35
  • Another good read how to organize messaging queues efficiently is ZeroMQ. –  Feb 12 '18 at 13:36
    Threads use system resources, which are finite, so it is always possible to have too many threads. The limits are system dependent. In any event, your idea has a lot in common with a software design pattern known as "thread pool". You may need to consider using asynchronous I/O, if your host system supports it. – Peter Feb 12 '18 at 13:38
  • Your threads 3-100 are generally called a thread pool. It's the lazy man's solution to non-blocking IO. If you want things to scale and know which requests will take 5 s, you might want to make 2 (or more) pools: one for short requests and one for long. Otherwise you end up with all threads blocked by long requests at some point. This can also keep the short-request threads in cache. – Goswin von Brederlow Feb 12 '18 at 14:03
  • @GoswinvonBrederlow - yes, it's the lazy man's non-blocking IO. If I use a thread pool and most of the tasks are just waiting for IO, then the entire pool can become used up, and I might just need to increase the pool size to 100 to accommodate the expected blocking, meaning we are back to a lot of threads... right? – mur Feb 12 '18 at 14:56
  • @Peter Linux supports async IO, but the API I am using for making the C2 connection from my app doesn't use asynchronous sockets. It is legacy code: heavily tested, but old. – mur Feb 12 '18 at 14:58
  • 'Task delegator' not necessary. – Martin James Feb 12 '18 at 15:04
  • 'a single worker on the task queue is slow since it is sequentially handling blocking IO' OK, first try: just add a fixed number of worker threads reading from the queue (which should be a 'real' producer-consumer queue, i.e. one that blocks on empty). Maybe it's just adding a one-liner for loop to try that :) – Martin James Feb 12 '18 at 15:09
  • @mercy You only have two choices: threads or non-blocking IO. When you use up all the threads in a pool, the next request will block (if you don't resize the pool). My point with using two pools was to make sure quick requests only wait for other quick requests to finish, while long requests wait for other long requests. Long requests usually also mean lots of CPU use or disk IO; you want to limit those without blocking the quick requests too. – Goswin von Brederlow Feb 15 '18 at 13:04

3 Answers

9

Is there such a thing as too many threads?

Yes

100 threads

Should be fine, if somewhat suboptimal on any desktop/server. I have had a laptop refuse to continue after ~2000 threads in a process.

Other strategies

For maximum throughput, a common design decision is ~1 thread per CPU core with an asynchronous, reactor-based design.

Some examples:

  • libuv

  • boost::asio

  • libdispatch

  • win32 async operations

Richard Hodges
    1 thread per core is fine for CPU-bottlenecked tasks, but when the issue is IO-bound it's a lot fuzzier. I've worked on an application with hundreds of threads that were just performing ping tasks, because more often than not a thread is in a blocking function awaiting network data. – UKMonkey Feb 12 '18 at 13:38
  • @UKMonkey this is the kind of thing that reactor-based designs seek to mitigate. But yes, we've all been there... – Richard Hodges Feb 12 '18 at 13:39
  • I believe the ZeroMQ approach is worth mentioning. Especially when it comes to MQ design patterns like _reactors_. –  Feb 12 '18 at 13:41
  • Some more things to look into: POSIX async IO (which uses threads under Linux, by the way), select/poll/epoll, non-blocking sockets – Goswin von Brederlow Feb 12 '18 at 14:00
5

"Is there such a thing as too many threads?" - Yes. Threads consume system resources that you may run out of. Threads need to be scheduled; requires work by the kernel as well as time on the CPU (even if they then deside to do nothing). More threads increases complexity, making your program harder to reason about and harder to debug.

There is indeed such a thing as too many threads - never create more than you need, or more than makes sense for your program.

Jesper Juhl
  • Threads don't need to be scheduled if they are not ready/running. The box I post this on is managing 1044 threads ATM. No problem at all. CPU use currently 2-3%, (utorrent and Firefox, mostly). – Martin James Feb 12 '18 at 15:16
  • @Martin James sure, but the kernel still had to *consider* them. They still take up place in the run-queue and they still occupy memory. – Jesper Juhl Feb 12 '18 at 15:18
  • 'requires work by the kernel as well as time on the CPU (even if they then decide to do nothing)' - go on, try it. Make 1000 threads that are, say, looping round a long sleep call. Wait a bit for all those threads to get created, and see what happens to the performance of your box [nothing noticeable]. – Martin James Feb 12 '18 at 15:18
  • 'They still take up place in the run-queue' no. 'they still occupy memory' - their stacks, code can be paged out, same as any other paged memory. As long as you don't run out of virtual memory by trying to make thousands of threads with huge stack limits, not-ready threads just sorta 'disappear'. – Martin James Feb 12 '18 at 15:21
2

The easiest and probably the most efficient way is a thread pool. Thread pools are normally implemented by the OS (or an underlying platform like .NET) and are optimized for best throughput. A thread pool continuously monitors the efficiency of executing the tasks sent to it, dynamically creating new threads if throughput drops and releasing threads if they have nothing to do for some time. The algorithms for when to create and release threads are quite sophisticated, and for most purposes a thread pool is the most efficient way to split a workload across multiple threads.

Both Windows and Linux support thread pools, but since you are using C++11, you can also launch tasks through the standard library by calling std::async (note that whether std::async reuses threads from a pool is implementation-defined). Here is some nice sample.

Timmy_A