Fastest way to process data: fork, multithreading, or both?

Question

Imagine a client that keeps sending lots of `double` values.

We are trying to build a server that receives and processes the data from the client.

Here are the facts:
- The server can receive a double very quickly.
- The server has a function that processes a double, and it needs more than 3 minutes for a single value.

We need the server to process 1000 doubles from the client as fast as possible.

My idea: use a thread pool with many threads, each of which processes one double.

All of this runs on Linux.

My question:
For now, my server is a single process containing multiple threads. Would it be faster if I used fork()?
I think using only fork() without multithreading would be a bad idea, but what if I create two processes and each of them contains multiple threads? Could that be faster?

By the way, I have already read:
What is the difference between fork and thread?
Forking vs Threading

Please note that you have your wording backwards: a **server** provides services to call. You are describing a setup where the *server* keeps sending data to the *client* for processing. From a terminology point of view, that should be reversed. You have a **client** that wants to send data to the **server** for processing; not the other way round! — GhostCat, Sep 19 '16 at 08:21
@GhostCat Thanks. In fact, I call it "server" because it is the "server" that does `listen` and it is the "client" that does `connect`. — Yves, Sep 19 '16 at 08:31
Just saying: you should be precise in your wording, documentation, etc. to make sure that everybody on your team understands the nature of your server/client... or is it a client/server? — GhostCat, Sep 19 '16 at 08:35

GhostCat · Accepted Answer · 2016-09-19T08:45:22.670

To a certain degree, this very much depends on the underlying hardware. It also depends on memory constraints, IO throughput, ...

Example: if your CPU has 4 cores, each able to run two hardware threads (and not much else is going on on that system), then you would probably prefer a solution with 4 processes, each running two threads!
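For that 4-core, two-threads-per-core example there are P = 8 hardware threads. A back-of-the-envelope lower bound for the question's workload, assuming each double costs exactly 3 minutes of pure CPU time, the 1000 items are independent, and an SMT sibling is as fast as a separate core (in practice it usually is not):

```latex
T_{\min}(P) = \left\lceil \frac{1000}{P} \right\rceil \times 3\ \text{min}
\qquad\Rightarrow\qquad
T_{\min}(1) = 3000\ \text{min} = 50\ \text{h},
\qquad
T_{\min}(8) = 125 \times 3\ \text{min} = 375\ \text{min} = 6.25\ \text{h}
```

Since the real function takes more than 3 minutes, these are lower bounds, and for CPU-bound work no number of extra threads or processes beyond P can push below T_min(P).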

Put differently, when working with fork(), you would fork() 4 times, and within each of the forked processes, you would distribute your work across two threads.

Long story short, what you really want is to avoid locking yourself into a corner. You want to create a service (as said, you are building a server, not a client) that has a sound and reasonable design.

Given your requirements, you want to build that application so that you can configure how many processes and threads it uses. Then you start profiling (meaning: you measure what is going on), and maybe run experiments to find the optimum for a given hardware/OS stack.

EDIT: I feel tempted to say: welcome to the real world. You are facing the requirement to meet precise "performance goals" for your product. Without such goals, a programmer's life is pretty easy: most of the time, one just sits down, puts together a reasonable product and, given the power of today's hardware, "things are good enough".

But if things are not good enough, there is only one way: you have to learn about all the things that play a role here, starting with questions like "which system calls in my OS can I use to get the correct number of cores/threads?"

In other words: the days in which you "got away" without knowing the exact capacity of the hardware you are using ... are over. If you intend to "play this game", there are no detours: you will have to learn the rules!

Finally, the most important thing here is not processes versus threads; you need to grasp the whole picture. It doesn't help to tune your server for maximum CPU performance ... only to find that network or I/O issues cause 10x the "loss" of what you gained by looking at the CPU alone. In other words: you have to look at all the pieces of your system, then measure to understand where your bottlenecks are, and only then decide which actions to take!

One good read on this is "Release It!" by Michael Nygard. His book is mainly about patterns in the Java world, but he does a great job of explaining what "performance" really means.

My PC has 4 cores. But how can I know the capacity of each core? When I need multithreading, I just use it; I never consider the capacity of a core... — Yves, Sep 19 '16 at 08:36
I updated my answer. Not sure if you will like that, but I think that is what you are up to. — GhostCat, Sep 19 '16 at 08:45
Seriously, I really like your answer. I have had this question for a long time, and your answer helped me a lot. The bad thing is: I can upvote only once... — Yves, Sep 19 '16 at 09:03

score 0 · Answer 2 · answered Sep 19 '16 at 08:13

fork()ing as such is much slower than starting a thread. A thread is much more lightweight than a full OS process (traditionally, although processes have caught up in recent years), not only in terms of CPU requirements but also with regard to memory footprint and general OS overhead.

As you are thinking about a pre-arranged pool of threads or processes, setup time would not account for much of your program's runtime, so you need to look into "what is the cost of interprocess communication?" That cost is (locally) generally lower between threads than between processes: threads do not need to go through the OS to exchange data, only for synchronisation, and in some cases you can even get away without that. Unfortunately, you do not state whether there is any need for IPC between worker threads.

Summed up: I cannot see any advantage of using fork(), at least not with regard to efficiency.

