Node allows you to spawn child processes and send data between them. You could use this to execute some blocking code, for example.

The documentation says: "These child Nodes are still whole new instances of V8. Assume at least 30ms startup and 10mb memory for each new Node. That is, you cannot create many thousands of them."

I was wondering whether this is efficient, and whether I should worry about any limitations. Here's some example code:

//index.js
var childProcess = require('child_process');

var childProcess1 = childProcess.fork('./child1.js');

childProcess1.send(largeArray); // assume largeArray is defined elsewhere

childProcess1.once('message', function(formattedData) {
  console.log(formattedData);
});



//child1.js
process.on('message', function(data) {
  data = format(data); // do something with the data, then send it back to index.js

  try {
    process.send(data);
  } catch (err) {
    console.log(err);
  }
});
youbetternot

2 Answers

The documentation is telling you that starting new node processes is (relatively) expensive. It is unwise to fork() every time you need to do work.

Instead, you should maintain a pool of long-running worker processes – much like a thread pool. Queue work requests in your main process and dispatch them to the next available worker when it goes idle.

This leaves us with a question about the performance profile of node's IPC mechanism. When you fork(), node automatically sets up a special file descriptor on the child process. It uses this to communicate between processes by reading and writing line-delimited JSON. Basically, when you process.send({ ... }), node JSON.stringifys it and writes the serialized string to the fd. The receiving process reads this data until hitting a line break, then JSON.parses it.

This necessarily means that performance will be highly dependent on the size of the data you send between processes.
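Since `process.send()` boils down to `JSON.stringify` plus a write, you can approximate a message's on-the-wire size by serializing it the same way and counting bytes (the payload below is just an example):

```javascript
// Approximate the IPC payload size for a candidate message.
const payload = { rows: new Array(1000).fill({ id: 1, name: 'example' }) };
const serialized = JSON.stringify(payload);
const bytes = Buffer.byteLength(serialized, 'utf8');
console.log(bytes + ' bytes'); // well over the 32 kB danger zone discussed below? check here
```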

I've roughed out some tests to get a better idea of what this performance looks like.

First, I sent a message of N bytes to the worker, which immediately responded with a message of the same length. I tried this with 1 to 8 concurrent workers on my quad-core hyper-threaded i7.

[graph: round trips/sec vs. message size, 1 to 8 concurrent workers]

We can see that having at least 2 workers is beneficial for raw throughput, but more than 2 essentially doesn't matter.

Next, I sent an empty message to the worker, which immediately responded with a message of N bytes.

[graph: round trips/sec vs. response message size]

Surprisingly, this made no difference.

Finally, I tried sending a message of N bytes to the worker, which immediately responded with an empty message.

[graph: round trips/sec vs. request message size]

Interesting — performance does not degrade as rapidly with larger messages.

Takeaways

  • Receiving large messages is slightly more expensive than sending them. For best throughput, your master process should not send messages larger than 1 kB and should not receive messages back larger than 128 bytes.

  • For small messages, the IPC overhead is about 0.02ms. This is small enough to be inconsequential in the real world.

It is important to realize that the serialization of the message is a synchronous, blocking call; if the overhead is too large, your entire node process will be frozen while the message is sent. This means I/O will be starved and you will be unable to process any other events (like incoming HTTP requests). So what is the maximum amount of data that can be sent over node IPC?

[graph: time blocked in process.send vs. message size]

Things get really nasty over 32 kB. (These are per-message; double to get roundtrip overhead.)

The moral of the story is that you should:

  • If the input is larger than 32 kB, find a way to have your worker fetch the actual dataset. If you're pulling the data from a database or some other network location, do the request in the worker. Don't have the master fetch the data and then try to send it in a message. The message should contain only enough information for the worker to do its job. Think of messages like function parameters.

  • If the output is larger than 32 kB, find a way to have the worker deliver the result outside of a message. Write to disk or send the socket to the worker so that you can respond directly from the worker process.

josh3736
    Damn. That's a helluva answer. – James Sumners Dec 06 '14 at 01:55
  • This is amazing, fantastic answer! How do I measure the size of a message? – youbetternot Dec 06 '14 at 15:05
  • 1
    It would be helpful if you also described your testing environment: what version of node you tested with, on what platform, and other details which could affect the results. – mscdex Dec 07 '14 at 19:22
  • Great answer. In case of writing to disk, you'll still hit a wall when doing JSON.parse on the main process as JSON parsing is blocking non-async. You can use a streaming lib like JSONStream but it's orders of magnitude slower than v8's JSON parse native implementation – Michael Oct 07 '15 at 13:34

This really depends on your server resources and the number of child processes you need to spin up.

As a rule of thumb:

  • Try to reuse running children as much as possible - this saves the ~30ms startup time for each fork
  • Do not start an unlimited number of children (one per request, for instance) - or you will run out of RAM

The messaging itself is relatively fast, I believe. It would be great to see some metrics, though.

Also, note that if you have a single CPU, or are already running a cluster (using all available cores), forking extra children doesn't make much sense. You still have limited CPU capacity, and context switching is more expensive than running a single process.

Eugene Kostrikov
  • Well, an extra process to do CPU-bound work on a single-core CPU makes sense to prevent starving I/O. The context switching is worth it in this case. – josh3736 Dec 05 '14 at 18:49