193

This might seem like a basic question, but I could not find any documentation:

What is the difference between forking and spawning a Node.js process? I have read that forking is a special case of spawning, but what are the different use cases and repercussions of using each of them?

Hitesh

4 Answers

270

Spawn is designed to run system commands. When you call spawn, you pass it a system command that runs in its own process; none of your Node code executes inside that process. You can add listeners to the spawned process so that your code can interact with it, but no new V8 instance is created (unless, of course, your command is itself a Node command, in which case you should use fork!), and only one copy of your Node module is active on the processor.
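A minimal sketch of the listener pattern described above, assuming a POSIX-like system where `echo` is available on the PATH. The helper name `runCommand` is hypothetical and exists only for this illustration.

```javascript
const { spawn } = require('child_process');

// Run a system command in its own process and collect what it prints.
// `runCommand` is a hypothetical helper name used only in this sketch.
function runCommand(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let output = '';

    // stdout is a stream: chunks can arrive while the command is still running.
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });

    // 'error' fires if the command could not be started at all.
    child.on('error', reject);

    // 'close' fires once the process has exited and its streams have ended.
    child.on('close', (code) => resolve({ code, output }));
  });
}

// The parent keeps running its own code while the command executes.
runCommand('echo', ['hello from a spawned process'])
  .then(({ code, output }) => console.log(`exit code ${code}: ${output.trim()}`))
  .catch(console.error);
```

The spawned command never runs any of the parent's JavaScript; the parent only observes it through the events and streams of the returned ChildProcess object.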

Fork is a special case of spawn that runs a fresh instance of the V8 engine. This means you can create multiple workers running on the exact same Node code base, or perhaps a different module for a specific task. This is most useful for creating a worker pool. While Node's async event model allows a single core of a machine to be used fairly efficiently, it doesn't allow a single Node process to make use of multi-core machines. The easiest way to accomplish this is to run multiple copies of the same program on a single multi-core processor.

A good rule of thumb is one to two Node processes per core, perhaps more for machines with a good RAM clock/CPU clock ratio, or for Node processes heavy on I/O and light on CPU work, to minimize the downtime while the event loop waits for new events. However, the latter suggestion is a micro-optimization and needs careful benchmarking to confirm that your situation calls for many processes per core. You can actually decrease performance by spawning too many workers for your machine or scenario.
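As a rough sketch of the worker-pool idea, the following forks one worker per core and splits a CPU-bound task between them. The names `sumWithPool` and `workerPath`, the summing task, and the choice of exactly one worker per core are assumptions made for illustration. The worker code is written to a temporary file only so the sketch fits in a single file; normally it would be its own module.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Worker code, written to a temp file only to keep this sketch in one file.
const workerPath = path.join(os.tmpdir(), `pool-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, `
  process.on('message', ({ from, to }) => {
    let sum = 0;
    for (let n = from; n <= to; n++) sum += n; // stand-in for CPU-heavy work
    process.send(sum);
  });
`);
process.on('exit', () => fs.rmSync(workerPath, { force: true }));

// Sum the integers 1..limit by splitting the range across forked workers.
// Defaults to one worker per core (an assumption; benchmark your own case).
function sumWithPool(limit, workerCount = Math.max(1, os.cpus().length)) {
  const chunk = Math.ceil(limit / workerCount);
  const jobs = [];
  for (let i = 0; i < workerCount; i++) {
    const from = i * chunk + 1;
    const to = Math.min(limit, (i + 1) * chunk);
    jobs.push(new Promise((resolve, reject) => {
      const worker = fork(workerPath);
      worker.once('message', (partial) => {
        worker.disconnect(); // closing the channel lets the idle worker exit
        resolve(partial);
      });
      worker.on('error', reject);
      worker.on('exit', (code) => {
        if (code) reject(new Error(`worker exited with code ${code}`));
      });
      worker.send({ from, to });
    }));
  }
  return Promise.all(jobs).then((partials) => partials.reduce((a, b) => a + b, 0));
}

sumWithPool(1000000).then((total) => console.log('total:', total));
```

Passing a different `workerCount` makes it easy to benchmark one versus two processes per core before settling on a pool size.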

Ultimately, you could use spawn to do the above by passing it a Node command. But this would be silly, because fork does that setup for you: it launches the module with the same Node executable and opens a communication (IPC) channel between parent and child. To be clear, spawn encompasses fork; fork is just the more convenient choice for this particular, and very useful, use case.
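To illustrate how spawn encompasses fork, here is a hedged sketch that uses spawn to start a Node child with an IPC channel by hand, which is roughly what fork arranges for you. The function name `spawnNodeWithIpc` and the inline child source are hypothetical; the child code is passed with `-e` only to keep the example in one file.

```javascript
const { spawn } = require('child_process');

// Code for the child, passed with -e only so the sketch fits in one file.
const childSource = `
  process.on('message', (msg) => process.send({ echoed: msg }));
`;

function spawnNodeWithIpc() {
  return new Promise((resolve, reject) => {
    // process.execPath is the Node binary running this script.
    // The 'ipc' entry in stdio asks spawn to open the message channel
    // that fork would have set up automatically.
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['inherit', 'inherit', 'inherit', 'ipc'],
    });
    child.on('error', reject);
    child.once('message', (reply) => {
      child.disconnect(); // closing the channel lets the child exit
      resolve(reply);
    });
    child.send('ping');
  });
}

spawnNodeWithIpc().then((reply) => console.log('reply from child:', reply));
```

With fork, the executable path and the `'ipc'` stdio entry do not have to be spelled out, which is why it is the more convenient choice when the child is another Node module.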

http://nodejs.org/api/child_process.html#child_process_child_process_exec_command_options_callback

bjb568
MobA11y
  • @ChrisCM, if I use let's say `var child = require('child_process').fork('child.js');` for example on my main app, I will now have 2 separate cores running. If I were to run a **heavy** for loop in the child.js (process), I'd essentially be utilizing more cores to power child.js, right? Would that CPU usage be affecting my main app core though? – NiCk Newman Sep 03 '15 at 16:40
  • 2
    It is impossible to do anything on a CPU without affecting other things: scheduling, shared cache usage, bus traffic, etc. However, it should take advantage of a separate core and leave your main run loop MOSTLY unaffected. As in, not the severe negative effects you'd expect from having two processes run on the same single-core processor. At this point, it is really up to the operating system and hardware setup to optimize properly. Different setups may yield different results. – MobA11y Sep 03 '15 at 17:01
  • @ChrisCM Yeah, I use a global MonsterLoop to synchronize monster positioning and that object it iterates can be as much as 5,000 keys. I iterate over it every 2 seconds and forking it seems like it's shredding off hundreds of memory usage off my CPU (main game one). I would rather do it this way instead of clustering that loop out and making it run xx amount of times per core I had... Ty for your insight ~ Now I just don't know if I should use Redis or the internal IPC :P – NiCk Newman Sep 03 '15 at 18:00
  • 2
    Thank you for addressing "why" - all the posts I read up til this one missed that simple portion of the explanation. – aaaaaa Sep 30 '16 at 13:09
  • @ChrisCM In your answer: "..but does not execute any further code within your node process..". Does it mean that the main thread is waiting and not processing anything? If YES, then what is the use of using spawn here? – Abhi Mar 15 '20 at 12:57
  • The process that is spawned does not. The process that does the spawning continues running asynchronously as usual. – MobA11y Mar 16 '20 at 19:23
39

Spawn

When spawn is called, it creates a streaming interface between the parent and child process. Streaming interface: one-time buffering of data in a binary format.

Fork

When fork is called, it creates a communication channel between the parent and child process. Communication channel: messaging.

Differences between Spawn and Fork

While both sound very similar in the way they transfer data, there are some differences.

  • Spawn is useful when you want to make a continuous data transfer in binary/encoding format — e.g. transferring a 1 Gigabyte video, image, or log file.
  • Fork is useful when you want to send individual messages — e.g. JSON or XML data messages.

Conclusion

Spawn should be used for streaming large amounts of data like images from the spawned process to the parent process.

Fork should be used for sending JSON or XML messages. For example, suppose ten forked processes are created from the parent process, and each performs some operation. When a process completes its operation, it sends a message back to the parent stating something like "Process #4 done" or "Process #8 done".
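The ten-process scenario above might look roughly like the following sketch. The names `runForkedProcesses` and `workerPath` are hypothetical, and the child script is written to a temporary file only so the example is a single file; in a real project it would be its own module.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Child script, written to a temp file only to keep this sketch in one file.
const workerPath = path.join(os.tmpdir(), `fork-demo-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, `
  // ... the real operation would go here ...
  process.send('Process #' + process.argv[2] + ' done');
`);
process.on('exit', () => fs.rmSync(workerPath, { force: true }));

// Fork `count` children and wait for each one to report back once.
function runForkedProcesses(count) {
  const pending = [];
  for (let id = 1; id <= count; id++) {
    pending.push(new Promise((resolve, reject) => {
      const child = fork(workerPath, [String(id)]);
      child.once('message', resolve); // one discrete message per child
      child.on('error', reject);
      child.on('exit', (code) => {
        if (code) reject(new Error(`child ${id} exited with code ${code}`));
      });
    }));
  }
  return Promise.all(pending);
}

runForkedProcesses(10).then((reports) => {
  reports.forEach((report) => console.log(report));
});
```

Each report arrives as one complete message rather than as chunks of a stream, which is the property that makes fork a good fit for this kind of status reporting.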

Andria
vijay
  • What about continuous logging data from parent into a child and finally inside a file? – Esqarrouth Apr 11 '20 at 02:33
  • 3
    @Esqarrouth, you need to identify whether it will be a continuous stream or individual messages. You used the words "continuous logging", so I believe you will be writing logs (JSON) to the child. If so, use `FORK`; if you instead have a very big chunk of data to be **BUFFERED**, use `SPAWN`. – vijay Apr 11 '20 at 06:14
9
  • spawn − The child_process.spawn method launches a new process with a given command.
  • fork − The child_process.fork method is a special case of spawn() that creates child processes.

The spawn() Method

The child_process.spawn method launches a new process with a given command. It has the following signature −

child_process.spawn(command[, args][, options])

Read more about options

The spawn() method returns a ChildProcess object whose stdout and stderr are streams, so it should be used when the process returns a large amount of data. spawn() starts receiving the response as soon as the process starts executing.
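A small sketch of consuming a large output as a stream, chunk by chunk, instead of buffering it all in memory. The function name `countStreamedBytes` is hypothetical, and Node itself is used as the output-producing command only so the example does not depend on a particular system tool; any command that prints a lot of data would do.

```javascript
const { spawn } = require('child_process');

// Stand-in for any command that prints a lot of output; Node is used here
// only so the sketch does not depend on a particular system tool.
const producer = "for (let i = 0; i < 100000; i++) console.log('line ' + i);";

function countStreamedBytes() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', producer]);
    let bytes = 0;
    let chunks = 0;

    // Each 'data' event delivers a Buffer as soon as it is available,
    // so the full output never has to be held in memory at once.
    child.stdout.on('data', (chunk) => {
      bytes += chunk.length;
      chunks += 1;
    });

    // Forward anything the command writes to stderr.
    child.stderr.on('data', (chunk) => process.stderr.write(chunk));

    child.on('error', reject);
    child.on('close', (code) => resolve({ code, bytes, chunks }));
  });
}

countStreamedBytes().then(({ bytes, chunks }) => {
  console.log(`received ${bytes} bytes in ${chunks} chunks`);
});
```

In a real program the `'data'` handler would typically write each chunk onward (to a file, a socket, or a parser) rather than just counting it.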

The fork() Method

The child_process.fork method is a special case of spawn() that creates Node processes. It has the following signature −

child_process.fork(modulePath[, args][, options])

The fork method returns an object with a built-in communication channel in addition to having all the methods in a normal ChildProcess instance.

Igor Litvinovich
1

Spawn -

  • Creates a streaming interface. Useful for continuous data transfer in binary / encoded format.
  • Does not create a new V8 instance.

Fork -

  • Creates a communication channel between the parent and child process. Useful for sending individual messages in JSON/XML.
  • Creates a new V8 instance.