114

I stumbled over node.js some time ago and like it a lot. But I soon found out that it badly lacks the ability to perform CPU-intensive tasks. So I started googling and found these answers to the problem: Fibers, Webworkers and Threads (thread-a-gogo). Now I am confused about which one to use, and one of them definitely needs to be used - after all, what's the purpose of having a server that is good at IO and nothing else? Suggestions needed!

UPDATE:

I have been thinking of an approach lately and would like suggestions on it. What I thought of was this: let's have some threads (using thread_a_gogo or maybe webworkers). When we need more of them, we can create more. But there will be some limit on how many we create (not imposed by the system, but probably because of overhead). When we exceed that limit, we can fork a new node process and start creating threads in it. This can go on until we reach another limit (after all, processes have a big overhead too). When that limit is reached, we start queuing tasks. Whenever a thread becomes free, it is assigned a new task. This way, things can go on smoothly.

So, that was what I thought of. Is this idea good? I am a bit new to all this process and threads stuff, so don't have any expertise in it. Please share your opinions.
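
To make the queuing part of this idea concrete, here is a rough sketch. It is only an illustration under assumptions: it uses the `worker_threads` module that ships with current Node versions instead of thread_a_gogo, the `WorkerPool` class and its method names are made up, the pool size cap of 4 is arbitrary, and the step of forking extra node processes is left out.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Code each worker thread runs; squaring stands in for real CPU-heavy work.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => parentPort.postMessage(n * n));
`;

// Hypothetical fixed-size pool: tasks wait in a queue until a worker is free.
class WorkerPool {
  constructor(size) {
    this.workers = [];
    this.idle = [];
    this.queue = [];
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue a task; the returned promise resolves with the worker's reply.
  run(task) {
    return new Promise((resolve, reject) => {
      this.queue.push({ task, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers until one of the two lists is empty.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { task, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker); // the worker is free again
        resolve(result);
        this.dispatch();
      };
      const onError = (err) => {
        worker.off('message', onMessage);
        reject(err); // a failed worker is not returned to the pool
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(task);
    }
  }

  close() {
    return Promise.all(this.workers.map((worker) => worker.terminate()));
  }
}

// Arbitrary cap for the demo: at most 4 workers, and always at least 1.
const pool = new WorkerPool(Math.max(1, Math.min(4, os.cpus().length - 1)));

Promise.all([1, 2, 3, 4, 5, 6, 7, 8].map((n) => pool.run(n)))
  .then((results) => {
    console.log(results); // squares of 1..8, in submission order
    return pool.close();
  })
  .catch((err) => {
    console.error(err);
    return pool.close();
  });
```

With eight tasks and at most four workers, some tasks always wait in the queue until a worker reports back, which is the behaviour described above.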

Thanks. :)

Parth Thakkar
  • Please note: Workers are a browser specification- not a Javascript feature. – FredTheWebGuy Apr 28 '13 at 00:21
  • Well, I see that. My question was about node.js - server code and not about client side! – Parth Thakkar Apr 29 '13 at 16:21
  • Just a clarification- I see that the original question was about Webworkers in NodeJs, which is impossible- NodeJs uses "Threads". However, there is a NodeJS module floating around that allows WebWorker syntax within the NodeJs runtime. – FredTheWebGuy Apr 29 '13 at 18:52

7 Answers

335

Node has a completely different paradigm, and once it is correctly understood, it is easier to see this different way of solving problems. You never need multiple threads in a Node application(1) because you have a different way of doing the same thing. You create multiple processes; but this is very different from, for example, how Apache Web Server's Prefork MPM does it.

For now, let's think that we have just one CPU core and we will develop an application (in Node's way) to do some work. Our job is to process a big file running over its contents byte-by-byte. The best way for our software is to start the work from the beginning of the file, follow it byte-by-byte to the end.

-- Hey, Hasan, I suppose you are either a newbie or very old school from my Grandfather's time!!! Why don't you create some threads and make it much faster?

-- Oh, we have only one CPU core.

-- So what? Create some threads man, make it faster!

-- It does not work like that. If I create threads I will be making it slower, because I will be adding a lot of overhead to the system for switching between threads, trying to give each of them a fair amount of time, and, inside my process, trying to communicate between these threads. In addition to all this, I will also have to think about how to divide a single job into multiple pieces that can be done in parallel.

-- Okay okay, I see you are poor. Let's use my computer, it has 32 cores!

-- Wow, you are awesome my dear friend, thank you very much. I appreciate it!

Then we turn back to work. Now we have 32 CPU cores thanks to our rich friend. The rules we have to abide by have just changed. Now we want to utilize all this wealth we are given.

To use multiple cores, we need to find a way to divide our work into pieces that we can handle in parallel. If it were not Node, we would use threads for this; 32 threads, one for each CPU core. However, since we have Node, we will create 32 Node processes.

Threads can be a good alternative to Node processes, maybe even a better way; but only in a specific kind of job where the work is already defined and we have complete control over how to handle it. Other than this, for every other kind of problem where the job comes from outside in a way we do not have control over and we want to answer as quickly as possible, Node's way is unarguably superior.

-- Hey, Hasan, are you still working single-threaded? What is wrong with you, man? I have just provided you what you wanted. You have no excuses anymore. Create threads, make it run faster.

-- I have divided the work into pieces and every process will work on one of these pieces in parallel.

-- Why don't you create threads?

-- Sorry, I don't think it is usable. You can take your computer if you want?

-- No okay, I am cool, I just don't understand why you don't use threads?

-- Thank you for the computer. :) I already divided the work into pieces and I create processes to work on these pieces in parallel. All the CPU cores will be fully utilized. I could do this with threads instead of processes; but Node has this way and my boss Parth Thakkar wants me to use Node.

-- Okay, let me know if you need another computer. :p

If I create 33 processes instead of 32, the operating system's scheduler will keep pausing one process, starting another, pausing it after some cycles, starting the other one again... This is unnecessary overhead. I do not want it. In fact, on a system with 32 cores, I wouldn't even want to create exactly 32 processes; 31 can be nicer, because it is not just my application that will work on this system. Leaving a little room for other things can be good, especially if we have 32 rooms.
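
As a rough sketch of this division of work (an illustration with assumptions, not a finished design): the job is split into one piece per process, using one process fewer than the number of cores. The names `splitRange` and `runChunk` are made up, summing integers stands in for the real job, and the children are started with `node -e` only to keep the sample in a single file; a real application would more likely use `child_process.fork` with its own worker script.

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Split the half-open range [0, total) into at most `parts` contiguous chunks.
function splitRange(total, parts) {
  const chunks = [];
  const size = Math.ceil(total / parts);
  for (let start = 0; start < total; start += size) {
    chunks.push({ start, end: Math.min(start + size, total) });
  }
  return chunks;
}

// Code each child process runs: sum the integers in its own chunk.
const childSource = `
  const start = Number(process.env.RANGE_START);
  const end = Number(process.env.RANGE_END);
  let sum = 0;
  for (let i = start; i < end; i++) sum += i;
  process.stdout.write(String(sum));
`;

// Run one chunk in its own Node process and resolve with its partial sum.
function runChunk({ start, end }) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', childSource], {
      env: { ...process.env, RANGE_START: String(start), RANGE_END: String(end) },
    });
    let output = '';
    child.stdout.on('data', (data) => { output += data; });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve(Number(output));
      else reject(new Error(`child exited with code ${code}`));
    });
  });
}

// Leave one core free for the rest of the system, as argued above.
const processCount = Math.max(1, os.cpus().length - 1);
const chunks = splitRange(1000000, processCount);

Promise.all(chunks.map(runChunk))
  .then((partials) => {
    const total = partials.reduce((a, b) => a + b, 0);
    console.log(`${chunks.length} processes, total = ${total}`);
  })
  .catch((err) => console.error('a worker process failed:', err.message));
```

The processes are created once, up front, so their startup cost is paid a single time regardless of how long the job runs.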

I believe we are on the same page now about fully utilizing processors for CPU-intensive tasks.

-- Hmm, Hasan, I am sorry for mocking you a little. I believe I understand you better now. But there is still something I need an explanation for: What is all the buzz about running hundreds of threads? I read everywhere that threads are much faster to create and destroy than forked processes. You fork processes instead of creating threads, and you think that is the best you can get with Node. Then is Node not appropriate for this kind of work?

-- No worries, I am cool, too. Everybody says these things so I think I am used to hearing them.

-- So? Node is not good for this?

-- Node is perfectly good for this even though threads can be good too. As for thread/process creation overhead; on things that you repeat a lot, every millisecond counts. However, I create only 32 processes and it will take a tiny amount of time. It will happen only once. It will not make any difference.

-- When do I want to create thousands of threads, then?

-- You never want to create thousands of threads. However, on a system that is doing work that comes from outside, like a web server processing HTTP requests; if you are using a thread for each request, you will be creating a lot of threads, many of them.

-- Node is different, though? Right?

-- Yes, exactly. This is where Node really shines. Like a thread is much lighter than a process, a function call is much lighter than a thread. Node calls functions, instead of creating threads. In the example of a web server, every incoming request causes a function call.

-- Hmm, interesting; but you can only run one function at the same time if you are not using multiple threads. How can this work when a lot of requests arrive at the web server at the same time?

-- You are perfectly right about how functions run, one at a time, never two in parallel. I mean in a single process, only one scope of code is running at a time. The OS Scheduler does not come and pause this function and switch to another one, unless it pauses the process to give time to another process, not another thread in our process. (2)

-- Then how can a process handle 2 requests at a time?

-- A process can handle tens of thousands of requests at a time as long as our system has enough resources (RAM, Network, etc.). How those functions run is THE KEY DIFFERENCE.

-- Hmm, should I be excited now?

-- Maybe :) Node runs a loop over a queue. In this queue are our jobs, i.e., the calls we started to process incoming requests. The most important point here is the way we design our functions to run. Instead of starting to process a request and making the caller wait until we finish the job, we quickly end our function after doing an acceptable amount of work. When we come to a point where we need to wait for another component to do some work and return a value to us, instead of waiting for that, we simply finish our function, adding the rest of the work to the queue.

-- It sounds too complex?

-- No no, I might sound complex; but the system itself is very simple and it makes perfect sense.

Now I want to stop citing the dialogue between these two developers and finish my answer after a last quick example of how these functions work.

In this way, we are doing what the OS scheduler would normally do. We pause our work at some point and let other function calls (like other threads in a multi-threaded environment) run until we get our turn again. This is much better than leaving the work to the OS scheduler, which tries to give a fair share of time to every thread on the system. We know what we are doing much better than the OS scheduler does, and we are expected to stop when we should stop.
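
A minimal sketch of this hand-over, assuming two simulated requests and a made-up `slowLookup` helper that stands in for any slow dependency (a database, the disk, another service). Each handler returns as soon as it has to wait, so the second request starts before the first one has finished, all on a single thread.

```javascript
const log = [];

// Hypothetical stand-in for a slow dependency; it answers after 20 ms.
function slowLookup(id, callback) {
  setTimeout(() => callback(null, `result for ${id}`), 20);
}

function handleRequest(id) {
  log.push(`start ${id}`);
  slowLookup(id, (err, value) => {
    // This callback is "the rest of the work"; it runs once the value is ready.
    log.push(`finish ${id}`);
    if (log.length === 4) console.log(log.join(' -> '));
  });
  // Returning here gives the event loop back to the next request.
}

handleRequest('A');
handleRequest('B');
// prints "start A -> start B -> finish A -> finish B"
```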

Below is a simple example where we open a file and read it to do some work on the data.

Synchronous Way:

Open File
Repeat This:    
    Read Some
    Do the work

Asynchronous Way:

Open File and Do this when it is ready: // Our function returns
    Repeat this:
        Read Some and when it is ready: // Returns again
            Do some work

As you see, our function asks the system to open a file and does not wait for it to be opened. It finishes by providing the next steps to run once the file is ready. When we return, Node runs other function calls on the queue. After running over all the functions, the event loop moves on to the next turn...
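
The asynchronous outline above could look roughly like this with Node's callback-based `fs` API. This is a sketch under assumptions: the temporary file, the 16 KB buffer size and the `processFile` name are made up for the demo, and "the work" is just counting bytes.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Demo input: a throwaway file in the temp directory (an assumption of this sketch).
const file = path.join(os.tmpdir(), `async-read-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'x'.repeat(100000));

let bytesProcessed = 0;

function processFile(filePath, onDone) {
  // "Open File and do this when it is ready": we return right after asking.
  fs.open(filePath, 'r', (openErr, fd) => {
    if (openErr) return onDone(openErr);
    const buffer = Buffer.alloc(16 * 1024);

    function readSome() {
      // "Read some and when it is ready": we return again after asking.
      fs.read(fd, buffer, 0, buffer.length, null, (readErr, bytesRead) => {
        if (readErr) return fs.close(fd, () => onDone(readErr));
        if (bytesRead === 0) return fs.close(fd, onDone); // end of file
        bytesProcessed += bytesRead; // "Do some work" on this piece
        readSome(); // ask for the next piece
      });
    }

    readSome();
  });
}

processFile(file, (err) => {
  fs.unlink(file, () => {}); // clean up the demo file
  if (err) throw err;
  console.log('finished, bytes processed:', bytesProcessed);
});

// This line runs before any data has been read: processFile has already returned.
console.log('processFile returned, bytes processed so far:', bytesProcessed);
```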

In summary, Node has a completely different paradigm than multi-threaded development; but this does not mean that it lacks things. For a synchronous job (where we can decide the order and way of processing), it works as well as multi-threaded parallelism. For a job that comes from outside like requests to a server, it simply is superior.


(1) Unless you are building libraries in other languages like C/C++, in which case you still do not create threads for dividing jobs. For this kind of work you have two threads, one of which continues communication with Node while the other does the real work.

(2) In fact, every Node process has multiple threads, for the same reasons I mentioned in the first footnote. However, this is in no way like 1000 threads doing similar work. Those extra threads are for things like accepting IO events and handling inter-process messaging.

UPDATE (As reply to a good question in comments)

@Mark, thank you for the constructive criticism. In Node's paradigm, you should never have functions that take too long to process unless all other calls in the queue are designed to be run one after another. In the case of computationally expensive tasks, if we look at the complete picture, we see that this is not a question of "Should we use threads or processes?" but a question of "How can we divide these tasks, in a well-balanced manner, into sub-tasks that we can run in parallel employing multiple CPU cores on the system?" Let's say we will process 400 video files on a system with 8 cores. If we want to process one file at a time, then we need a system that will process different parts of the same file, in which case, maybe, a multi-threaded single-process system will be easier to build and even more efficient. We can still use Node for this by running multiple processes and passing messages between them when state-sharing/communication is necessary. As I said before, a multi-process approach with Node is as good as a multi-threaded approach for this kind of task; but not better than that. Again, as I said before, the situation where Node shines is when these tasks come as input to the system from multiple sources, since keeping many connections open concurrently is much lighter in Node compared to a thread-per-connection or process-per-connection system.

As for setTimeout(...,0) calls; sometimes giving a break during a time-consuming task, to allow calls in the queue to have their share of processing, can be required. Dividing tasks in different ways can save you from these; but still, this is not really a hack, it is just the way event queues work. Also, using process.nextTick for this aim is much better, since when you use setTimeout, calculation and checks of the time passed will be necessary, while process.nextTick is simply what we really want: "Hey task, go back to the end of the queue, you have used your share!"
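
A small sketch of "giving a break" during a long computation. Note the substitution: it yields with `setImmediate` rather than `process.nextTick`, following the suggestion in a later comment on this answer, because in current Node versions `process.nextTick` callbacks run before pending IO callbacks. The function names, the chunk size and the sum-of-squares workload are assumptions made for the demo.

```javascript
// Pure helper: sum of i*i for i in the half-open range [from, to).
function sumSquares(from, to) {
  let sum = 0;
  for (let i = from; i < to; i++) sum += i * i;
  return sum;
}

// Process [0, n) in chunks, going back to the queue after each chunk.
function sumSquaresChunked(n, chunkSize, callback) {
  let next = 0;
  let total = 0;

  function step() {
    const end = Math.min(next + chunkSize, n);
    total += sumSquares(next, end);
    next = end;
    if (next < n) {
      setImmediate(step); // "go back to the end of the queue"
    } else {
      callback(total);
    }
  }

  step();
}

// Other work that wants its share of the event loop while we compute.
const heartbeat = setInterval(() => console.log('other callbacks still run'), 1);

sumSquaresChunked(200000, 10000, (total) => {
  clearInterval(heartbeat);
  console.log('sum of squares:', total);
});
```

The chunk size is the tuning knob: larger chunks mean less scheduling overhead but longer stretches during which nothing else can run.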

Matt
hasanyasin
  • 10
    Amazing! Damn amazing! I loved the way you answered this question! :) – Parth Thakkar Jul 01 '12 at 05:56
  • Btw, would you like to answer my another unanswered question: http://stackoverflow.com/questions/11175355/how-to-model-push-notifications-on-server ? It'll be really helpful. :) – Parth Thakkar Jul 01 '12 at 06:18
  • 50
    Sure :) I really cannot believe there are extremely mean people out there down-voting this answer-article! Questioner calls it "Damn Amazing!" and a book author offers me writing on his website after seeing this; but some geniuses out there down-votes it. Why don't you share your bright intellectual quality and comment on it instead of meanly and sneakily down-voting, huh? Why something nice disturbs you that much? Why do you want to prevent something useful to reach other people who can really benefit from it? – hasanyasin Jul 01 '12 at 12:39
  • 1
    @hasanyasin just ignore the haters man, it's much better for your constitution – jcollum Jul 03 '12 at 18:15
  • 10
    This isn't a completely fair answer. What about computationally expensive tasks, where we can't "quickly end" our function call? I believe some people use some `setTimeout(...,0)` hacks for this, but using a separate thread in this scenario would surely be better? – mpen Mar 07 '13 at 20:49
  • @Mark, I tried to address this question updating my answer. Thank you very much for the contribution. – hasanyasin Mar 08 '13 at 04:20
  • 1
    Thank you very much, I'd like to upvote this more than once :) – ArtoAle Mar 08 '13 at 16:46
  • 3
    @hasanyasin This is the nicest explanation on node that I found so far! :) – Venemo May 10 '13 at 11:57
  • 7
    @Mark Generally, if it's that computationally expensive, there are options/modules for thread/process workers... In general for these types of things, I use a Message Queue, and have worker process(es) that handles a task at a time from the queue, and work that task. This also allows for scaling to multiple servers. Along these lines, Substack has a lot of modules directed at provisioning and scaling you can look at. – Tracker1 May 22 '13 at 23:33
  • @Mark that would be the guy. – Tracker1 May 23 '13 at 00:20
  • 1
    Though he has a huge number of repositories, I would say that optimist, dnode, seaport and fleet are probably the most interesting. – Tracker1 May 23 '13 at 00:31
  • 2
    Indeed an amazing answer, but I don't think it really answers the question. The closest you came was in your answer to @Mark, but it really shows Node.js's limitations: to achieve concurrency you need to offload that work to C++ or to external services via message-passing. BTW, you want concurrency not only for CPU-bound work, but also for maintaining connection pools to databases. In Node.js this is indeed handled by database drivers (C++). The bottom line is that Node.js is not the best tool for concurrent requirements ... unless you're willing to "drop down" to C++ for the heavy lifting. – Tal Liron Oct 28 '13 at 10:57
  • @TalLiron Can you explain? I just don't get it. Why I can't create a concurrent child node process with child_process.fork()? And if I can, why do you call it "offload work to C++ or external service"? – Vitalii Lebediev Nov 04 '13 at 19:36
  • 1
    Oh. I just love re-reading this answer once in a while. So good! 1+ – Schoening Nov 12 '13 at 17:59
  • 2
    The OP and selected answerer of this question are, in my opinion, the ***perfect*** example of how the SO community should function - constructive questions, *extremely* constructive answers, and a human side of programming which is really just nice to see in our profession. If I could upvote this question and answer 10000 times I would... Everyone should see it. I don't even have a reason to try to multithread, but I still enjoyed reading this **cough** *thread* ;) – Chris Cirefice Dec 06 '13 at 17:19
  • Good question and *FANTASTIC* answer - probably one of the best answers I've ever read on SO. – Aerik Dec 16 '14 at 20:04
  • @hasanyasin --- the answer makes for good reading (hence, why an author wanted to use your answer) ... but as Mark pointed out, long-running processes is a perfectly valid use case that makes your rant meaningless – U Avalos Oct 06 '15 at 15:28
  • 1
    @UAvalos — I only tried to explain what was asked and I tried to make it as clear as possible since I have seen so many people misunderstand these basics. I am not a Node-only developer nor I am a fanatic; although I liked the toolchain and enjoyed building some services on Node that worked for years without going down for once. *I don't think I was ranting about anything.* I have done what I could to explain something and I don't think it is meaningless. You should write something more meaningful than calling others' meaningless. Then you will enjoy your life more, believe me. – hasanyasin Oct 07 '15 at 06:05
  • Nice and well written answer, although clearly missing in some aspects. It does not address the issues of CPU bound processes, but for those aspects one can look to the answer by `rsp`. I wish more answers were written in this dialogue style. – oligofren Nov 02 '15 at 11:38
  • When using Node you are creating threads because Node's way is multithreaded. The only thing that lets Node apps be single threaded is the fact Node itself uses multiple threads. There's no other way it can do what it does. – 0x1mason Dec 17 '15 at 23:58
  • regarding splitting long running task for making thread IO responsive, Use `setImmediate` not `process.nextTick`. Because `process.nextTick` schedules your task immediately after current callback but before IO callbacks. This will cause delays in processing IO calls on eventloop. – Anil Tallam Feb 01 '17 at 08:44
  • Comparing async and multitasking much like comparing soft and sweet. It is completely different things. And shame on Node.js that it still does not provide any standard way (i mean webworkers or at least some other core module) to deal with CPU-bound tasks. – evg656e Mar 01 '17 at 13:44
  • 1
    I think [this](http://stackoverflow.com/a/29088224/1365918) answer below is much better than the accepted answer. – kapad May 19 '17 at 08:33
  • hasanyasin: I am making a free e-magazine for Javascript to populate it in my company and friend cycles .and I've found your answer , which is very excise and useful . So ,may I translate it into Chinese and put it in my e-magazine ,please ? Looking forward to your reply.Thank you . reco 2018-08-01 – reco Aug 01 '18 at 04:58
  • @reco Thank you for the kind words and your nice thoughts. Please feel free to do whatever you want. Just make sure that the content is still accurate as it has been several years and I haven't been following the topic. I am also not sure about StackOverflow's terms in this regard if they require citation or not. – hasanyasin Aug 02 '18 at 17:48
  • @hasanyasin ,I am very happy to get your agreement .Yes ,sure,I will do my best for accurate to your origin document .thank you for your great answer. BTW, the e-magazine is in processing ,I've already published it at https://legacy.gitbook.com/book/1000copy/javascriptmagazine/details . – reco Aug 03 '18 at 02:05
35

(Update 2016: Web workers are going into Node.js v7 - see below.)

(Update 2017: Web workers are not going into Node.js v7 or v8 - see below.)

(Update 2018: Web workers are going into Node v10.5.0 - see below.)

Some clarification

Having read the answers above I would like to point out that there is nothing in web workers that is against the philosophy of JavaScript in general and Node in particular regarding concurrency. (If there was, it wouldn't be even discussed by the WHATWG, much less implemented in the browsers).

You can think of a web worker as a lightweight microservice that is accessed asynchronously. No state is shared. No locking problems exist. There is no blocking. There is no synchronization needed. Just like when you use a RESTful service from your Node program you don't worry that it is now "multithreaded" because the RESTful service is not in the same thread as your own event loop. It's just a separate service that you access asynchronously and that is what matters.

The same is with web workers. It's just an API to communicate with code that runs in a completely separate context and whether it is in different thread, different process, different cgroup, zone, container or different machine is completely irrelevant, because of a strictly asynchronous, non-blocking API, with all data passed by value.

As a matter of fact web workers are conceptually a perfect fit for Node which - as many people are not aware of - incidentally uses threads quite heavily, and in fact "everything runs in parallel except your code" - see:

But the web workers don't even need to be implemented using threads. You could use processes, green threads, or even RESTful services in the cloud - as long as the web worker API is used. The whole beauty of the message passing API with call by value semantics is that the underlying implementation is pretty much irrelevant, as the details of the concurrency model will not get exposed.
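
To illustrate the call-by-value message passing described here, a small sketch using the `worker_threads` module of current Node versions. That module is an assumption of this example: it is similar to, but not identical with, the browser Web Worker API, and the inline worker source and the `payload` object are made up. The worker modifies the data it receives, and the sender's object is unaffected because it was copied on the way over.

```javascript
const { Worker } = require('worker_threads');

// Code that runs in the worker's separate context.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (msg) => {
    msg.values.push(999); // changes only the worker's own copy
    const sum = msg.values.reduce((a, b) => a + b, 0);
    parentPort.postMessage({ sum });
  });
`;

const payload = { values: [1, 2, 3] };
const worker = new Worker(workerSource, { eval: true });

worker.on('message', (reply) => {
  console.log('sum computed by worker:', reply.sum); // 1005
  console.log('sender still sees:', payload.values); // [ 1, 2, 3 ]
  worker.terminate();
});
worker.on('error', (err) => {
  console.error('worker failed:', err);
});

// The object is copied when it is posted; no memory is shared with the worker.
worker.postMessage(payload);
```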

A single-threaded event loop is perfect for I/O-bound operations. It doesn't work that well for CPU-bound operations, especially long running ones. For that we need to spawn more processes or use threads. Managing child processes and the inter-process communication in a portable way can be quite difficult and it is often seen as an overkill for simple tasks, while using threads means dealing with locks and synchronization issues that are very difficult to do right.

What is often recommended is to divide long-running CPU-bound operations into smaller tasks (something like the example in the "Original answer" section of my answer to Speed up setInterval) but it is not always practical and it doesn't use more than one CPU core.

I'm writing it to clarify the comments that were basically saying that web workers were created for browsers, not servers (forgetting that it can be said about pretty much everything in JavaScript).

Node modules

There are a few modules that are supposed to add Web Workers to Node, including the node-webworker and node-webworker-threads modules discussed below.

I haven't used any of them but I have two quick observations that may be relevant: as of March 2015, node-webworker was last updated 4 years ago and node-webworker-threads was last updated a month ago. Also, I see in the node-webworker-threads usage example that you can pass a function instead of a file name as an argument to the Worker constructor, which seems like it may cause subtle problems if it is implemented using threads that share memory (unless the function is used only for its .toString() method and is otherwise compiled in a different environment, in which case it may be fine - I have to look more deeply into it, just sharing my observations here).

If there is any other relevant project that implements web workers API in Node, please leave a comment.

Update 1

I didn't know it yet at the time of writing but incidentally one day before I wrote this answer Web Workers were added to io.js.

(io.js is a fork of Node.js - see: Why io.js decided to fork Node.js, an InfoWorld interview with Mikeal Rogers, for more info.)

Not only does it prove the point that there is nothing in web workers that is against the philosophy of JavaScript in general and Node in particular regarding concurrency, but it may result in web workers being a first class citizen in server-side JavaScript like io.js (and possibly Node.js in the future) just as it already is in client-side JavaScript in all modern browsers.

Update 2

In Update 1 and my tweet I was referring to io.js pull request #1159 which now redirects to Node PR #1159 that was closed on Jul 8 and replaced with Node PR #2133 - which is still open. There is some discussion taking place under those pull requests that may provide some more up to date info on the status of Web workers in io.js/Node.js.

Update 3

Latest info - thanks to NiCk Newman for posting it in the comments: There is the workers: initial implementation commit by Petka Antonov from Sep 6, 2015 that can be downloaded and tried out in this tree. See comments by NiCk Newman for details.

Update 4

As of May 2016 the last comments on the still open PR #2133 - workers: initial implementation were 3 months old. On May 30 Matheus Moreira asked me to post an update to this answer in the comments below and he asked for the current status of this feature in the PR comments.

The first answers in the PR discussion were skeptical but later Ben Noordhuis wrote that "Getting this merged in one shape or another is on my todo list for v7".

All other comments seemed to second that and as of July 2016 it seems that Web Workers should be available in the next version of Node, version 7.0, which is planned to be released in October 2016 (not necessarily in the form of this exact PR).

Thanks to Matheus Moreira for pointing it out in the comments and reviving the discussion on GitHub.

Update 5

As of July 2016 there are a few modules on npm that were not available before - for a complete list of relevant modules, search npm for workers, web workers, etc. If anything in particular does or doesn't work for you, please post a comment.

Update 6

As of January 2017 it is unlikely that web workers will get merged into Node.js.

The pull request #2133 workers: initial implementation by Petka Antonov from July 8, 2015 was finally closed by Ben Noordhuis on December 11, 2016 who commented that "multi-threading support adds too many new failure modes for not enough benefit" and "we can also accomplish that using more traditional means like shared memory and more efficient serialization."

For more information see the comments to the PR 2133 on GitHub.

Thanks again to Matheus Moreira for pointing it out in the comments.

Update 7

I'm happy to announce that a few days ago, in June 2018, web workers appeared in Node v10.5.0 as an experimental feature activated with the --experimental-worker flag.

For more info, see:

Finally! I can make the 7th update to my 3 year old Stack Overflow answer where I argue that threading a la web workers is not against Node philosophy, only this time saying that we finally got it!

rsp
  • Hmm.. @rsp I cannot seem to find the module package for web workers for `iojs`, any idea? – NiCk Newman Sep 05 '15 at 15:18
  • 1
    @NiCkNewman Thanks. I see that the original pull request in io.js is closed now and replaced with another one - with some discussion there in the pull requests comments on GitHub, maybe you'll be able to find some info there. See: Update 2 in my answer. – rsp Sep 06 '15 at 08:28
  • 1
    Yep, it looks like they just fixed the last libuv issue. I wonder when I can get my hands on the module. Cannot wait! Thanks for keeping us updated ~ Edit: Just got initialized: https://github.com/petkaantonov/io.js/commit/ea143f72fc6845668716bd7a5da22771011defe4 There we go, it's coming! – NiCk Newman Sep 06 '15 at 14:26
  • 1
    Yep, it's live. (Not officially implemented yet) but you can download the source here: https://github.com/petkaantonov/io.js/tree/ea143f72fc6845668716bd7a5da22771011defe4 and compile if you want to test it out! I'm doing it now ~ – NiCk Newman Sep 06 '15 at 14:51
  • 1
    @NiCkNewman Thanks for the new info - I added it to the answer. – rsp Sep 07 '15 at 07:54
  • 1
    Can you please update us on the status of the Node.js `workers` implementation? Latest comments in [PR #2133](https://github.com/nodejs/node/pull/2133) are from February; the developers apparently ran into a problem and there are no comments indicating it's been solved. – Matheus Moreira May 30 '16 at 05:05
  • 1
    @MatheusMoreira Thanks for your comment and especially thanks for reviving the discussion on GitHub. It seems that Web Workers will be in Node v7 - I updated my answer, if anything needs correcting please let me know. Thanks. – rsp Jul 25 '16 at 08:42
  • 1
    PR #2133 was closed yesterday. Seems like Node.js will not have this feature. – Matheus Moreira Dec 13 '16 at 01:27
  • @MatheusMoreira Thanks for the info. I updated the answer again. – rsp Jan 12 '17 at 22:48
  • There is new PR for that: https://github.com/nodejs/node/pull/20876 Also, see: https://github.com/alibaba/AliOS-nodejs/wiki/Workers-in-Node.js-based-on-multithreaded-V8 – evg656e May 24 '18 at 16:17
8

I come from the old school of thought where we used multi-threading to make software fast. For the past 3 years I have been using Node.js and am a big supporter of it. hasanyasin explained in detail how Node works and the concept of asynchronous functionality. But let me add a few things here.

Back in the old days, with single cores and lower clock speeds, we tried various ways to make software run fast and in parallel. In the DOS days we used to run one program at a time. Then in Windows we started running multiple applications (processes) together. Concepts like preemptive and non-preemptive (or cooperative) multitasking were tested. We know now that preemptive was the answer for better multi-processing on single-core computers. Along came the concepts of processes/tasks and context switching, and then the concept of the thread, to further reduce the burden of process context switching. Threads were introduced as a lightweight alternative to spawning new processes.

So like it or not, single-threaded or not, multi-core or single-core, your processes will be preempted and time-sliced by the OS.

Node.js is a single process and provides an async mechanism. Here jobs are dispatched to the underlying OS to perform tasks while we wait in an event loop for the task to finish. Once we get a green signal from the OS we perform whatever we need to do. Now in a way this is cooperative/non-preemptive multi-tasking, so we should never block the event loop for a very long period of time, otherwise we will degrade our application very fast.
So if there is ever a task that is blocking in nature or is very time-consuming, we will have to branch it out to the preemptive world of the OS and threads. There are good examples of this in the libuv documentation. Also, if you read the documentation further, you will find that file I/O is handled in threads in Node.js.
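
A small sketch of the difference, using key derivation from Node's built-in `crypto` module as a stand-in for a time-consuming task (the password, salt and iteration count are arbitrary demo values). The asynchronous `crypto.pbkdf2` does its hashing on libuv's thread pool, so timers keep firing; the synchronous `crypto.pbkdf2Sync` does the same work on the main thread and blocks the event loop.

```javascript
const crypto = require('crypto');

// Count how often the event loop gets a chance to run a timer callback.
let ticks = 0;
const ticker = setInterval(() => { ticks++; }, 5);

// Asynchronous variant: the hashing is done off the main thread.
crypto.pbkdf2('secret', 'salt', 200000, 64, 'sha512', (err) => {
  if (err) throw err;
  console.log(`async hashing: ${ticks} timer ticks ran in the meantime`);

  // Synchronous variant: the same work, but it occupies the event loop.
  ticks = 0;
  crypto.pbkdf2Sync('secret', 'salt', 200000, 64, 'sha512');
  console.log(`sync hashing: ${ticks} timer ticks ran in the meantime`); // always 0

  clearInterval(ticker);
});
```

The exact tick count for the asynchronous case depends on the machine; the point is only that it is greater than the count for the synchronous case, which cannot be anything but zero.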

So firstly, it's all in the design of our software. Secondly, context switching is always happening no matter what they tell you. Threads are still there for a reason: they are faster to switch between than processes.

Under the hood in Node.js it's all C++ and threads. And Node provides a C++ way to extend its functionality and to speed things up further by using threads where they are a must, i.e., for blocking tasks such as reading from a source, writing to a source, large data analysis and so on.

I know hasanyasin's answer is the accepted one, but for me threads will exist no matter what you say or how you hide them behind scripts. Secondly, no one breaks things into threads just for speed; it is mostly done for blocking tasks. And threads are in the backbone of Node.js, so completely bashing multi-threading is incorrect. Also, threads are different from processes, and the limitation of having one Node process per core doesn't exactly apply to the number of threads; threads are like sub-tasks of a process. In fact threads won't show up in your Windows Task Manager or the Linux top command. Once again, they are more lightweight than processes.

limplash
  • 81
  • 1
  • 3
  • Asynchronous code is not some huge innovation (in fact we've had it for decades) and multithreading is not some deprecated technology to be replaced. They're different tools with different tradeoffs, and in fact they can even be combined quite well. Every time you run node-cluster, you in fact run multiple "threads" (processes in this case, but the same could be achieved with threads, and be even more lightweight). Or take Erlang or Go, which can run thousands of green threads... – Hejazzman Aug 10 '16 at 13:12
  • I think the major point we are missing is that processes under the OS are always scheduled preemptively to provide fairness. Also, with multiple processors you can have actual parallel code execution, but even then you will have preemption. Asynchronous work is also carried out by the OS in some form of a process. – limplash May 15 '17 at 07:25
4

I'm not sure if webworkers are relevant in this case: they are client-side tech (they run in the browser), while node.js runs on the server. Fibers, as far as I understand, are also blocking, i.e. they are voluntary multitasking, so you could use them, but you would have to manage context switches yourself via yield. Threads might actually be what you need, but I don't know how mature they are in node.js.

lanzz
  • 42,060
  • 10
  • 89
  • 98
  • 3
    just for your info, webworkers have been (partially) adapted on node.js. And are available as `node-workers` package. Have a look at this: https://github.com/cramforce/node-worker – Parth Thakkar May 27 '12 at 11:28
  • Good to know, thanks. Docs are very scarce though, I have no idea whether it runs in a separate thread, process, or simply runs in the same process, and I don't have the time to dig into the code, so I have no idea if it will work for your case. – lanzz May 27 '12 at 11:30
  • @ParthThakkar: That project hasn't been touched in 3 years (2 when you posted), and hasn't made it past 0.0.1. – mpen Mar 07 '13 at 20:51
  • @Mark: The reason for my ignorance on that is that I am not a professional programmer yet. Heck, I am not even in a university. I am still a High School fellow, who keeps reading about programming - besides managing the school work. So, it isn't remotely possible for me to have knowledge about all such issues. I just posted what i knew... – Parth Thakkar Mar 10 '13 at 05:25
  • @Mark: Although it was nice of you to point out that about the history of the project. Such things will be taken care of in my future responses!! :) – Parth Thakkar Mar 10 '13 at 05:26
  • @ParthThakkar: I'm not blaming you, I'm just saying that others should be wary of using such an immature project. The only other threading library I'm aware of is "threads a gogo" which hasn't received much more activity. When picking libraries, I like to make sure they're actively developed, otherwise they are likely to be incomplete, and if there are any bugs, they likely won't be resolved. Not to mention they're probably also behind the times. – mpen Mar 10 '13 at 07:11
  • @Mark, I get your point :). And as I said, such issues will be taken care of in my future answers. Thanks again for bringing to my notice this thing! – Parth Thakkar Mar 13 '13 at 15:03
  • @Ianzz, you may want to look at node-worker-threads, which runs in a separate thread specifically.. node-workers iirc used separate processes, and communicated through a unix socket to/from the workers encoding messages with msgpack, while node-worker-threads uses IPC copying which is likely faster, and has more options than the worker pattern. – Tracker1 May 22 '13 at 23:36
3

worker_threads has been implemented and shipped behind a flag (--experimental-worker) in node@10.5.0. It's still an initial implementation, and more work is needed to make it more efficient in future releases. It's worth giving it a try in the latest Node.
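
A minimal sketch of what using it can look like, assuming a Node version where `worker_threads` is available (on 10.5.0 that means running with the `--experimental-worker` flag; later releases ship it unflagged). The helper `fibInWorker` and the naive `fib` function are invented for this example, and the worker source is passed as a string with `eval: true` only so that everything fits in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker code as a string so the example is a single file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Deliberately slow, CPU-bound function.
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData));
`;

// Run fib(n) on a separate thread and resolve with the result.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

fibInWorker(30).then((result) => console.log('fib(30) =', result));
```

The main thread stays free to serve I/O while the worker computes; in a real project the worker code would normally live in its own file and be passed to `new Worker` by path.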

motss
  • 662
  • 4
  • 6
2

In many Node developers' opinions one of the best parts of Node is actually its single-threaded nature. Threads introduce a whole slew of difficulties with shared resources that Node completely avoids by doing nothing but non-blocking IO.

That's not to say that Node is limited to a single thread of execution. It's just that the method for getting that kind of concurrency is different from what you're looking for. The standard way to do it is the cluster module that comes with Node itself, which forks worker processes rather than threads. It's a simpler approach than manually dealing with threads in your code.

For dealing with asynchronous programming in your code (as in, avoiding nested callback pyramids), the Future component in the Fibers library is a decent choice. I would also suggest you check out Asyncblock, which is based on Fibers. Fibers are nice because they allow you to hide callbacks by duplicating the stack and then jumping between stacks on a single thread as they're needed. That saves you the hassle of real threads while giving you the benefits. The downside is that stack traces can get a bit weird when using Fibers, but they aren't too bad.

If you don't need to worry about async stuff and are more just interested in doing a lot of processing without blocking, a simple call to process.nextTick(callback) every once in a while is all you need.
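
A rough sketch of that "yield every once in a while" idea. Note that it uses `setImmediate` instead: in current Node versions `process.nextTick` callbacks run before the loop gets back to pending I/O, so `setImmediate` is the call that actually hands control back to the event loop. The function `sumInChunks` and its parameters are invented for illustration.

```javascript
// Sum the integers 0..n-1 in slices, yielding to the event loop between slices.
function sumInChunks(n, chunkSize, done) {
  let i = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) total += i;

    if (i < n) {
      // Give other queued events a turn before the next slice.
      setImmediate(runChunk);
    } else {
      done(total);
    }
  }

  runChunk();
}

sumInChunks(1e6, 1e4, (total) => console.log('total =', total));
```

This keeps the server responsive but does not make the computation any faster; it still runs on the one main thread.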

genericdave
  • 369
  • 2
  • 5
  • well, your suggestion - about clusters - was what i initially thought about. But the problem with that is their overhead - a new instance of v8 has to be initialised every time a new process is forked (~30ms, 10MB). So, you can't create lots of them. This is taken directly from the node docs: _These child Nodes_ (about child_processes) _are still whole new instances of V8. Assume at least 30ms startup and 10mb memory for each new Node. That is, you cannot create many thousands of them._ – Parth Thakkar May 29 '12 at 14:15
  • 1
    This is exactly the idea of cluster. You run one worker per cpu core. Any more is most likely unnecessary. Even cpu intensive tasks will work fine with an asynchronous style. However, if you *really* need full-blown threads, you should probably consider moving to another server backend entirely. – genericdave May 29 '12 at 23:18
1

Maybe some more information on what tasks you are performing would help. Why would you (as you mentioned in your comment to genericdave's answer) need to create many thousands of them? The usual way of doing this sort of thing in Node is to start up a worker process (using fork or some other method) which always runs and can be communicated with using messages. In other words, don't start up a new worker each time you need to perform whatever task it is you're doing, but simply send a message to the already running worker and get a response when it's done. Honestly, I can't see that starting up many thousands of actual threads would be very efficient either; you are still limited by your CPUs.
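
A sketch of the "one long-running worker, many messages" pattern. To keep it in a single file it uses a `worker_threads` thread rather than a forked process, so treat it as an illustration of the messaging idea, not of `fork` itself. The names `startWorker`, `sumOfSquares`, `stop` and the message shape `{ id, n }` are all made up for this example.

```javascript
const { Worker } = require('worker_threads');

const workerSource = `
  const { parentPort } = require('worker_threads');
  // Stay alive and answer each request as it arrives.
  parentPort.on('message', ({ id, n }) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i * i;
    parentPort.postMessage({ id, result: sum });
  });
`;

// Start one worker and return a small client for talking to it.
function startWorker() {
  const worker = new Worker(workerSource, { eval: true });
  const pending = new Map(); // request id -> resolve function
  let nextId = 0;

  worker.on('message', ({ id, result }) => {
    pending.get(id)(result);
    pending.delete(id);
  });

  return {
    // Send one task to the already-running worker.
    sumOfSquares(n) {
      return new Promise((resolve) => {
        const id = nextId++;
        pending.set(id, resolve);
        worker.postMessage({ id, n });
      });
    },
    stop: () => worker.terminate(),
  };
}

async function demo() {
  const w = startWorker();
  const results = await Promise.all([w.sumOfSquares(3), w.sumOfSquares(10)]);
  await w.stop();
  return results;
}

demo().then((results) => console.log(results)); // [ 14, 385 ]
```

The startup cost is paid once; every later task is just a message. With `child_process.fork` the shape is similar, using `child.send(...)` in the parent and `process.on('message', ...)` in the child.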

Now, after saying all of that, I have been doing a lot of work with Hook.io lately, which seems to work very well for this sort of off-loading of tasks into other processes; maybe it can accomplish what you need.

kbjr
  • 1,254
  • 2
  • 10
  • 22