
As I understand, one of the benefits of NodeJS is that it's one thread per process; in the standard case, you don't need to worry about concurrency.

I also read about NodeJS scaling on multi core machines (Node.js on multi-core machines):

Workers will compete to accept new connections, and the least loaded process is most likely to win. It works pretty well and can scale up throughput quite well on a multi-core box.

In this case, will multiple threads execute in parallel? If so, doesn't that mean we do have to write multithreaded code (if we want to use multiple cores) - and if so, how do I do that?

Or if they don't execute in parallel... where is the boost/benefit of multiple cores coming from?


Edit: My current understanding

So there may be multiple processes on multiple cores but each process only has a single thread.

For example:

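var io = require('socket.io').listen(81);

// Sockets currently connected to this process only.
var connections = [];

// 'connection' is the documented server-side event name.
io.sockets.on('connection', function (socket) {
    console.log('connected...');
    connections.push(socket);

    socket.on('disconnect', function () {
        console.log('disconnected');
        // Arrays have no remove() method; find the socket and splice it out.
        var index = connections.indexOf(socket);
        if (index !== -1) {
            connections.splice(index, 1);
        }
    });
});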

There aren't race conditions: there's a single thread, so there won't be concurrent accesses to connections. When you have different processes, each process has its own copy of connections. So if you had a massive chatroom you couldn't balance the load with multiple processes; each process would be its own chatroom.

In this aspect, it's not any different from PHP, in that each PHP script has its own copy of the variables so you don't write locking code. Of course, the rest of it is completely different, but as far as I can see the argument "you don't have to write thread-locking code" isn't much of a plus because most data will be saved elsewhere anyways (not as in-memory variables).

Raekye
  • Per your edit, yes, you are correct about processes, connections, etc. However, your chatroom example is incorrect: you can balance the work across multiple processes, using either Node's cluster module or raw IPC via the `child_process` module. – Alan Jul 17 '13 at 19:38
  • @Alan hmmm, so with clusters I can have a variable/resource (e.g. `connections`) shared, and only one process will access it at a time? Sorry, it might take another few months for it to sink in :P – Raekye Jul 17 '13 at 21:40
  • No, not shared resources. Clusters allow you to share server ports with your worker node processes. However, if you wanted to implement a massive chat system, it would be trivial with Clusters. The server spawns workers. Each worker can handle N clients. If any client sends a message, the worker that client is attached to sends that message to the server, which in turn sends the message back to all workers, which then send the message to each client. – Alan Jul 18 '13 at 04:44

3 Answers

The answer to:

Does nodejs carry the “single thread” (no multithreaded locking code) benefit when run on multiple cores?

is yes: Node still spares you from writing locking code, because each process is still single-threaded.

Node does not run your JavaScript on multiple threads (JavaScript is designed around a single thread of execution). Scaling to multiple cores involves multiple processes, each with a single thread.

So, you have multiple processes that execute in parallel, but since they're separate processes, each with its own process space, you don't have the same issues with locks as you would in a multi-threaded process. Processes communicate over IPC via handles. Since I/O in Node is non-blocking by default, while some child processes are waiting for I/O the other processes can continue to execute and receive data.
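A minimal sketch of such a multi-process setup, assuming Node's built-in `cluster` module and its IPC channel (`worker.send`, `process.send`). The names `relayToWorkers` and `startChatCluster` and the message shape `{ text: ... }` are hypothetical, chosen only for illustration; this is one possible wiring, not the only one.

```javascript
var cluster = require('cluster');

// Relay one message to every worker. `workers` is any array of objects with
// a send() method, e.g. Object.values(cluster.workers) in the master.
function relayToWorkers(workers, message) {
    workers.forEach(function (worker) {
        worker.send(message);
    });
    return workers.length;
}

// Hypothetical wiring: the master forks workers and fans messages out.
function startChatCluster(workerCount) {
    if (cluster.isMaster) {
        for (var i = 0; i < workerCount; i++) {
            var worker = cluster.fork();
            // A worker forwards a client's message to the master over IPC,
            // and the master relays it to all workers.
            worker.on('message', function (message) {
                relayToWorkers(Object.values(cluster.workers), message);
            });
        }
    } else {
        // In a worker: receive relayed messages. A real chat server would
        // loop over this process's own sockets here.
        process.on('message', function (message) {
            console.log('worker ' + process.pid + ' got: ' + message.text);
        });
        // When one of this worker's clients sends text, the worker would
        // call: process.send({ text: text });
    }
}
```

Calling `startChatCluster(4)` from the entry script would start four workers. No memory is shared at any point: each process keeps its own variables, and only copies of messages cross the process boundary.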

Alan
  • Okay, but how does NodeJS figure out when a process can be pulled off the processor? With one thread/process, I don't need to worry about concurrent access to some data. (see question update for example) – Raekye May 24 '13 at 04:38
  • Your update to the question doesn't really pertain to Node. What you're asking about is parallel computing, which Node.js supports (via multiple processes). Said another way: the solution to parallel processing in another language, like C++, is architecturally speaking the same solution you would apply to Node.js apps. – Alan May 24 '13 at 04:49
  • Well my question is: when NodeJS is run on multiple processes, does it still have that "don't worry about concurrency" issue? And I'm guessing the answer is yes then? And then how do I apply the same solution to NodeJS? It doesn't (natively) support locks, AFAIK – Raekye May 24 '13 at 04:52
  • Okay, I guessed wrong :P Node prevents blocking code, how do I deal with concurrent io then? E.g. I still have to use transactions and `select for update`? – Raekye May 24 '13 at 04:53
  • Concurrent IO: You either use a datastore that enforces this, or you create an architecture in which a master/controller is responsible for marshalling data to the child processes, as well as collecting results and writing them back to the data store. Neither of these requires multiple threads. – Alan May 24 '13 at 04:54
  • Where can I read more about an `architecture which you have a master/controller which is responsible for marshalling data to the child processes, as well as collecting writing it back to the data store` (is there a formal name I can google for?) – Raekye May 24 '13 at 04:56
  • @Raekye you're probably aware already but you can achieve transaction-like support in MongoDB using a two-phase commit: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ – Richard Marr May 24 '13 at 08:32
  • Alright, I've done some reading. I've got sort of a different question now: what is a concrete benefit of Node's single-thread structure? I hear "you don't need to write multithreaded code, and multithreaded code is hard." Yet, to me, most race conditions involve accessing data, and you still have race conditions with callbacks or on a multi-core/multi-process setup. I hope I'm not splitting hairs. Can you give me an example of when, say, a C or Java program would need special code but NodeJS doesn't? Beyond accessing a simple in-script variable? – Raekye May 25 '13 at 05:04
  • And @RichardMarr yea I did see that, although I didn't totally understand. Thanks for bringing it up, because I wasn't sure if it really was what I thought it to be – Raekye May 25 '13 at 05:05
  • Extended discussions should not be made in the comment section of Stack Overflow. You're better off asking this on Programmers.stackexchange.com – Alan May 25 '13 at 06:43
  • I know this is much later, but I think I finally get it. I updated my question to reflect my current understanding, could you confirm it? – Raekye Jul 17 '13 at 18:49

By the nature of JavaScript, running code executes on a single thread. That means any resource internal to a running Node process is accessed by only one running function at a time, so parallel access cannot happen. An example:

var car = {
    velocity: 100,
};

function speedUpTo150() {
    car.velocity = 150;
}

function slowDownTo80() {
    car.velocity = 80;
}

speedUpTo150();
slowDownTo80();
setTimeout(function() {
    speedUpTo150();
},1000);

setTimeout(function() {
    slowDownTo80();
},1000);

This example should make it clear that a race condition cannot happen: at any given time, only one function can be accessing car.
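To make the "one function at a time" point observable, here is a small sketch (the names `counter` and `log` are illustrative). It schedules a timer callback that touches the same variable as a long synchronous loop; the callback is only queued and cannot run until the loop has finished.

```javascript
var counter = 0;
var log = [];

// Schedule a callback that also modifies `counter`. Even with a 0 ms delay
// it is only queued; it cannot interrupt code that is already running.
setTimeout(function () {
    counter += 1;
    log.push('timer ran, counter = ' + counter);
    console.log(log.join(' | '));
}, 0);

// A long synchronous read-modify-write loop on the same variable.
for (var i = 0; i < 1000000; i++) {
    counter += 1;
}
log.push('loop finished, counter = ' + counter);
```

The loop's entry always lands in `log` before the timer's entry, so no lock around `counter` is needed within a single process.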

Yet Node.js, as you mentioned, can run across multiple cores. This happens either by clustering (forking) the JavaScript code into several Node.js processes, or by spawning child processes. Again, within each individual process (cluster worker or child process), race conditions cannot happen on its internal resources. Nor can they happen when processes exchange data, because on each side only one piece of code is executing at any time to apply the exchange.

But you also mentioned external resources, such as MongoDB. Node.js cannot know what MongoDB is serving at any given time beyond its own calls. So in that case a race condition can happen (I am not completely sure how MongoDB handles this case; it's just a hypothesis), because at any time MongoDB may be serving another process, whether that is a forked Node.js instance or something else entirely. In such cases you should implement a locking mechanism.
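A sketch of how such a race can look, and one way to guard against it. Everything here is an assumption made for illustration: `store` is a hypothetical in-memory stand-in for an external resource (not a real driver API), and the two "processes" are interleaved by hand. The guard shown is an optimistic compare-and-set with retry, used here in place of a lock because it fits in a few lines; real data stores expose this idea in their own ways, such as conditional updates or transactions.

```javascript
// Hypothetical stand-in for an external store that several processes reach.
var store = {
    balance: 100,
    read: function () { return this.balance; },
    write: function (value) { this.balance = value; },
    // Write only if nobody changed the value since we read it.
    compareAndSet: function (expected, value) {
        if (this.balance !== expected) { return false; }
        this.balance = value;
        return true;
    }
};

// 1. Unsafe: both "processes" read before either one writes.
var seenByA = store.read();
var seenByB = store.read();
store.write(seenByA - 30);
store.write(seenByB - 50);   // overwrites A's result: A's withdrawal is lost
var unsafeResult = store.read();

// 2. Safer: same interleaving, but each write is conditional and retried.
store.balance = 100;
function withdraw(amount, staleRead) {
    var seen = staleRead;
    while (!store.compareAndSet(seen, seen - amount)) {
        seen = store.read(); // someone else wrote first; re-read and retry
    }
}
seenByA = store.read();
seenByB = store.read();
withdraw(30, seenByA);       // succeeds on the first attempt
withdraw(50, seenByB);       // first attempt fails, the retry succeeds
var safeResult = store.read();

console.log(unsafeResult, safeResult);
```

In the unsafe run only the second withdrawal survives; in the guarded run both withdrawals are applied.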

Note that the same applies to the Actor pattern, where each actor is an individual thread and handles race conditions on its thread-internal resources in a very similar way. But when it comes to external resources, an actor by its nature cannot be aware of the external resource's state.

Just food for thought: why don't you look into an immutable mechanism?

Cheers!

Evan P
  • Thanks for the concrete example, good to confirm. My sort of 'confusion' now is... how much further beyond accessing in-script variables does NodeJS simplify/eliminate multithreaded code? For example, it doesn't solve race conditions in any cases with async IO – Raekye May 25 '13 at 05:09
  • There is no way (as JS engines are built so far) for more than one execution queue to access internal resources, not even asynchronously. To clear this up, you should read a really good article by John Resig on how internal scheduling and execution queues work: http://ejohn.org/blog/how-javascript-timers-work . – Evan P May 25 '13 at 22:08

JavaScript always runs in a single thread. There is no such thing as multithreaded code in JavaScript. It is not good for heavy computing, but it's good for I/O-based operations because it's event-based: while an I/O access is in progress, the thread is free to handle other requests or operations. That's why it can handle many "simultaneous" connections well.

Tomas Kirda
  • How can I handle concurrent-safe access of data? a NodeJS process can be pulled off the processor at any time right? – Raekye May 24 '13 at 04:41
  • This is a different kind of concurrency that you are talking about. It is not thread concurrency. For this you need to decide which database concurrency strategy you want to use. What is the actual problem you are trying to resolve? – Tomas Kirda May 24 '13 at 04:53
  • I'm not actually building anything - just playing with Node and thinking about possible problems/solutions. Example: I need to select from a database, do complex computation, then save it. MongoDB has limited support for locking and transactions. NodeJS doesn't let you write locking code; how do I deal with this? (Besides using PostgreSQL or something that does support transactions and locking - surely there's some way to do it?) – Raekye May 24 '13 at 04:55
  • If this is an issue, use an RDBMS that supports transactions, and the transaction manager will handle that for you. The database driver should have an API that supports transactions. Otherwise you're screwed, or you can write your own driver. – Tomas Kirda May 24 '13 at 05:09
  • This is wrong. [There absolutely *is* such a thing as multithreaded JS.](http://en.wikipedia.org/wiki/Web_worker). – josh3736 May 25 '13 at 22:37