
I was trying to understand how nodejs can achieve higher concurrency compared to thread-based approaches such as Servlet servers.

I already know that in nodejs "everything runs in parallel except your code", and that libuv has a backend thread pool to handle file IO or database calls, which are usually the bottlenecks.

So here is my question: if nodejs uses a thread pool to handle database calls, how can it serve more concurrent requests than Servlet servers such as Tomcat, given that Tomcat can also use NIO backed by epoll/kqueue to achieve high concurrency?

For example, say 100k concurrent requests come in and each requires database operations. If these 100k requests are to be serviced concurrently, with nodejs we still end up creating 100k threads, which might cause memory exhaustion just as it would with Tomcat. Yes, the 100k threads are hypothetical, because (I know) nodejs has a fixed thread pool and the operations are queued in the event loop. But Tomcat handles things the same way: we can also configure the thread pool size in Tomcat, and it also queues requests.

Or, am I wrong to say that "nodejs uses backend thread pool in libuv to handle File IO or database calls"? Does nodejs use epoll/kqueue to handle database io without a separate thread?

I was reading this similar question but still didn't get the answer.

davenkin

1 Answer


"if nodejs uses thread pool to handle database calls"

That assumption is wrong. nodejs typically talks over the network to a database running in a different process, either locally or on a different host. Network socket I/O in node.js does not use a thread per connection or the thread pool; it uses event-driven, non-blocking I/O. What the database itself does with threads is up to the database and independent of node.js, since it would be the same no matter which server environment you were using.

node.js does use a thread pool for local disk access, but high-scale applications usually do most of their disk access through a database, which runs in a separate process and has its own I/O optimizations for handling lots of requests. How a given database does that is up to its implementation, but it will not be using a nodejs thread per request.

"I was trying to understand how nodejs can achieve higher concurrency compared to thread-based approaches such as Servlet servers."

The general concept is that a properly written server app in node.js uses async I/O for all I/O (except perhaps startup code that only runs during server startup). This means it can have a lot of requests in flight at the same time with only a single JavaScript thread, because most of them are waiting on some type of I/O. If you're going to have a lot of requests in flight at the same time, it can be a lot more efficient for the system to do it the node.js way, with a single thread where all the requests are cooperatively switched, than to use OS threads, where every thread has OS overhead associated with it and every pre-emptive thread switch has OS and CPU overhead associated with it.
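To make that concrete, here is a minimal sketch of many requests being in flight at once on one JavaScript thread. It is only an illustration under assumptions: the local HTTP server, the 100 ms timer that stands in for waiting on a database, and the names `get` and `runDemo` are all made up for this example and are not from the question.

```javascript
const http = require('http');

// Issue one GET to the local demo server and resolve with the status code.
function get(port) {
  return new Promise((resolve, reject) => {
    http
      .get({ host: '127.0.0.1', port, agent: false }, (res) => {
        res.resume(); // discard the body so that 'end' fires
        res.on('end', () => resolve(res.statusCode));
      })
      .on('error', reject);
  });
}

// Start a throwaway server, fire `total` requests at it at once,
// and record how many requests were being handled simultaneously.
async function runDemo(total) {
  let inFlight = 0;
  let peak = 0;
  const server = http.createServer((req, res) => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    // Hypothetical stand-in for an async database call: no thread waits here.
    setTimeout(() => {
      inFlight -= 1;
      res.end('ok');
    }, 100);
  });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const { port } = server.address();
  const statuses = await Promise.all(
    Array.from({ length: total }, () => get(port))
  );
  await new Promise((resolve) => server.close(resolve));
  return { peak, statuses };
}

runDemo(100).then(({ peak, statuses }) => {
  console.log(`responses: ${statuses.length}, peak requests in flight: ${peak}`);
});
```

The exact peak depends on timing, but it should be well above 1 even though neither the client nor the server code starts a thread for any request.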

In node.js, there is no pre-emptive switching between the active requests. Only one runs at a time, and it runs until it either finishes or hits an asynchronous operation and has nothing else to do until that async I/O operation completes. At that point, the JS engine goes back to the event queue and picks out an event (probably for one of the other requests). This type of cooperative switching can be significantly faster and more efficient than switching between OS-level threads. There is sometimes a programming cost: to take advantage of this, a node.js developer has to code with async I/O, and there is a learning curve both for writing good, clean code with proper error handling and for debugging it.
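A small sketch of that cooperative switching, assuming a timer as a stand-in for real async I/O (`fakeIo` and `handleRequest` are hypothetical names for this illustration):

```javascript
const log = [];

// Stand-in for an async I/O operation that takes `ms` milliseconds.
function fakeIo(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Each "request" runs until it reaches `await`, then yields to the event loop.
async function handleRequest(id, ioMs) {
  log.push(`${id}: start`);
  await fakeIo(ioMs);
  log.push(`${id}: done`);
}

// A is started first but waits longer; B starts while A is still waiting.
const finished = Promise.all([
  handleRequest('A', 20),
  handleRequest('B', 5),
]).then(() => log);

finished.then((entries) => console.log(entries.join('\n')));
// A: start
// B: start
// B: done
// A: done
```

Request A gives up the thread at its `await`, so B can start and even finish while A is still waiting, all without a second thread.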

"For example, if there's a 100k concurrent request coming in and each requires database operations, if these 100k request are to be serviced concurrently, with nodejs we still end up creating 100k threads which might cause memory exhaustion as Tomcat does."

No, you will not be creating 100k threads. The database interface layer sits between node.js and the actual database code, which runs in another process or on another host. That layer may be written entirely in node.js (using TCP networking to talk to the database) and introduce no new threads at all. Or it may include some native code that uses threads for its own operations, but that will likely be a small number of threads and nothing even close to one per request.

"Or, am I wrong to say that 'nodejs uses backend thread pool in libuv to handle File IO or database calls'? Does nodejs use epoll/kqueue to handle database io without a separate thread?"

For file I/O, yes, it uses a thread pool in libuv. For database calls, no. While the details depend entirely upon the database implementation, usually there is not a thread per database call. The database is typically in another process, and the nodejs interface library for the DB does one of two things. Either it directly uses nodejs TCP to talk to the database, which uses no threads: libuv watches the sockets with the platform's event notification mechanism, such as epoll on Linux or kqueue on macOS/BSD. Or it has its own native code add-on that talks to the database, which probably uses a small number of threads for its work, but typically not a thread per request.
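As a rough illustration of the file I/O side, the sketch below starts many asynchronous reads at once. The temporary file and the `readManyTimes` helper are assumptions made up for this example. The reads are queued onto libuv's worker threads, whose count is fixed (4 by default, adjustable with the `UV_THREADPOOL_SIZE` environment variable), so 50 pending reads do not mean 50 threads.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a small temp file, then start `count` async reads of it at once.
async function readManyTimes(count) {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'pool-demo-'));
  const file = path.join(dir, 'data.txt');
  await fs.promises.writeFile(file, 'hello');

  // All reads are requested before any of them completes; the JavaScript
  // thread stays free while libuv's worker threads do the blocking work.
  const reads = Array.from({ length: count }, () =>
    fs.promises.readFile(file, 'utf8')
  );
  const results = await Promise.all(reads);

  await fs.promises.rm(dir, { recursive: true, force: true });
  return results;
}

const poolDemo = readManyTimes(50);
poolDemo.then((results) => console.log(`${results.length} reads completed`));
```

Running the same script with, for example, `UV_THREADPOOL_SIZE=8 node script.js` changes how many of those reads can be serviced in parallel, not how the JavaScript code is written.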

jfriend00
  • Thanks for the comprehensive explanation. When I talked about database calls I mainly meant the call from the client side, where our js code acts as the DB client. Yes, the database call is eventually network socket io, so can I say that under the hood a database call from the client side is handled by the underlying io mechanisms like epoll/kqueue wrapped by libuv? Otherwise I can't imagine a scenario where no thread is used while the DB call is asynchronous. – davenkin Jul 16 '17 at 09:56
  • @davenkin - Your database comments are confusing me. Typically the actual code that runs a database will be in a separate process from node.js (and sometimes on another host). When Javascript in node.js makes a client call to the database, it will prepare some communication to that other process or other host. That communication will typically NOT use a node.js thread as it will typically be non-blocking TCP and node.js does not use an additional thread for each TCP connection. – jfriend00 Jul 16 '17 at 15:43
  • @davenkin - Just to give you an idea. If I write a node.js app to make http API calls to 100 different hosts, such that all 100 requests are in-flight at the same time and waiting for responses, that does not use any additional threads in node.js. Now if a database driver in node.js has its own native code, then it can do anything it wants within its own native code. Some may use some native threads to keep track of requests - I don't know. But, that is not required by the node.js model at all. – jfriend00 Jul 16 '17 at 15:45