What's the point of mapLimit?

Question

What is the point of the mapLimit function in the async library? I thought that Node uses an internal thread pool that limits the number of async operations running at a time. Moreover, Node has a single thread and uses an event loop (which means we perform one operation at a time in that thread). Can someone explain why we need it?

score 2 · Answer 1 · answered Apr 16 '19 at 05:53

Actually, for network I/O node uses no thread pools. It uses exactly ONE thread: the main thread.

So yes, node cannot execute code in parallel.

But node can wait for things to happen in parallel. This is the nature of asynchronous/non-blocking I/O, regardless of whether you're using node.js or C++ or Java or Go.
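To make "waiting in parallel" concrete, here is a minimal sketch. The `wait` helper and `measureParallelWaits` are hypothetical names invented for this illustration, and a timer stands in for a real network request. All three waits are started before any of them is awaited, so the total elapsed time should be close to one wait rather than the sum of all three.

```javascript
// Hypothetical helper: resolves after `ms` milliseconds.
// A timer stands in for a network request here (an assumption for illustration).
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function measureParallelWaits() {
  const start = Date.now();
  // All three timers are started first, then awaited together.
  // The main thread is idle while they are pending.
  await Promise.all([wait(100), wait(100), wait(100)]);
  return Date.now() - start; // elapsed milliseconds
}

measureParallelWaits().then((elapsed) => {
  console.log(`three 100 ms waits took about ${elapsed} ms in total`);
});
```

Only one thread runs this code; the overlap comes from the waits, not from executing JavaScript in parallel.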

Without mapLimit node will make all the requests at once (if you're processing a thousand downloads then node will dutifully try to do that). This is not always desirable, since some services have rate limits and you may also run into request timeouts. mapLimit therefore lets you wait for only a limited number of async operations in parallel.
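The limiting behaviour can be sketched with plain Promises. Note that `mapLimitSketch` below is a hypothetical stand-in written for this illustration, not the async library's implementation (the library's own function is `mapLimit(coll, limit, iteratee, callback)`). The `fakeDownload` function is likewise an assumption that simulates a request with a short timer and records how many are in flight at once.

```javascript
// Hypothetical stand-in for async.mapLimit, written with plain Promises.
// It is NOT the library's code; it only illustrates the idea.
async function mapLimitSketch(items, limit, iteratee) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started yet

  // Each worker picks up the next unstarted item as soon as it is free.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await iteratee(items[i]);
    }
  }

  // Start at most `limit` workers; that caps the number of pending operations.
  const workers = [];
  for (let w = 0; w < Math.min(limit, items.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results; // same order as `items`
}

async function demo() {
  let inFlight = 0; // simulated requests currently pending
  let peak = 0;     // highest value inFlight ever reached

  // Simulated download (assumption): a 10 ms timer instead of real network I/O.
  const fakeDownload = async (id) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise((resolve) => setTimeout(resolve, 10));
    inFlight--;
    return id * 2;
  };

  const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
  const results = await mapLimitSketch(ids, 3, fakeDownload);
  return { results, peak };
}

demo().then(({ results, peak }) => {
  console.log(results, 'peak requests in flight:', peak);
});
```

With a limit of 3, at most three simulated downloads are pending at any moment, while starting all ten at once (for example with `Promise.all(ids.map(fakeDownload))`) would put all ten in flight together.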

slebetman

It's worth mentioning again, node does not have parallel code execution (no threads unless you're using worker threads) but it does have parallel waits (made possible by callbacks) – slebetman Apr 16 '19 at 05:54
As usual with my answers like this, you may be interested in my answer to this other related question for details on how this all works: https://stackoverflow.com/questions/29883525/i-know-that-callback-function-runs-asynchronously-but-why/29885509#29885509 – slebetman Apr 16 '19 at 05:57
I got the point about rate limits and request timeout issues, but as mentioned here http://docs.libuv.org/en/v1.x/threadpool.html Node uses an internal thread pool for file system operations. I am not sure about network, can you provide proof that Node uses a mechanism other than the thread pool for handling blocking network operations? – Guseyn Ismayylov Apr 16 '19 at 06:02
@GuseynIsmayylov Luckily someone else asked the same question last week so I still have the link. Nodejs documentation: https://nodejs.org/es/docs/guides/dont-block-the-event-loop/ scroll down to the "A quick review of Node" section to see what code runs on the main thread and what runs on separate threads. Took me a while to originally find it. My original answer is that I read the node.js source code, and you can do the same on github, but if you believe the docs it's good enough for a high-level description – slebetman Apr 16 '19 at 06:11
@GuseynIsmayylov Also, threading is not necessary for disk I/O. There is a language, Tcl, that does async disk I/O on the main thread. But cross-platform compatibility is a nightmare (linux, BSD, Win9x, Win2k/NT/XP, Mac, MacOSX, Solaris etc. and node does not even support everything Tcl supports). The main reason for doing disk I/O in a separate thread is to avoid needing to maintain cross-platform async code (they all have different APIs unlike libpthread) – slebetman Apr 16 '19 at 06:14
I read this section https://nodejs.org/es/docs/guides/dont-block-the-event-loop/#what-code-runs-on-the-worker-pool I know that Node is single-threaded and non-blocking. But still, it uses the worker pool for heavy stuff. I understand these things. I think I just need to experiment with mapLimit and similar functions to get the point. – Guseyn Ismayylov Apr 16 '19 at 06:32
@GuseynIsmayylov Rather than thinking that it uses worker threads for "heavy stuff", it is more accurate to think that node uses worker threads for only 4 things (well, 5 if you include the `worker_threads` module itself). – slebetman Apr 16 '19 at 06:46

Guseyn Ismayylov · Accepted Answer · 2019-04-16T10:08:04.410

I finally found the answer. It's all about the main thread in Node, which processes a stack of operations.

The main problem we want to avoid is RangeError: Maximum call stack size exceeded. That's it.

Let's say we have a lot of async operations, and they all return results to the main thread (to the main stack) to be processed after they've done their work.

The point is that we need to limit the number of them on our stack somehow. How? Just limit the number of async operations running at a time, so we don't have to handle a lot of operations on the main stack after the async operations have finished.

It's important to understand that this does not guarantee success: even with a limited number of concurrent async operations, if our sync operations on the main stack are very slow, the stack will eventually blow up and we'll get RangeError: Maximum call stack size exceeded anyway.
