
The Node.js docs imply that there are no asynchronous system APIs Node.js could use for file system operations, so asynchronous behavior is spoofed using a thread pool. I find it hard to believe that modern operating systems do not provide asynchronous system APIs for file system operations. Is that true? How can that possibly be the case?

Asynchronous system APIs are used by Node.js whenever possible, but where they do not exist, libuv's threadpool is used to create asynchronous node APIs based on synchronous system APIs. Node.js APIs that use the threadpool are:

all fs APIs, other than the file watcher APIs and those that are explicitly synchronous
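To make the quoted distinction concrete, here is a minimal sketch (not from the docs) that contrasts an asynchronous `fs` call with an explicitly synchronous one. The helper name `observeOrder` and the temporary file are made up for illustration. It only shows that `fs.readFile` returns before its callback fires; it says nothing about how libuv schedules the work internally.

```typescript
import { mkdtempSync, readFile, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: records the order in which the reads complete.
function observeOrder(path: string): Promise<string[]> {
  const events: string[] = [];
  return new Promise((resolve, reject) => {
    // Asynchronous variant: the call returns immediately and the callback
    // runs later (per the quoted docs, the read is done on libuv's threadpool).
    readFile(path, "utf8", (err) => {
      if (err) return reject(err);
      events.push("async read finished");
      resolve(events);
    });
    events.push("readFile returned");

    // Explicitly synchronous variant: blocks the calling thread until done.
    readFileSync(path, "utf8");
    events.push("sync read finished");
  });
}

// Demo setup: a throwaway file in the OS temp directory.
const demoFile = join(mkdtempSync(join(tmpdir(), "fs-order-")), "file.txt");
writeFileSync(demoFile, "hello");

observeOrder(demoFile).then((events) => console.log(events.join(" -> ")));
// prints: readFile returned -> sync read finished -> async read finished
```

The async callback is always last because Node.js never invokes it before the current synchronous code has finished running.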


Looks like they might be working on it: Is there really no asynchronous block I/O on Linux?

Interface is kernel backed and DOESN'T use a userspace thread pool

So maybe it's not available for Linux, and it's just easier to use a threadpool until that functionality is available across all Node.js platforms. I guess I should have done more research. It just seems odd for the docs not to say they use asynchronous disk I/O when available and a threadpool otherwise...
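As a rough sketch of what "spoofing" asynchronous behavior on top of a synchronous call can look like, the snippet below runs a blocking `readFileSync` on a separate thread and exposes the result as a Promise. This is only an analogy: it uses the public `worker_threads` module rather than libuv's internal threadpool, and `readFileViaThread` is a hypothetical name, not a Node.js API.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Worker } from "node:worker_threads";

// Worker body (plain JavaScript): performs a *blocking* read off the main
// thread and posts the file contents back to the parent.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  const { readFileSync } = require("node:fs");
  parentPort.postMessage(readFileSync(workerData, "utf8"));
`;

// Hypothetical helper: an async-looking API built on a synchronous call.
function readFileViaThread(path: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: path });
    worker.once("message", resolve); // contents arrive as a message
    worker.once("error", reject);    // e.g. the file does not exist
  });
}

// Demo setup: a throwaway file in the OS temp directory.
const workerDemoPath = join(mkdtempSync(join(tmpdir(), "fs-thread-")), "file.txt");
writeFileSync(workerDemoPath, "hello from a worker");

readFileViaThread(workerDemoPath).then((text) => console.log(text));
```

The main thread stays free while the worker is blocked on the read, which is the same trade-off the docs describe: the caller sees an asynchronous API, but a thread is still tied up for the duration of the system call.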

Community
Christopher King
  • Why would the OS offer async access? I suppose it could but I don't really see the value - if you want to read `file.txt` you want to do it *now* not after an indeterminate amount of time when the file may or may not have changed and may or may not actually exist any more. – VLAZ Apr 03 '20 at 07:24
  • Sure, but that's kinda the point, no? "Now" is a really long time from the CPUs perspective so why block the thread? Why not ask the OS to lock the file, read the file, and call back when the disk has done its thing. – Christopher King Apr 03 '20 at 07:37
  • What happens with delete requests, then? They never get done? Or they only get done after the file is no longer needed? What if you *really need space* - you issue a delete, it "succeeds" (in that it doesn't fail) and then what? You're still limited on space. But you *deleted the file*, right? And what happens with writes? I'm not saying it cannot be done but it's a lot of things to manage, and I'm not sure what value *to the OS* this asynchronicity brings - you have all this extra maintenance needed, but why would the OS makers want to take on this work? – VLAZ Apr 03 '20 at 07:41
  • Doesn't NodeJS expose asynchronous APIs for all these cases already? Why couldn't the OS provide the same abstraction? I ask it to delete a file. It calls back when the file is deleted or there is an error -- maybe it's locked or already deleted. – Christopher King Apr 03 '20 at 07:43
  • But that doesn't answer why the OS makers would want to do that. You're not an OS maker, so what *you* would do is irrelevant. – VLAZ Apr 03 '20 at 07:44
  • I think Windows has asynchronous disk IO, right? So it's not like it can't be done. Isn't that what overlapped disk IO is all about? https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped_entry – Christopher King Apr 03 '20 at 07:49
  • That would be related to the I/O buffering done. What you're talking about is not buffering but an entirely different layer before buffering. Or actually, is it *after*? You should be able to see how the complexity of this suddenly jumped here. And again, I'm not saying it cannot be done - it clearly *can*. But why would the OS want to provide it? What's the benefit and to whom? – VLAZ Apr 03 '20 at 07:52
  • @VLAZ An OS would provide such an interface for those programs that wish to maximise the number of operations they can have in flight with the minimum possible overhead, despite the additional complexity (and lack of portability) that might involve for the programmer. – Anon May 03 '20 at 20:20

1 Answer


I find it hard to believe that modern operating systems do not provide asynchronous system APIs for file system operations. Is that true?

(I know some will argue Linux isn't a modern OS but let's take your comment in the spirit it was asked in :-)

In the past it was half true... Few things are perfect, and this extends to operating systems! It's very possible for something to be useful and/or popular without having every possible feature in the best form. For a long time, from a kernel perspective, Linux only had mmap, epoll and Linux AIO (some of which are seen in another Stack Overflow answer). Those attacked some parts of the issue but came with drawbacks. As you noted, these days the answer has become: Linux does provide an asynchronous system API for file system operations via io_uring (which is a more holistic approach compared to what Linux had before). I can't speak for other OSes.

How can that possibly be the case?

Sometimes it takes a long time for an API to be created that will be acceptable. Maybe the problem is awkward, maybe it's not enough of a priority, maybe it's hard to make something that fits and "tastes" right, maybe you need a particular person to tackle it, etc. LWN records Linux async I/O proposals that stretch across a 15-year period, so people have been trying for some time!

Finally, libuv has an open pull request to add io_uring support to it but it keeps stalling (likely because these problems require a large amount of concerted effort to solve). Further, the person who wrote the Windows libuv async pieces notes the following as part of a longer comment:

Yes, it's true that many APIs would theoretically allow kernel-level asynchronous I/O, but in practice the story is not so rosy.

Maybe retrofitting async interfaces to an existing project is just difficult work? Coming back to the question in the title:

Are there no asynchronous APIs [Node.js] could use?

Depends on the OS (so I guess the answer is "sometimes" or "it depends") and even when there is such an API someone has to write that support into Node.js itself!

Anon