
I was planning to rewrite some Python code in which I was polling a file for changes. I wanted to redo it as an asyncio exercise: the conceptual idea was a non-blocking file read that would yield, and once the data was available, the event loop would resume the coroutine.

Then I discovered that asynchronous file operations aren't something one does. ref.

But I couldn't understand the motivation for this behavior, or why it would be any different than for sockets.

Socket example:

Reading a socket yields from a coroutine until the data is ready — "ready" meaning it actually arrived, at a nondeterministic time, from somewhere on the Internet.

Why not also for reading a file:

Reading a file yields from a coroutine until the data is ready — "ready" meaning it actually arrived, at a nondeterministic time, from the hard disk of the computer.
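To make the asymmetry concrete, here is a minimal sketch (the host, port, and file path are placeholders). Reading a socket through asyncio's stream API genuinely suspends the coroutine, while the standard library offers no awaitable counterpart for regular files — `open()`/`read()` run synchronously and block the event loop:

```python
import asyncio


async def read_socket(host: str, port: int) -> bytes:
    # Sockets: asyncio provides a real non-blocking read that
    # suspends this coroutine until data arrives.
    reader, writer = await asyncio.open_connection(host, port)
    data = await reader.read(100)  # yields to the event loop
    writer.close()
    return data


async def read_file(path: str) -> str:
    # Files: no awaitable read exists in the standard library;
    # this call blocks the whole event loop while the disk seeks.
    with open(path) as f:
        return f.read()  # no await possible here
```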

  • Is this behavior inherited from legacy code that works well enough with blocking calls?

  • Does it have something to do with character vs. block device files?

  • What about character device files, say a file representing a UART connection? Would the "no async file IO" rule also apply here?

TheMeaningfulEngineer

1 Answer


Definitely not a full answer, but some thoughts that are too large for a comment.

  • Asynchronous programming was initially most useful for network systems / sockets. One very seldom has 100k files open and wants to read from all of them asynchronously, whereas chat servers (or other servers that handle mostly idle connections) may very well have 100k+ such connections. To be sure, async programming is by now a "style", one that avoids many of the problems of thread-based programming, but that is not where it started (although I have zero proof of this statement).
  • In the case of files, when one requests information, it should arrive at some soon-ish point in the future. Perhaps this is comparable to doing an HTTP request where one expects an answer and could just wait for it synchronously. On the other hand, a socket can be open just for push messages, where there is no expectation of when (if ever) a message will arrive. For some special files this may be true as well, but in that case I would expect a different async notification to be available, after which a sync read should happen (like inotify for normal files; I never looked into special files).
  • I would argue that unless you know what you're doing, it's actually a very bad idea to do massively parallel file access. Sockets can be useful that way, since you might be connecting to different machines. With massively parallel file IO, you probably want the OS to serialise the requests for you anyway (did you ever try to copy 2 large files from a CD-ROM at the same time? ;))
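The "notify asynchronously, then read synchronously" pattern the second point describes has a simpler, widely used cousin: push the blocking read onto a worker thread so the event loop stays responsive. A minimal sketch (this is the general `run_in_executor` approach, roughly what third-party libraries such as aiofiles do, not something the standard library does for you automatically):

```python
import asyncio


def _blocking_read(path: str) -> str:
    # An ordinary synchronous read; it runs in a worker thread,
    # so it never blocks the event loop.
    with open(path) as f:
        return f.read()


async def read_file_async(path: str) -> str:
    # Delegate the blocking read to the default thread pool;
    # this coroutine suspends until the thread finishes.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _blocking_read, path)
```

This doesn't make disk IO non-blocking at the OS level — it just moves the blocking wait off the event loop, which is usually all an asyncio program needs.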

As I said, no definite answer, just some thoughts on the subject.

Claude
  • Parallel is good for most everything these days. Even mechanical hard drives accept 16 requests using NCQ, and they have enough buffer RAM that even if all 16 are on different tracks it can hold all 16 entire tracks in buffer. – Zan Lynx Jun 25 '17 at 21:21
  • I realise some parallel calls make sense on files, though not massively so. Async programming really pays off at the point where threads eat too much memory, at 100k concurrent connections or so. You don't want to do that on your hard disk or SSD. For memory-cached files you could, but then why use async access anyway — there is no IO wait. – Claude Jun 25 '17 at 21:43
  • Ok, that covers the disk reads, but do we have an idea about the device files? – TheMeaningfulEngineer Jun 28 '17 at 23:52
  • I don't think there are many cases where you would want to read from 100k device files at the same time either. For that reason I think the lack of a huge need resulted in no work being done on that. But again, just a hunch. – Claude Jun 30 '17 at 17:02