12

I ask here because googling leads you on a merry trip around archives with no hint as to what the current state is. If you go by Google, async IO was all the rage from 2001 to 2003, and by 2006 things like epoll and libaio were turning up; kevent appeared but seems to have disappeared. As far as I can tell, there is still no good way to mix completion-based and readiness-based signaling, async sendfile (is that even possible?) and everything else in a single-threaded event loop.

So please tell me I'm wrong and it's all rosy! - and, importantly, what APIs to use.

How does Linux compare to FreeBSD and other operating systems in this regard?

Will

4 Answers

5

AIO as such is still somewhat limited and a real pain to get started with, but it kind of works for the most part, once you've dug through it.

It has some bugs that are serious in my opinion, but officially they are features. For example, when submitting a certain number of commands or a certain amount of data, your submitting thread will block. I don't remember the exact justification for this feature, but the reply I got back then was something like "yes of course, the kernel has a limit on its queue size, that is as intended". That is acceptable if you submit a few thousand requests... obviously there has to be a limit somewhere. It might make sense from a DoS point of view, too (otherwise a malicious program could force the kernel to run out of memory by posting a billion requests). But still, it's something that you can realistically encounter with "normal" numbers (a hundred or so), and it will strike you unexpectedly, which is no good. Plus, if you only submit half a dozen or so requests and they're a bit larger (some megabytes of data), the same may happen, apparently because the kernel breaks them up into sub-requests. That, again, kind of makes sense, but since the docs don't tell you, you would expect it to make no difference (apart from taking longer) whether you read 500 bytes or 50 megabytes of data.

Also, there seems to be no way of doing buffered AIO, at least on any of my Debian and Ubuntu systems (although I've seen other people complain about the exact opposite, i.e. unbuffered writes in fact going via the buffers). From what I can see on my systems, AIO is only really asynchronous with buffering turned off, which is a shame (it is why I am presently using an ugly construct around memory mapping and a worker thread instead).

An important issue with anything asynchronous is being able to epoll_wait() on it, which matters if you are doing anything else apart from disk IO (such as receiving network traffic). Of course there is io_getevents, but it is not so useful here, as it only waits on AIO completions and nothing else.

In recent kernels, there is support for eventfd. At first sight, it appears useless, since it is not obvious how it may be helpful in any way. However, to your rescue, there is the undocumented function io_set_eventfd which lets you associate AIO with an eventfd, which is epoll_wait()-able. You have to dig through the headers to find out about it, but it's certainly there, and it works just fine.

Damon
  • I take it this is referring to kernel AIO as opposed to POSIX AIO? – Janus Troelsen Feb 08 '14 at 00:54
  • @JanusTroelsen: Yes, like the question, this refers to kernel AIO (`libaio`). POSIX AIO is not so much a Linux feature, but a library feature implemented in `librt` using a thread pool and standard synchronous I/O. Surprisingly, this works **much** better than the kernel implementation, in every respect. – Damon Feb 08 '14 at 12:50
  • @Damon: POSIX AIO is easier, but what is your evidence that it "works **much** better"? My understanding is that, assuming it's used properly (`O_DIRECT` on file systems that support it), kernel AIO is faster with lower resource usage (mostly because it doesn't need to create and manage a thread pool). Sure, POSIX AIO is more portable, but when you're using either technique you usually care about performance and resource use above all else, and kernel AIO is supposed to be better for that. – ShadowRanger Jun 13 '18 at 19:57
  • @ShadowRanger: The point of AIO is being _asynchronous_ without blocking. POSIX AIO provides exactly that; kernel AIO does not (and Windows doesn't either, by the way!). Being "faster" is not a priority of AIO because it usually isn't any faster, but slower. There are notable exceptions, e.g. sequential first-time, one-time reads from optical drives; in this case AIO is indeed much faster. In every "normal" case, it is orders of magnitude slower. That's OK though, since _asynchronous_ is what's desired and important, not _fast_. Now, kernel AIO blocks at inappropriate moments, .... – Damon Jun 14 '18 at 10:39
  • ... whereas POSIX AIO doesn't, which is much better. Plus, it uses the buffer cache and therefore is (in the normal, average case) much faster, too. So all in all, even though POSIX is an "embarrassment implementation" with threads doing plain normal reads/writes, it is nevertheless _much_ better. If support for the buffer cache made it into the kernel (proposal has been out for a decade or so, but to my knowledge still not accepted) then it might be on par speed-wise. – Damon Jun 14 '18 at 10:40
  • @Damon "If support for the buffer cache made it into the kernel [...]" I hesitate because you've had some bitter Linux async I/O experiences (and had your hopes dashed before) but maybe [`io_uring`](https://stackoverflow.com/a/57451551/2732969) (introduced in the 5.1 kernel) offers a ray of hope for the Linux AIO future? – Anon Aug 12 '19 at 06:14
  • @Anon: Looks great, only need a distro now where I don't have to build the kernel from source. Will be another 1-2 years I guess... but definitely looking forward. Thanks for the heads up. – Damon Aug 12 '19 at 09:17
  • The queue may block, but you can specify the desired max number of events to enqueue with io_setup(). As long as you know what that number is and you never try to enqueue more than that number of events, you'll be fine. See `man 3 io_setup`. – enigmaticPhysicist Mar 03 '21 at 00:15
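The thread-pool approach described in the comments above (synchronous reads done by worker threads, so they go through the buffer cache) can be sketched roughly like this. It is only an illustration in Python, not how `librt` is actually written; `async_pread` and `demo` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def async_pread(pool, fd, length, offset):
    """Queue a plain blocking pread on a worker thread; return a Future."""
    return pool.submit(os.pread, fd, length, offset)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"hello, buffered world")
        f.flush()  # make the data visible to pread on the raw descriptor
        with ThreadPoolExecutor(max_workers=2) as pool:
            # Submission returns immediately; the caller is free to do
            # other work until it asks for the results.
            first = async_pread(pool, f.fileno(), 5, 0)
            second = async_pread(pool, f.fileno(), 5, 16)
            return first.result(), second.result()
```

Submitting never blocks on the disk here, which is the property being argued about; the cost is the threads themselves.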
3

Asynchronous disc IO is alive and kicking. It is actually supported and works reasonably well now. It has significant limitations, but enough functionality that some of the major users can usefully use it (for example, MySQL's InnoDB does in the latest version).

Asynchronous disc IO is the ability to invoke disc IO operations in a non-blocking manner (in a single thread) and wait for them to complete. This works fine; http://lse.sourceforge.net/io/aio.html has more info.

AIO does enough for a typical application (database server) to be able to use it. AIO is a good alternative to either creating lots of threads doing synchronous IO, or using scatter/gather in the preadv family of system calls which now exist.

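It's possible to do a "shopping list" synchronous IO job using the newish preadv call, where the kernel fills a whole list of buffers in one call. Note that preadv reads one contiguous range starting at a single file offset and scatters it across the buffers; it does not fetch from several different offsets. This is OK as long as you have only one file to read. (NB: an equivalent write function, pwritev, exists.)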
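A small sketch of what a preadv call looks like, using Python's `os.preadv` wrapper (Linux, Python 3.7+). Keep in mind that preadv takes a single starting offset and spreads the data read from there across the supplied buffers. `scatter_read` and `demo` are hypothetical helper names, and the file contents are made up.

```python
import os
import tempfile


def scatter_read(path, offset, sizes):
    """Read sum(sizes) bytes starting at offset into separate buffers."""
    buffers = [bytearray(n) for n in sizes]
    fd = os.open(path, os.O_RDONLY)
    try:
        # One system call fills the buffers in order.
        nread = os.preadv(fd, buffers, offset)
    finally:
        os.close(fd)
    return nread, buffers


def demo():
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789abcdef")
        path = f.name
    try:
        return scatter_read(path, 4, [3, 5])
    finally:
        os.unlink(path)
```

The call is still synchronous: it returns only once the data has been read, so it saves system calls rather than waiting time.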

poll, epoll, etc. are just fancy ways of doing select() that suffer from fewer limitations and scalability problems. They may not be easy to mix with disc AIO, but in a real-world application you can probably get around this fairly trivially by using threads (some database servers tend to do these kinds of operations in separate threads anyway). poll() is good, and epoll is better, for large numbers of file descriptors. select() is OK too for small numbers of file descriptors (or specifically, low file descriptor numbers).
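As a rough illustration of why epoll scales better: the descriptor is registered once, and each later wait passes no descriptor list at all. This is a minimal Python sketch over a local socket pair; `epoll_register_once` is a hypothetical name.

```python
import select
import socket


def epoll_register_once():
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        # Registered a single time; the kernel remembers the interest set.
        ep.register(a.fileno(), select.EPOLLIN)

        results = []
        for payload in (b"x", b"yz"):
            b.send(payload)
            # No descriptor list is passed in on each wait, unlike
            # select() or poll().
            events = ep.poll(1.0)
            ready = [fd for fd, mask in events if mask & select.EPOLLIN]
            results.append(a.recv(16) if a.fileno() in ready else None)

        # Everything has been drained, so nothing is reported as ready.
        idle = ep.poll(0)
        return results, idle
    finally:
        ep.close()
        a.close()
        b.close()
```

With select() or poll(), the full set of descriptors would have to be handed to the kernel again on every iteration of that loop.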

MarkR
  • so where does this stand? http://stackoverflow.com/questions/1825621/how-do-you-use-aio-and-epoll-together-in-a-single-event-loop – Will Oct 12 '10 at 05:41
  • (I thought poll and select were symmetric and O(n), whereas epoll is O(1).) – Will Oct 12 '10 at 05:41
  • poll is better than select because it works a lot better with a sparse set of file descriptors - say you want to poll 10 FDs out of 10000 opened ones, you don't need an array of 10000 entries to be initialised with zeroes. epoll is better because you only need to register the new FDs you're interested in, not pass in all the ones you were already watching. – MarkR Oct 12 '10 at 10:26
3

(At the tail end of 2019 there's a glimmer of hope almost a decade after the original question was asked)

If you have a 5.1 or later Linux kernel, you can use the io_uring interface, which will hopefully usher in a better asynchronous I/O future for Linux (see one of the answers to the Stack Overflow question "Is there really no asynchronous block I/O on Linux?" for the benefits io_uring provides over KAIO). Hopefully this will allow Linux to provide stiff competition to FreeBSD's AIO without huge contortions!

Anon
2

Most of what I've learned about asynchronous I/O in Linux came from working on the Lighttpd source. It is a single-threaded web server that handles many simultaneous connections, using what it believes is the best of whatever asynchronous I/O mechanisms are available on the running system. Take a look at the source; it supports Linux, BSD, and (I think) a few other operating systems.

Jonathan