No, there is no POSIX standard for direct IO.
There are at least two different APIs and behaviors that exist as of January 2023. Linux, FreeBSD, and apparently IBM's AIX use an O_DIRECT flag to open(), while Oracle's Solaris uses a directio() function on an already-opened file descriptor.
The Linux use of the O_DIRECT flag to the POSIX open() function is documented on the Linux open() man page (https://man7.org/linux/man-pages/man2/open.2.html):
O_DIRECT (since Linux 2.4.10)

Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.
Linux does not clearly specify how direct IO interacts with other descriptors open on the same file, or what happens when the file is mapped using mmap(); nor does it specify any alignment or size restrictions on direct IO read or write operations. In my experience these are all filesystem-specific and have become less restrictive over time, but most Linux filesystems require page-aligned IO buffers, and many (perhaps most or all, at least historically) require page-sized reads or writes.
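To make the Linux model concrete, here is a minimal sketch of opening a file with O_DIRECT and issuing one aligned read. The 4096-byte alignment and transfer size, and the file name datafile, are assumptions for illustration; as noted above, the real constraints are filesystem- and device-specific.

    #define _GNU_SOURCE                 /* O_DIRECT is a Linux extension */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blksz = 4096;      /* assumed alignment and IO size */
        void *buf = NULL;

        int fd = open("datafile", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open(O_DIRECT)");   /* EINVAL if the filesystem refuses it */
            return 1;
        }

        /* Direct IO wants an aligned user buffer; posix_memalign() provides one. */
        if (posix_memalign(&buf, blksz, blksz) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            close(fd);
            return 1;
        }

        /* Buffer address, file offset, and length all honor the assumed alignment. */
        ssize_t n = pread(fd, buf, blksz, 0);
        if (n < 0)
            perror("pread");            /* EINVAL typically means a bad alignment/size */
        else
            printf("read %zd bytes\n", n);

        free(buf);
        close(fd);
        return 0;
    }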
FreeBSD follows the Linux model, passing an O_DIRECT flag to open():
O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care.
OpenBSD does not support direct IO. There's no mention of direct IO in either the OpenBSD open() or the OpenBSD fcntl() man pages.
IBM's AIX appears to support a Linux-type O_DIRECT flag to open(), but actual published IBM AIX man pages don't seem to be generally available.
SGI's Irix also supported the Linux-style O_DIRECT flag to open():
O_DIRECT

If set, all reads and writes on the resulting file descriptor will be performed directly to or from the user program buffer, provided appropriate size and alignment restrictions are met. Refer to the F_SETFL and F_DIOINFO commands in the fcntl(2) manual entry for information about how to determine the alignment constraints. O_DIRECT is a Silicon Graphics extension and is only supported on local EFS and XFS file systems, and remote BDS file systems.
Of interest, the XFS file system on Linux originated with SGI's Irix.
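The Irix fcntl(F_DIOINFO) query has a rough analogue on Linux XFS: an XFS_IOC_DIOINFO ioctl that reports the required memory alignment and the minimum/maximum direct IO sizes. A hedged sketch, assuming the xfsprogs development headers are installed and a hypothetical file datafile that lives on an XFS filesystem:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <xfs/xfs.h>                /* XFS_IOC_DIOINFO, struct dioattr (xfsprogs) */

    int main(void)
    {
        int fd = open("datafile", O_RDONLY);    /* assumed to live on XFS */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct dioattr da;
        if (ioctl(fd, XFS_IOC_DIOINFO, &da) == 0)
            printf("memory alignment %u, min IO %u, max IO %u bytes\n",
                   da.d_mem, da.d_miniosz, da.d_maxiosz);
        else
            perror("XFS_IOC_DIOINFO");          /* fails on non-XFS filesystems */

        close(fd);
        return 0;
    }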
Solaris uses a completely different interface: a specific directio() function that sets direct IO on a per-file basis:
Description

The directio() function provides advice to the system about the expected behavior of the application when accessing the data in the file associated with the open file descriptor fildes. The system uses this information to help optimize accesses to the file's data.

The directio() function has no effect on the semantics of the other operations on the data, though it may affect the performance of other operations.

The advice argument is kept per file; the last caller of directio() sets the advice for all applications using the file associated with fildes.

Values for advice are defined in <sys/fcntl.h>.

DIRECTIO_OFF

Applications get the default system behavior when accessing file data. When an application reads data from a file, the data is first cached in system memory and then copied into the application's buffer (see read(2)). If the system detects that the application is reading sequentially from a file, the system will asynchronously "read ahead" from the file into system memory so the data is immediately available for the next read(2) operation.

When an application writes data into a file, the data is first cached in system memory and is written to the device at a later time (see write(2)). When possible, the system increases the performance of write(2) operations by cacheing the data in memory pages. The data is copied into system memory and the write(2) operation returns immediately to the application. The data is later written asynchronously to the device. When possible, the cached data is "clustered" into large chunks and written to the device in a single write operation.

The system behavior for DIRECTIO_OFF can change without notice.

DIRECTIO_ON

The system behaves as though the application is not going to reuse the file data in the near future. In other words, the file data is not cached in the system's memory pages.

When possible, data is read or written directly between the application's memory and the device when the data is accessed with read(2) and write(2) operations. When such transfers are not possible, the system switches back to the default behavior, but just for that operation. In general, the transfer is possible when the application's buffer is aligned on a two-byte (short) boundary, the offset into the file is on a device sector boundary, and the size of the operation is a multiple of device sectors.

This advisory is ignored while the file associated with fildes is mapped (see mmap(2)).

The system behavior for DIRECTIO_ON can change without notice.
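For comparison, a minimal sketch of the Solaris interface, assuming a Solaris/illumos system and a hypothetical file named datafile. The function takes an already-open descriptor, and the advice applies to the whole file, not just to this descriptor:

    #include <sys/types.h>
    #include <sys/fcntl.h>              /* directio(), DIRECTIO_ON / DIRECTIO_OFF */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("datafile", O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Advise the system to bypass the page cache for this file.  The advice
         * is kept per file, so every process using the file is affected until
         * someone calls directio(fd, DIRECTIO_OFF). */
        if (directio(fd, DIRECTIO_ON) != 0)
            perror("directio");         /* e.g. ENOTTY if the filesystem lacks support */

        /* Ordinary read(2)/write(2) calls follow; Solaris falls back to cached IO
         * per operation when a transfer cannot be done directly. */

        close(fd);
        return 0;
    }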
Notice also that the behavior on Solaris is different: if direct IO is enabled on a file by any process, all processes accessing that file will do so via direct IO (Solaris 10+ has no alignment or size restrictions on direct IO, so switching between direct IO and "normal" IO won't break anything*). And if a file is mapped via mmap(), direct IO on that file is disabled entirely.
* - That's not quite true: if you're using a SAMFS or QFS (https://en.wikipedia.org/wiki/QFS) filesystem in shared mode and accessing data from the filesystem's active metadata controller (which must, by design, mount the filesystem with the Solaris forcedirectio mount option so that all access on that one system in the cluster goes through direct IO), disabling direct IO for a file with directio(fd, DIRECTIO_OFF) will corrupt the filesystem. Oracle's own top-end RAC database would do exactly that if you did a database restore on the QFS metadata controller, and you'd wind up with a corrupt filesystem.