What is the purpose of file descriptors?

Question

My understanding is that both fopen() and open() can be used to open files. open() returns a file descriptor. But they should be equivalent in terms of getting a file for writing or reading. What is the purpose of defining file descriptors? It is not clear from the wiki page.

https://en.wikipedia.org/wiki/File_descriptor

score 2 · Answer 1 · answered Feb 07 '19 at 03:32

fopen returns a FILE * which is a wrapper around the file descriptor (I will ignore the "this is not required by the specification" aspect here, as I am not aware of an implementation that does not do this). At a high level, it looks like this:

application --FILE *--> libc --file descriptor--> kernel
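
A minimal sketch of that layering, assuming a POSIX system: fileno() exposes the descriptor behind a stream, and fdopen() wraps an existing descriptor in a new stream. The two helper names and their parameters are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open path with fopen() and report the descriptor libc obtained for it.
 * Returns -1 if the file cannot be opened. (Hypothetical helper.) */
int descriptor_behind_stream(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int fd = fileno(fp);   /* POSIX: the descriptor wrapped by the stream */
    fclose(fp);            /* also closes the underlying descriptor */
    return fd;
}

/* The other direction: open() a descriptor, wrap it in a FILE * with
 * fdopen(), and write through the stream.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int write_via_wrapped_descriptor(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        return -1;
    }
    int ok = (fputs(text, fp) != EOF);
    if (fclose(fp) != 0)   /* flushes the buffer and closes fd as well */
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_via_wrapped_descriptor("foo.txt", "hello\n") and then descriptor_behind_stream("foo.txt") should return a small non-negative integer, the same kind of value open() would have returned directly.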

Shells operate directly on file descriptors mainly because they execute other programs, and you cannot modify another program's FILE * objects. However, you can set up another program's file descriptors with the dup2 syscall at startup (i.e. between fork and exec). For example:

/bin/cat > foo.txt

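This tells the shell to execute the /bin/cat program, but first redirect stdout (file descriptor #1) to a file that it opens. This is implemented roughly as follows (simplified, error handling omitted):

if (fork() == 0) {
    /* child: open the target, make it descriptor 1, then run the program */
    int fd = open("foo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    dup2(fd, 1);
    close(fd);
    execl("/bin/cat", "cat", (char *)NULL);
}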
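
For anyone who wants to try this outside a shell, here is a fuller sketch of the same redirection with error checks and a waiting parent. It assumes a POSIX system; the function name run_with_stdout_to_file and its parameters are hypothetical, and a real shell does considerably more (signals, job control, other redirections).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (looked up on PATH) with its stdout redirected to out_path,
 * roughly what a shell does for `cmd > out_path`.
 * Returns the child's exit status, or -1 on failure. (Hypothetical helper.) */
int run_with_stdout_to_file(const char *out_path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: point descriptor 1 at the file, then replace the process image. */
        int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(127);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(127);
            close(fd);           /* descriptor 1 now refers to the file */
        }
        execvp(argv[0], argv);
        _exit(127);              /* reached only if exec failed */
    }

    /* Parent: wait for the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program being run never needs to know about the redirection: it writes to descriptor 1 as usual, and the kernel sends those bytes to the file.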

The closest thing you can do with FILE * is calling freopen, but a FILE * object is not persisted across exec, unlike file descriptors.

But why do we need FILE * at all then, if it's just a wrapper around a file descriptor? One main benefit is having a readahead buffer. For example, consider fgets. This will eventually call the read syscall on the file descriptor associated with the FILE * that you pass in. But how does it know how much to read? The kernel has no option to say "give me one line" (line-buffered ttys aside). If you read more than one line in the first read, the next time you call fgets you might only get part of the next line, since the kernel has already given you the first part in the previous read syscall. The other option would be calling read one character at a time, which is horrible for performance.

So what does libc do? It reads a bunch of characters at once, then stores the extra characters in an internal buffer on the FILE * object. The next time you call fgets, it is able to use the internal buffer. This buffer is also shared with functions like fread, so you can interleave calls to fgets and fread without losing data.
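
The idea can be sketched as a toy line reader built directly on read(). This is not how any particular libc is written; the struct line_reader type and both function names are hypothetical, and the sketch leaves out everything else a real FILE does (writing, seeking, locking, other buffering modes).

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* A toy stand-in for the buffering part of FILE: a descriptor plus the
 * bytes already read from it but not yet handed to the caller. */
struct line_reader {
    int fd;
    char buf[4096];
    size_t start;   /* index of the first unconsumed byte */
    size_t end;     /* one past the last valid byte */
};

void line_reader_init(struct line_reader *lr, int fd)
{
    lr->fd = fd;
    lr->start = 0;
    lr->end = 0;
}

/* Copy the next line (including '\n' if present) into out, NUL-terminated.
 * Returns the number of bytes stored, 0 at end of input, -1 on error.
 * Lines longer than cap - 1 bytes are returned in pieces, like fgets(). */
ssize_t line_reader_next(struct line_reader *lr, char *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return -1;
    while (n + 1 < cap) {
        if (lr->start == lr->end) {
            /* Buffer empty: a single read() fetches whatever is available. */
            ssize_t got = read(lr->fd, lr->buf, sizeof lr->buf);
            if (got < 0)
                return -1;
            if (got == 0)
                break;              /* end of input */
            lr->start = 0;
            lr->end = (size_t)got;
        }
        char c = lr->buf[lr->start++];
        out[n++] = c;
        if (c == '\n')
            break;                  /* leftover bytes stay in lr->buf */
    }
    out[n] = '\0';
    return (ssize_t)n;
}
```

If the descriptor delivers "ab\ncde\n" in one read(), the first call returns "ab\n" and the second returns "cde\n" straight from the buffer without another syscall.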

Mostly true, though `FILE`s are not necessarily buffered. But none of this really answers the question posed. POSIX *could* have defined most everything related to file descriptors in terms of (pointers to) `FILE` objects instead. Observing that it didn't does not explain *why* it didn't. — John Bollinger, Feb 07 '19 at 04:10

thb · Answer 2 · 2019-02-07T10:25:50.233

1

The two functions operate at different levels:

open() is a lower-level, POSIX function to open a file. It returns a distinct integer to identify, and enable access to, the file opened. This integer is a file descriptor.
fopen() is a higher-level, portable, C standard-library function to open a file.

On a POSIX system, the portable fopen() probably calls the nonportable open(), but this is an implementation detail.

When in doubt, prefer fopen().

For more information, on a Linux system, see man 2 read. The POSIX read() function reads data via the file descriptor returned by open().
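
To make the two levels concrete, here is a small sketch that counts the bytes in a file once through open()/read() and once through fopen()/fread(). It assumes a POSIX system for the first function; both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Count the bytes in a file through the POSIX descriptor interface.
 * Returns -1 on error. (Hypothetical helper.) */
long count_bytes_with_descriptor(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char buf[4096];
    long total = 0;
    ssize_t got;
    while ((got = read(fd, buf, sizeof buf)) > 0)
        total += (long)got;
    close(fd);
    return got < 0 ? -1 : total;
}

/* The same count through the portable C stream interface.
 * Returns -1 on error. (Hypothetical helper.) */
long count_bytes_with_stream(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    char buf[4096];
    long total = 0;
    size_t got;
    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)got;
    int failed = ferror(fp);
    fclose(fp);
    return failed ? -1 : total;
}
```

Both functions should return the same number for the same regular file. Only the second one compiles on any hosted C implementation; the first needs a system that provides the POSIX calls.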

edited Feb 07 '19 at 10:25

answered Feb 07 '19 at 02:34

thb


Shells primarily use file descriptors. How do they make it portable? – user1424739 Feb 07 '19 at 02:43
As far as I know, if the system supports a POSIX-based shell like Bash—and as far as I know just about everything but MS Windows does support it, and maybe Windows too these days—then the system would support POSIX. Ergo, `open()` would be portable to the extent to which POSIX is portable. However, it would not be portable in the sense of being part of the library every standard, hosted C installation is required to provide. Does this answer your question? – thb Feb 07 '19 at 02:47
One can do binary I/O via the `fread()` and `fwrite()` functions of the C standard library (using a `FILE *` obtained from `fopen()`). Plus, that way you get buffered I/O automatically if you want it. Can you be more specific about why you say file descriptor-based I/O is better than stream I/O for such tasks? – John Bollinger Feb 07 '19 at 03:36
@JohnBollinger You have taught me something I had not known. I had been using `open()` for that purpose for 20 years. In view of your advice, I have deleted the paragraph regarding binary and `fopen()`. – thb Feb 07 '19 at 10:28
