How does fread in C actually work?

Question

I understand that fread() has the following function definition:

size_t fread(void *buffer, size_t size, size_t qty, FILE *inptr);

I also understand that inptr is the file pointer returned when a file is opened using the fopen() function. My question is: does inptr store the memory address of every single character/letter of the file in its memory? If that is the case, do the memory addresses from inptr get copied to *buffer (pointer to buffer array)?

There is one more thing that I am confused about. Each time fread() is called, size * qty bytes of memory are copied/transferred. Is it the content of the file pointed to by inptr itself, or is it the memory address of the content of the file, that is being copied/transferred?

Would appreciate if someone can help me clear the confusion. Thank you :)

_My question is does inptr store the memory address of every single char/letter of the file in its memory?_ check what `FILE` contains, read the members of this structure. open `vi /usr/include/libio.h` & find the definition of `FILE`. — Achal, May 13 '18 at 02:00
The file pointer represents a handle that is used by reading functions like `fread()` to retrieve data from an actual file (assuming it has been successfully opened, of course). The implementation of reading functions may use a buffer - i.e. they copy some of the data into memory, in order to optimise access to the actual file. But that sort of thing is an implementation detail that you should not need to worry about. And dereferencing a file pointer to access memory is not the way to retrieve data from the file. — Peter, May 13 '18 at 02:01
Take a look at this answer: https://stackoverflow.com/a/5130577/946835 — CoyBit, May 13 '18 at 02:23

score 1 · Accepted Answer · answered May 13 '18 at 02:14

FILE is implemented by your system: the C library that ships with your operating system or compiler. The functions operating on FILE are implemented there too. The C standard does not say what is inside a FILE, so you cannot know without reading the source of that library.
inptr may be a pointer to memory allocated by your system. Or it may effectively be a number that your system uses to find its data. Either way, it is a handle that your system uses to find the FILE-specific data, and your system decides what is in that data. For caching purposes, maybe all the letters are cached in some buffer. Maybe not.
As for the fread call: fread reads data from the underlying entity behind the inptr handle. inptr is interpreted by your system to access the underlying memory or structure or device or hard drive or printer or keyboard or mouse or anything. It reads up to qty*size bytes of data, and that data is placed in the buffer. No pointers are placed there. The bytes that are read from the device are placed in the memory pointed to by buffer.
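
To make the "bytes, not pointers" point concrete, here is a minimal sketch. The helper name read_back is made up for illustration, and it assumes tmpfile() can create a scratch file on your system. It writes known text to a file, reads it back with fread(), and leaves the file's actual characters in the caller's buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library function): writes `text` to a
   temporary file, rewinds, and reads it back with fread() as `qty`
   items of `size` bytes each. Returns the number of complete items
   that fread() reported. */
size_t read_back(const char *text, void *buffer, size_t size, size_t qty)
{
    FILE *fp = tmpfile();               /* anonymous scratch file */
    if (fp == NULL)
        return 0;

    fwrite(text, 1, strlen(text), fp);  /* put known bytes in the file */
    rewind(fp);                         /* back to the start for reading */

    /* fread copies the file's bytes into the memory `buffer` points to. */
    size_t items = fread(buffer, size, qty, fp);

    fclose(fp);
    return items;
}
```

After read_back("hello", buf, 1, 5), buf holds the five characters themselves. Note also that the return value counts complete items: asking for 3 items of 4 bytes from a 10-byte file returns 2.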

score 0 · Answer 2 · answered May 13 '18 at 02:17

Your questions are a bit confusing (which is probably why you're asking them) so I'll do my best to answer.

FILE *inptr is a handle to the open file. You do not directly read it, it is just used to tell related functions what to operate on. You can kinda think of it like a human reading a file name in a folder, where the file name is used to identify the file, but the contents are accessed in another way.

As for the data, it is read from the file which is opened with fopen() and subsequently provided a file handle. The data does not directly correlate to the FILE pointer, and typically you should not be messing with the FILE pointer directly (don't try to read/write from it directly).

I tried to not get too technical as to the operation, as it seems you are new to C, but just kind of think of the FILE * as the computer's way of "naming" the file internally for its own usage, and the data buffer is merely the content.
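
As a small illustration of the handle idea, the sketch below passes the same FILE * to fread() twice and gets consecutive parts of the file. The helper name read_in_two_steps is invented for this example, and it assumes tmpfile() works on your system. Your code never looks inside the FILE; the library keeps track of the position behind the handle.

```c
#include <stdio.h>

/* Hypothetical helper: reads a temporary file holding "abcdef" in two
   fread() calls of three bytes each. Returns 1 on success, 0 otherwise. */
int read_in_two_steps(char first[3], char second[3])
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;

    fputs("abcdef", fp);
    rewind(fp);

    /* The same handle is passed both times; the library remembers
       where the previous read stopped. */
    size_t a = fread(first, 1, 3, fp);
    size_t b = fread(second, 1, 3, fp);

    fclose(fp);
    return a == 3 && b == 3;
}
```

The first call fills first with "abc" and the second fills second with "def", even though the arguments that identify the file are identical.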

May I ask why shouldn't I mess with the FILE pointer directly? — Prav, May 13 '18 at 06:46
@PravElan Three reasons: you shouldn't, you can't, and it might not work. (1) The data pointed to by a `FILE *` pointer is all private to your system. You are not intended to inspect or manipulate it directly. (2) Your code may not have access to the definition of the `FILE` structure that would even let you try to access the data. (3) `FILE *` pointers are special; there are some special rules on things you're not allowed to do with them. (The rules let certain implementations work the way they do. But they're pretty obscure; you're unlikely to break them by accident.) — Steve Summit, May 13 '18 at 10:23

score 0 · Answer 3 · answered May 13 '18 at 03:58

You can think of fread as being implemented something like this:

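size_t fread(void *ptr, size_t size, size_t nitems, FILE *fp)
{
    unsigned char *p = ptr;          /* fread works on raw bytes */
    size_t i;
    for(i = 0; i < size * nitems; i++) {
        int c = getc(fp);
        if(c == EOF) break;          /* end of file or read error */
        *p++ = (unsigned char)c;
    }
    return size ? i / size : 0;      /* number of complete items read */
}

(The return value is the number of complete items read, not the number of bytes, which is why the byte count is divided by size.)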

In other words, fread reads a bunch of characters as if by repeatedly calling getc(). So obviously this raises the question of how getc works.

What you have to know is that FILE * points to a structure which, one way or another, contains a buffer of some (not necessarily all) of the file's characters read into memory. So, in pseudocode, getc() looks like this:

int getc(FILE *fp)
{
    if(fp->buffer is empty) {
        fill fp->buffer by reading more characters from underlying file;
        if(that resulted in end-of-file)
            return EOF;
    }

    return(next character from fp->buffer);
}
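
The pseudocode can be turned into a runnable toy. Everything here (ToyFile, toy_open, toy_getc, toy_fread, the 4-byte buffer) is invented for illustration and is not how any real C library lays out FILE; the "underlying file" is just an array in memory so the refill step stays visible.

```c
#include <stddef.h>
#include <string.h>

#define TOY_EOF   (-1)
#define TOY_BUFSZ 4   /* deliberately tiny so refills happen often */

/* Hypothetical stand-in for FILE: a pretend file plus a small buffer. */
typedef struct {
    const unsigned char *data;   /* the pretend underlying file */
    size_t len;                  /* its total length */
    size_t pos;                  /* how much of it has been loaded so far */
    unsigned char buf[TOY_BUFSZ];
    size_t buf_len;              /* valid bytes currently in buf */
    size_t buf_pos;              /* next unread byte in buf */
    unsigned refills;            /* trips made to the "underlying file" */
} ToyFile;

void toy_open(ToyFile *f, const void *data, size_t len)
{
    memset(f, 0, sizeof *f);
    f->data = data;
    f->len = len;
}

int toy_getc(ToyFile *f)
{
    if (f->buf_pos == f->buf_len) {           /* buffer is empty */
        size_t n = f->len - f->pos;
        if (n > TOY_BUFSZ)
            n = TOY_BUFSZ;
        if (n == 0)
            return TOY_EOF;                   /* nothing left to load */
        memcpy(f->buf, f->data + f->pos, n);  /* refill from the "file" */
        f->pos += n;
        f->buf_len = n;
        f->buf_pos = 0;
        f->refills++;
    }
    return f->buf[f->buf_pos++];              /* next buffered character */
}

/* Byte-at-a-time read built on toy_getc; returns complete items read. */
size_t toy_fread(void *ptr, size_t size, size_t nitems, ToyFile *f)
{
    unsigned char *p = ptr;
    size_t i;
    for (i = 0; i < size * nitems; i++) {
        int c = toy_getc(f);
        if (c == TOY_EOF)
            break;
        *p++ = (unsigned char)c;
    }
    return size ? i / size : 0;
}
```

Reading a 10-byte toy file this way takes three refills (4 + 4 + 2 bytes), while the caller only ever sees the bytes arriving in its own array.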

score 0 · Answer 4 · answered May 13 '18 at 06:45

The answer to the question,

"how does fread() work?"

is basically

"it asks your operating system to read the file for you."

More or less the sole purpose of an operating system kernel is to perform actions like this on your behalf. The kernel hosts the device drivers for the disks and file systems, and is able to fetch data for your program no matter what the file is stored on (e.g. a FAT32 formatted HDD, a network share, etc).

The way in which fread() asks your operating system to fetch data from a file varies slightly between OS and CPU. Back in the good old days of MS-DOS, the fread() function would load up various parameters (calculated from the parameters your program gave to fread()) into CPU registers, and then raise an interrupt. The interrupt handler, which was actually part of MS-DOS, would then go and fetch the requested data, and place it in a given place in memory. The registers to be loaded and the interrupt to raise were all specified by the MS-DOS manuals. The parameters you pass to fread() are abstractions of those needed by the system call.

This is what's known as making a system call. Every operating system has a system calling interface. Libraries like glibc on Linux provide handy functions like fread() (which is part of the standard C library), and make the system call for you (which is not standardised between operating systems).
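
On a POSIX system such as Linux, the library call and the lower-level call can be put side by side. This is only a sketch: it assumes a POSIX environment (read, lseek and fileno are POSIX, not standard C), the helper name same_bytes_both_ways is made up, and read() here is the C library's thin wrapper around the kernel's system call rather than the raw call itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* POSIX: read, lseek */

/* Hypothetical helper: stores `text` in a temporary file, then fetches it
   back twice, once through fread() and once through POSIX read().
   Returns 1 if both routes delivered the same bytes, 0 otherwise. */
int same_bytes_both_ways(const char *text)
{
    char via_lib[64] = {0}, via_sys[64] = {0};
    size_t len = strlen(text);
    if (len > sizeof via_lib)
        return 0;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return 0;
    fwrite(text, 1, len, fp);
    fflush(fp);                     /* push the library's buffer to the kernel */

    rewind(fp);
    size_t n_lib = fread(via_lib, 1, len, fp);   /* standard C route */

    int fd = fileno(fp);            /* descriptor behind the FILE handle */
    lseek(fd, 0, SEEK_SET);         /* reposition at the kernel level */
    ssize_t n_sys = read(fd, via_sys, len);      /* POSIX route */

    fclose(fp);
    return n_lib == len && n_sys == (ssize_t)len
        && memcmp(via_lib, via_sys, len) == 0;
}
```

Both routes end up with the same bytes; fread() adds buffering and portability on top of whatever the operating system provides.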

Note that this means that glibc is not a fundamental part of the operating system. It's just a library of routines that implements the C standard library around the system calls that Linux provides. This means you can use an alternative C library. For example, Android does not use glibc, even though it has a Linux kernel.

Similarly on Windows. Software on Windows (C, C++, the .NET runtime, etc.) is generally written against the Win32 API, which is provided by system DLLs such as kernel32.dll. The difference on Windows is that the NT kernel system calling interface is not officially documented; it is not a published, stable interface.

This leads to some interesting things.

WINE on Linux recreates the Win32 API libraries, not the NT kernel system call interface.
Windows Subsystem for Linux on Windows 10 does recreate the Linux system calling interface (which is possible because it is public knowledge).
Solaris, QNX and FreeBSD pull the same trick.
Even more oddly, it looks like Microsoft has built an NT kernel system interface shim for Linux (i.e., the thing that WINE hasn't done) to allow MS SQL Server to run on Linux. This is in effect a Linux Subsystem for Windows. They've not given this away.
