How does fread really work?

Question

The declaration of fread is as follows:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

The question is: Is there a difference in reading performance of two such calls to fread:

char a[1000];

fread(a, 1, 1000, stdin);
fread(a, 1000, 1, stdin);

Will it read 1000 bytes at once each time?

Keith Thompson · Accepted Answer · 2014-03-28T17:38:10.403

110

There may or may not be any difference in performance. There is a difference in semantics.

fread(a, 1, 1000, stdin);

attempts to read 1000 data elements, each of which is 1 byte long.

fread(a, 1000, 1, stdin);

attempts to read 1 data element which is 1000 bytes long.

They're different because fread() returns the number of data elements it was able to read, not the number of bytes. If it reaches end-of-file (or an error condition) before reading the full 1000 bytes, the first version has to indicate exactly how many bytes it read; the second just fails and returns 0.

In practice, it's probably just going to call a lower-level function that attempts to read 1000 bytes and indicates how many bytes it actually read. For larger reads, it might make multiple lower-level calls. The computation of the value to be returned by fread() is different, but the expense of the calculation is trivial.

There may be a difference if the implementation can tell, before attempting to read the data, that there isn't enough data to read. For example, if you're reading from a 900-byte file, the first version will read all 900 bytes and return 900, while the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900.
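The 900-byte case described above can be checked with a small sketch. The helper names make_temp_file and fread_result are hypothetical, and the code assumes tmpfile() is usable on the system; it only illustrates the return values, not how any particular library reads the data.

```c
#include <stdio.h>

/* Hypothetical helper: create a temporary file holding `len` bytes of 'x'
 * and rewind it to the start. Returns NULL on failure. */
FILE *make_temp_file(size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return NULL;
        }
    }
    rewind(f);
    return f;
}

/* Hypothetical helper: report what fread(buf, size, nmemb, f) returns for a
 * fresh file of `file_len` bytes, or (size_t)-1 if the setup failed. */
size_t fread_result(size_t file_len, size_t size, size_t nmemb)
{
    char buf[1000];
    if (size * nmemb > sizeof buf)
        return (size_t)-1;          /* request would overflow buf */
    FILE *f = make_temp_file(file_len);
    if (f == NULL)
        return (size_t)-1;
    size_t n = fread(buf, size, nmemb, f);
    fclose(f);
    return n;
}
```

With a 900-byte file, fread_result(900, 1, 1000) reports 900 elements, while fread_result(900, 1000, 1) reports 0 because no complete 1000-byte element was available.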

But in general, you should probably choose how to call it based on what information you need from it. Read a single data element if a partial read is no better than not reading anything at all. Read in smaller chunks if partial reads are useful.
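As an illustration of the "single data element" case, here is a minimal sketch that reads fixed-size records one at a time. The struct record layout and the read_records name are assumptions made up for this example; the point is only that a trailing partial record is never counted.

```c
#include <stdio.h>

/* Hypothetical fixed-size record, for illustration only. */
struct record {
    int id;
    double value;
};

/* Reads up to `max` whole records from `f` into `out`.
 * Returns the number of complete records read. A trailing partial record
 * is not counted, because fread() counts whole elements only. */
size_t read_records(FILE *f, struct record *out, size_t max)
{
    size_t n = 0;
    while (n < max && fread(&out[n], sizeof out[n], 1, f) == 1)
        n++;
    return n;
}
```

Here a partial record is useless, so an element size of sizeof(struct record) is the natural choice. For a byte stream where partial data is still useful, an element size of 1 fits better.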

edited Mar 28 '14 at 17:38

answered Dec 21 '11 at 12:16

Keith Thompson


_the second might not bother to read anything. In both cases, the file position indicator is advanced by the number of characters successfully read, i.e., 900_ shouldn't it be that in the second version the file position indicator _wouldn't_ advance since there was nothing read? In other words, shouldn't `fread(a, 1000, N, stdin);` always advance the fp indicator by a multiple of `1000`? – Shahbaz Jan 27 '14 at 11:26
3

Nevermind, found it. C11 at 7.21.8.1.2 and 7.21.8.2.2 says: _If an error occurs, the resulting value of the file position indicator for the stream is indeterminate._ – Shahbaz Jan 27 '14 at 11:27
So there is no way to recover the position of the indicator? Or to avoid reading that last chunk that messes with the position indicator? – David 天宇 Wong Feb 08 '14 at 16:01
1

@David天宇Wong: If you need to recover the position, call `ftell` before calling `fread`, and then `fseek` after. – Keith Thompson Feb 08 '14 at 16:02
I don't really understand, @KeithThompson. fseek will just put me where I want, but how do I know where I want to be? – David 天宇 Wong Feb 08 '14 at 16:32
@David天宇Wong: I don't know where you want to be. If you want to be at the position where you were before the `fread` call, you can call `ftell` before calling `fread` (it returns a value that indicates your current position), then pass that result to `fseek` after the `fread` call. – Keith Thompson Feb 08 '14 at 19:38
well right when fread cannot read big chunks, so I can switch to a smaller fread. Thanks :) – David 天宇 Wong Feb 08 '14 at 20:42
@David天宇Wong: Why would `fread` not be able to read big chunks? The only limits on the size of data that can be read by a single `fread` call should be the size of the file and the size of the memory buffer you're reading into. (It might make multiple calls to some underlying system call, but that's almost entirely transparent.) – Keith Thompson Feb 08 '14 at 20:43
as you said in your answer above, it fails if it reaches EOF. My problem is documented here : http://stackoverflow.com/questions/21647735/reading-and-writing-64bits-by-64-bits-in-c?noredirect=1#comment32718598_21647735 – David 天宇 Wong Feb 08 '14 at 20:49
2

The POSIX specification is much stricter ... it requires that fread does size fgetc's per object, so the exact same number of fgetc's will be done in either case (but the return values will be different). – Jim Balter Aug 15 '14 at 06:34
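The ftell/fseek suggestion from the comments above could look roughly like the following sketch. The function name read_element_or_rewind is hypothetical, and the approach assumes a seekable stream (it cannot work on a pipe or a terminal).

```c
#include <stdio.h>

/* Tries to read exactly one element of `size` bytes from `f`.
 * On a short read, restores the stream position saved before the call.
 * Returns 1 on success, 0 if the element could not be read and the
 * position was restored, and -1 if the position could not be saved or
 * restored (for example on a non-seekable stream). */
int read_element_or_rewind(FILE *f, void *buf, size_t size)
{
    long start = ftell(f);          /* remember where we were */
    if (start == -1L)
        return -1;
    if (fread(buf, size, 1, f) == 1)
        return 1;
    /* Short read: go back. A successful fseek also clears the EOF flag. */
    if (fseek(f, start, SEEK_SET) != 0)
        return -1;
    return 0;
}
```

After a return value of 0, the caller can retry from the same position with a smaller element size, for example fread(buf, 1, size, f), to pick up whatever bytes remain.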

kennytm · Answer 2 · 2014-08-15T08:54:57.323

18

That would be implementation detail. In glibc, the two are identical in performance, as it's implemented basically as (Ref http://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofread.c):

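/* Simplified pseudocode, not the literal glibc source. */
size_t fread (void* buf, size_t size, size_t count, FILE* f)
{
    size_t bytes_requested = size * count;
    if (bytes_requested == 0)
        return 0;   /* also avoids dividing by zero below */
    /* stands in for the buffered read; f->fd is illustrative */
    size_t bytes_read = read(f->fd, buf, bytes_requested);
    return bytes_read / size;
}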

Note that the C ~~and POSIX~~ standard does not guarantee that a complete object of size size is read every time. If a complete object cannot be read (e.g. stdin only has 999 bytes but you've requested size == 1000), the file will be left in an indeterminate state (C99 §7.19.8.1/2).

Edit: See the other answers about POSIX.
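To make the "multiply, read, divide" idea concrete without relying on library internals, here is a hedged sketch built on the POSIX read() call. It is not glibc's code: read_fully and complete_elements are hypothetical names, and a real stdio implementation goes through its own buffer rather than calling read() directly like this.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads up to `want` bytes from `fd`, looping because a single read() may
 * return fewer bytes than requested. Returns the number of bytes read,
 * which is less than `want` only at end-of-file or on error. */
size_t read_fully(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n > 0)
            got += (size_t)n;
        else if (n == 0)
            break;                  /* end of file */
        else if (errno != EINTR)
            break;                  /* real error: report what we have */
        /* on EINTR, simply retry */
    }
    return got;
}

/* Converts a byte count into a count of complete elements, which is what
 * fread() has to return. A size of 0 yields 0. */
size_t complete_elements(size_t bytes, size_t size)
{
    return size ? bytes / size : 0;
}
```

With 900 bytes available, complete_elements(900, 1) gives 900 and complete_elements(900, 1000) gives 0, which matches the return values discussed in the other answers.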

edited Aug 15 '14 at 08:54

answered Dec 21 '11 at 12:25

kennytm


You mention the POSIX standard but it requires fread to be implemented in terms of fgetc, which is much more deterministic than the C requirement. – Jim Balter Aug 15 '14 at 06:36
1

Awesome answer! Exactly what everyone landing here needs. I'm surprised it has so few votes. – Sandeep Sep 03 '14 at 08:30
Is it the same for fwrite as well? – Sandeep Sep 03 '14 at 08:35
Important point: You can break the file when reading >1 sized records. – ArekBulski Oct 27 '15 at 03:38
@kennytm Couldn't `read` be called several times before `fread` returns, to satisfy a caller who wants to `fread` 1 MB? – John Jun 01 '22 at 09:34

ArjunShankar · Answer 3 · 2014-08-15T05:59:12.100

According to the specification, the two may be treated differently by the implementation.

If your file is less than 1000 bytes, fread(a, 1, 1000, stdin) (read 1000 elements of 1 byte each) will still copy all the bytes until EOF. On the other hand, the result of fread(a, 1000, 1, stdin) (read 1 1000-byte element) stored in a is unspecified, because there is not enough data to finish reading the 'first' (and only) 1000 byte element.

Of course, some implementations may still copy the 'partial' element into as many bytes as needed.

Neel Basu · Answer 4 · 2011-12-21T13:27:33.463

fread calls getc internally. In Minix, the number of times getc is called is simply size*nmemb, so it depends only on the product of the two arguments. Both fread(a, 1, 1000, stdin) and fread(a, 1000, 1, stdin) will therefore call getc 1000 (= 1000*1) times. Here is the simple implementation of fread from Minix:

size_t fread(void *ptr, size_t size, size_t nmemb, register FILE *stream)
{
    register char *cp = ptr;
    register int c;
    size_t ndone = 0;
    register size_t s;

    if (size)
        while (ndone < nmemb) {
            s = size;
            do {
                if ((c = getc(stream)) != EOF)
                    *cp++ = c;
                else
                    return ndone;
            } while (--s);
            ndone++;
        }

    return ndone;
}

genuine answer in my opinion – Sathvik May 01 '20 at 15:56

score 3 · Answer 5 · answered Dec 21 '11 at 12:19

There may be no performance difference, but those calls are not the same.

fread returns the number of elements read, so those calls will return different values.
If an element cannot be completely read, its value is indeterminate:

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate. (ISO/IEC 9899:TC2 7.19.8.1)

There's not much difference in the glibc implementation, which just multiplies the element size by the number of elements to determine how many bytes to read and divides the amount read by the member size in the end. But the version specifying an element size of 1 will always tell you the correct number of bytes read. However, if you only care about completely read elements of a certain size, using the other form saves you from doing a division.

Clarus · Answer 6 · 2012-06-25T23:33:40.197

1

I wanted to clarify the answers here. fread performs buffered IO. The actual read block sizes fread uses are determined by the C implementation being used.

All modern C libraries will have the same performance with the two calls:

fread(a, 1, 1000, file);
fread(a, 1000, 1, file);

Even something like:

for (int i=0; i<1000; i++)
  a[i] = fgetc(file);

should result in the same disk access patterns, although fgetc would be slower due to more calls into the standard C library and, in some cases, the need for a disk to perform additional seeks which would have otherwise been optimized away.

Getting back to the difference between the two forms of fread: the former returns the actual number of bytes read. The latter returns 0 if the file size is less than 1000, otherwise it returns 1. In both cases the buffer would be filled with the same data, i.e. the contents of the file up to 1000 bytes.

In general, you probably want to keep the 2nd parameter (size) set to 1 such that you get the number of bytes read.
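A short sketch of that advice: read with an element size of 1 so the return value is a byte count, then use feof/ferror to tell why a read came up short. The enum and the read_bytes name are hypothetical, introduced only for this example.

```c
#include <stdio.h>

/* Hypothetical status codes for this example. */
enum read_status { READ_FULL, READ_EOF, READ_ERROR };

/* Reads up to `cap` bytes with an element size of 1, stores the byte count
 * in *nread, and reports whether the read was complete, stopped at
 * end-of-file, or stopped because of an error. */
enum read_status read_bytes(FILE *f, void *buf, size_t cap, size_t *nread)
{
    *nread = fread(buf, 1, cap, f);
    if (*nread == cap)
        return READ_FULL;
    return ferror(f) ? READ_ERROR : READ_EOF;
}
```

Even on READ_EOF or READ_ERROR, *nread says how many bytes in the buffer are valid, which is exactly the information the 1000-byte-element form cannot provide.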

edited Jun 25 '12 at 23:33

answered Jun 07 '12 at 22:20

Clarus


"All modern C libraries will have the same performance with the two calls" -- yes. "in some cases the need for a disk to perform additional seeks which would have otherwise been optimized away" -- no. fgetc simply reads from stdio's in-memory buffer. And even if the stream has been set to be unbuffered, the underlying OS buffers disk reads. – Jim Balter Aug 15 '14 at 06:46
@Jim: fgetc reads from stdio in a different way than fread. The obvious result of this is that fgetc will always maximize the number of seeks/system calls (bad), whereas fread will minimize the number of seeks/system calls, as you are providing libc with more information about what you are doing. – Clarus Aug 16 '14 at 00:49
2

Sorry, but you have no idea what you're talking about ... there's no way in which fread or fgetc differ that affects the number of seeks, and you have provided no support for this absurd claim. Note that the definition of fread in the C99 and POSIX standards is given in terms of fgetc, as discussed elsewhere on this page. – Jim Balter Aug 16 '14 at 02:22

score 1 · Answer 7 · answered Dec 21 '11 at 12:20

1

One more sentence from http://pubs.opengroup.org/onlinepubs/000095399/functions/fread.html is notable:

The fread() function shall read into the array pointed to by ptr up to nitems elements whose size is specified by size in bytes, from the stream pointed to by stream. For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.

In short, in both cases the data will be accessed via fgetc().

answered Dec 21 '11 at 12:20

Jeegar Patel


Yes, I feel the same, but that page also says "The functionality described on this reference page is aligned with the ISO C standard." Doesn't that seem doubtful? – Jeegar Patel Dec 21 '11 at 12:35
1

@Mr.32: the standard says the same thing about calls to `fgetc`, so Posix is indeed aligned with C99. But the standard doesn't give a conforming program any means to determine whether `fgetc` is "really" called, or whether `fread` does something else that's equivalent. 5.1.2.3 explains that the standard only describes the behavior of an "abstract machine", and lists in what ways the actual program must match that behavior. This is called the "as-if" rule in C++ but not C (my mistake earlier). Non-observable behavior need not be identical. – Steve Jessop Dec 21 '11 at 13:13
So, even if a particular implementation gives you some means to count how many times `fgetc` is called (perhaps by letting you link your program against your own version of that function, for example by modifying and recompiling libc), it can do that with the caveat that the function you're replacing is not called always and only when the standard describes the abstract machine as calling it. – Steve Jessop Dec 21 '11 at 13:15
@SteveJessop "Non-observable behavior need not be identical." So why is it documented in POSIX? – Roman Byshko Dec 21 '11 at 14:44
@Beginner: because a description of the behavior of the abstract machine is a convenient way to describe the effect of `fread` (or any other bit of C code). It's documented that way in Posix simply because it's documented that way in the standard. – Steve Jessop Dec 21 '11 at 15:13
@SteveJessop Any detectable difference between the library implementation and an implementation in terms of fgetc *is* observable, and is non-conformant. Of course one can debate what "detectable" consists of. – Jim Balter Aug 15 '14 at 06:41
