What is the rationale for fread/fwrite taking size and count as arguments?

Question

We had a discussion here at work about why fread() and fwrite() take a size per member and a count, and return the number of members read/written, rather than just taking a buffer and a size. The only use we could come up with is reading/writing an array of structures whose size isn't a multiple of the platform alignment and which have therefore been padded, but that can't be so common as to warrant this design choice.

From fread(3):

The function fread() reads nmemb elements of data, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.

The function fwrite() writes nmemb elements of data, each size bytes long, to the stream pointed to by stream, obtaining them from the location given by ptr.

fread() and fwrite() return the number of items successfully read or written (i.e., not the number of characters). If an error occurs, or the end-of-file is reached, the return value is a short item count (or zero).

Please check out this thread: http://stackoverflow.com/questions/8589425/how-does-fread-really-work — Franken, Apr 14 '12 at 20:42

score 85 · Answer 1 · edited Feb 01 '21 at 05:17

The difference between fread(buf, 1000, 1, stream) and fread(buf, 1, 1000, stream) is that in the first case you get either one chunk of 1000 bytes or nothing (if the file is smaller), while in the second case you get whatever is in the file, up to 1000 bytes.
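
To make the contrast concrete, here is a minimal sketch that runs both call shapes against a temporary file of a chosen length. The helper names (make_temp_file, read_as_one_chunk, read_as_bytes) are made up for illustration and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: creates a temporary file holding `len` bytes,
 * rewinds it, and returns the stream (NULL on failure). */
static FILE *make_temp_file(size_t len)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (fputc('x', f) == EOF) {
            fclose(f);
            return NULL;
        }
    }
    rewind(f);
    return f;
}

/* Reads the file as ONE element of 1000 bytes.
 * Returns fread's result: 1 if a whole element was read, otherwise 0.
 * Returns (size_t)-1 if the temporary file could not be set up. */
size_t read_as_one_chunk(size_t file_len)
{
    char buf[1000];
    FILE *f = make_temp_file(file_len);
    if (f == NULL)
        return (size_t)-1;
    size_t n = fread(buf, sizeof buf, 1, f);
    fclose(f);
    return n;
}

/* Reads the file as up to 1000 elements of 1 byte each.
 * Returns fread's result, which here is the number of bytes read.
 * Returns (size_t)-1 if the temporary file could not be set up. */
size_t read_as_bytes(size_t file_len)
{
    char buf[1000];
    FILE *f = make_temp_file(file_len);
    if (f == NULL)
        return (size_t)-1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return n;
}
```

For a 600-byte file, read_as_one_chunk(600) returns 0 while read_as_bytes(600) returns 600. In the first form the return value alone does not say how many bytes were actually consumed.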

edited Feb 01 '21 at 05:17 by ghilesZ

answered Nov 17 '08 at 16:16 by Peter Miehle

4

Although true, that only tells a small part of the story. It would be better to contrast something reading, say, an array of int values, or an array of structures. – Jonathan Leffler Nov 17 '08 at 22:22
4

This would make a great answer if the justification was completed. – Matt Joiner Sep 19 '10 at 04:32

Powerlord · Accepted Answer · 2008-11-17T16:28:15.053

It's based on how fread is implemented.

The Single UNIX Specification says

For each object, size calls shall be made to the fgetc() function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object.

fgetc also has this note:

Since fgetc() operates on bytes, reading a character consisting of multiple bytes (or "a multi-byte character") may require multiple calls to fgetc().

Of course, this predates fancy variable-byte character encodings like UTF-8.

The SUS notes that this is actually taken from the ISO C documents.
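
As a rough illustration of the wording quoted above, the following sketch reads each object with `size` calls to fgetc(). The name fread_sketch is hypothetical, and this is only a model of the specified behaviour: real libraries normally copy from their internal buffers in bulk rather than calling fgetc() per byte.

```c
#include <stdio.h>

/* Illustrative model only (fread_sketch is a made-up name):
 * reads up to `nmemb` objects of `size` bytes, one fgetc() call per byte,
 * and returns the number of COMPLETE objects stored. */
size_t fread_sketch(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    unsigned char *out = ptr;   /* the "array of unsigned char" overlay */
    size_t items;

    if (size == 0 || nmemb == 0)
        return 0;

    for (items = 0; items < nmemb; items++) {
        for (size_t i = 0; i < size; i++) {
            int c = fgetc(stream);
            if (c == EOF)
                return items;   /* a partially read object is not counted */
            *out++ = (unsigned char)c;
        }
    }
    return items;
}
```

Seen this way, the count returned is simply how many times the outer loop completed, which is why a trailing partial object is stored but never reported.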

score 18 · Answer 3 · answered Sep 19 '10 at 02:32

This is pure speculation, but back in the day many file systems were not simple byte streams on a hard drive (and some of those systems are still around).

Many file systems were record-based, so to serve them efficiently you have to specify the number of items ("records"), which lets fwrite/fread operate on the storage as records rather than just as a byte stream.

answered Sep 19 '10 at 02:32 by nos

1

I'm glad someone brought this up. I did a lot of work with filesystem specs and FTP and records/pages and other blocking concepts are very firmly supported, although nobody uses those parts of the specs anymore. – Matt Joiner Sep 19 '10 at 04:58

score 10 · Answer 4 · edited Nov 03 '17 at 21:00

Here, let me fix those functions:

size_t fread_buf( void* ptr, size_t size, FILE* stream)
{
    return fread( ptr, 1, size, stream);
}


size_t fwrite_buf( void const* ptr, size_t size, FILE* stream)
{
    return fwrite( ptr, 1, size, stream);
}

As for a rationale for the parameters to fread()/fwrite(), I've lost my copy of K&R long ago so I can only guess. I think that a likely answer is that Kernighan and Ritchie may have simply thought that performing binary I/O would be most naturally done on arrays of objects. Also, they may have thought that block I/O would be faster/easier to implement or whatever on some architectures.

Even though the C standard specifies that fread() and fwrite() be implemented in terms of fgetc() and fputc(), remember that the standard came into existence long after C was defined by K&R, and that things specified in the standard might not have been part of the original designers' ideas. It's even possible that things said in K&R's "The C Programming Language" might not be the same as when the language was first being designed.

Finally, here's what P.J. Plauger has to say about fread() in "The Standard C Library":

If the size (second) argument is greater than one, you cannot determine whether the function also read up to size - 1 additional characters beyond what it reports. As a rule, you are better off calling the function as fread(buf, 1, size * n, stream); instead of fread(buf, size, n, stream);

Basically, he's saying that fread()'s interface is broken. For fwrite() he notes that "Write errors are generally rare, so this is not a major shortcoming", a statement I wouldn't agree with.
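
A minimal sketch of the byte-oriented call recommended in the quote, assuming the caller still wants to know how many whole records arrived. The struct read_result type and the read_records name are invented for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct read_result {
    size_t whole_records;   /* complete records now in the buffer */
    size_t leftover_bytes;  /* bytes of a trailing, incomplete record */
};

/* Reads up to `n` records of `size` bytes using a byte-granular fread,
 * so the exact number of bytes consumed is always known. */
struct read_result read_records(void *buf, size_t size, size_t n, FILE *stream)
{
    struct read_result r = {0, 0};

    /* Nothing to read, or size * n would overflow size_t. */
    if (size == 0 || n > SIZE_MAX / size)
        return r;

    size_t got = fread(buf, 1, size * n, stream);
    r.whole_records = got / size;
    r.leftover_bytes = got % size;
    return r;
}
```

Unlike fread(buf, size, n, stream), this form reports the trailing partial record explicitly, at the cost of the caller having to guard the multiplication.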

Actually I often like doing it the other way: `fread(buf, size*n, 1, stream);` If incomplete reads are an error condition, it's simpler to arrange for `fread` to simply return 0 or 1 rather than the number of bytes read. Then you can do things like `if (!fread(...))` instead of having to compare the result against the requested number of bytes (which requires extra C code and extra machine code). — R.. GitHub STOP HELPING ICE, Jul 31 '10 at 00:16
@R.. Just be sure to check that size * count != 0 in addition to !fread(...). If size * count == 0, you're getting a zero return value on a *successful* read (of zero bytes), feof() and ferror() won't be set, and errno will be something nonsensical like ENOENT, or worse, something misleading (and possibly critically breaking) like EAGAIN -- very confusing, especially since basically no documentation screams this gotcha at you. — Pegasus Epsilon, Mar 27 '19 at 05:52
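
The two comments above can be combined into a small all-or-nothing helper. This is only a sketch: read_exact is a hypothetical name, and it treats a zero-length request as a trivially successful read so that the zero return from fread is not mistaken for a failure.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: returns true only if all `len` bytes were read.
 * Uses the "size = len, count = 1" form so fread yields 0 or 1. */
bool read_exact(void *buf, size_t len, FILE *stream)
{
    if (len == 0)
        return true;    /* fread would return 0 here without any error */
    return fread(buf, len, 1, stream) == 1;
}
```

On a false return, the caller can still consult feof() and ferror() on the stream to tell end-of-file from an I/O error.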

dolch · Answer 5 · 2008-11-17T19:00:40.860

3

This likely goes back to the way file I/O was implemented back in the day: it might have been faster to read and write files in blocks than to write everything at once.

edited Nov 17 '08 at 19:00

answered Nov 17 '08 at 16:21 by dolch

Not really. The C specification for fwrite notes that it makes repeated calls to fputc: http://www.opengroup.org/onlinepubs/009695399/functions/fwrite.html – Powerlord Nov 17 '08 at 16:25

score 1 · Answer 6 · answered Aug 11 '18 at 01:34

Having separate arguments for size and count could be advantageous on an implementation that can avoid reading any partial records. If one were to use single-byte reads from something like a pipe, even with fixed-format data, one would have to allow for the possibility of a record getting split over two reads. If one could instead request, e.g., a non-blocking read of up to 40 records of 10 bytes each when there are 293 bytes available, and have the system return 290 bytes (29 whole records) while leaving 3 bytes ready for the next read, that would be much more convenient.

I don't know to what extent implementations of fread can handle such semantics, but they could certainly be handy on implementations that could promise to support them.
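
For comparison, here is a small sketch of what an ordinary stdio implementation does with the same numbers, using a regular temporary file instead of a pipe. The function name read_ten_byte_records is made up; the C standard says the file position advances by the number of characters successfully read, so the 3 trailing bytes are consumed rather than left pending.

```c
#include <stdio.h>

/* Hypothetical demo: writes 293 bytes to a temporary file, then asks for
 * up to 40 records of 10 bytes. Returns the record count fread reports and
 * stores the resulting file position in *pos_out.
 * Returns (size_t)-1 if the temporary file could not be set up. */
size_t read_ten_byte_records(long *pos_out)
{
    unsigned char buf[40 * 10];
    FILE *f = tmpfile();
    if (f == NULL)
        return (size_t)-1;

    for (int i = 0; i < 293; i++) {
        if (fputc('r', f) == EOF) {
            fclose(f);
            return (size_t)-1;
        }
    }
    rewind(f);

    size_t records = fread(buf, 10, 40, f);  /* counts whole records only */
    *pos_out = ftell(f);                     /* position after the read */
    fclose(f);
    return records;
}
```

Here fread reports 29 records while the stream position ends up at byte 293, so the partial record has been read but is not reflected in the return value.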

answered Aug 11 '18 at 01:34 by supercat

@PegasusEpsilon: If e.g. a program does `fread(buffer, 10000, 2, stdin)` and the user types newline-ctrl-D after typing 18,000 bytes, it would be nice if the function could return the first 10,000 bytes while leaving the remaining 8,000 pending for future smaller read requests, but are there any implementations where that would happen? Where would the 8,000 bytes be stored pending those future requests? – supercat Mar 27 '19 at 15:12
Having just tested it, turns out fread() does not operate in what I would consider the most convenient way in this regard, but then stuffing bytes back into the read buffer after determining a short read is probably a bit more than we should expect from standard library functions anyway. fread() will read partial records and shove them into the buffer, but the return value will specify how many *complete* records have been read, and tells you nothing (which is fairly annoying to me) about any short reads pulled off stdin. – Pegasus Epsilon Mar 29 '19 at 01:34
...continued... Best you can do is probably fill your read buffer with nulls before fread, and check the record after where fread() says it finished for any non-null bytes. Doesn't particularly help you when your records may contain null, but if you're going to use `size` greater than 1, well... For the record, there may also be ioctls or other nonsense you can apply to the stream to make it behave differently, I haven't delved that deeply. – Pegasus Epsilon Mar 29 '19 at 01:38
Also I've deleted my earlier comment due to inaccuracy. Oh well. – Pegasus Epsilon Mar 29 '19 at 01:40
@PegasusEpsilon: C is used on so many platforms, which accommodate different behaviors. The notion that programmers should expect to use the same features and guarantees on all implementations ignores what had been the best feature of C: that its design would allow programmers to use features and guarantees on platforms where they were available. Some kinds of streams can support arbitrary-sized pushbacks easily, and having `fread` work as you described on such streams would be useful if there were some way to identify streams that work in that fashion. – supercat Mar 30 '19 at 14:58
i/o streams are already so far above the abstraction of a file descriptor that trying to define their behavior based on what the architecture's underlying file descriptors support seems silly. Not to mention the fact that read() and fread() could both read from a library-maintained read buffer, allowing all the buffer tomfoolery you could ever want. But then there are a few things about C I disagree with, design-wise. Too late to change them now. – Pegasus Epsilon Mar 31 '19 at 18:52
@PegasusEpsilon: If some programs would need to perform tasks of which many systems are capable but many others aren't, having a standard means of performing those tasks *on those systems that can support them* would seem more useful than requiring that every program which isn't supportable on all platforms be individually customized for every individual implementation that supports the required semantics. – supercat Apr 03 '19 at 14:25
You understand that no system has native support for a conceptual thing like a FILE stream? Never mind the fact that file systems in general aren't hardware-native, the idea of a *stream* is a whole-cloth standard library fabrication. The fact that it behaves in a particular way because not all hardware supports the thing we'd prefer is completely arbitrary -- the entire concept is invented well above the hardware level, they just chose not to write *the spec* in a particular way. – Pegasus Epsilon Apr 04 '19 at 18:42
An example of a place where the C spec requires inconvenient home-rolled code, where other systems may natively support what C wants is the ASCIZ null-terminated string. DOS uses this all over the place, but hilariously not for console output. For console output (and only console output, as far as I can tell) interrupt 21h requires a string be terminated with $. C doesn't suddenly convert to using $-terminated strings on DOS, however, it requires C library implementers to write a puts() function that still uses the null-terminator. – Pegasus Epsilon Apr 04 '19 at 18:45
Point being, the spec is what the spec is, and in many cases it must choose: support this everywhere, regardless of difficulty on a given platform, or do not support it because it would be complicated in some places. Partial reads is clearly the latter, while puts() is clearly the former, and the choice there is left up to some guy, who may or may not use logic (puts() is easier than my hypothetical fread() to implement, to be fair, but it's also easier than fread() as it stands, so...) to make that decision. – Pegasus Epsilon Apr 04 '19 at 18:48
@PegasusEpsilon: Most writers of specifications recognize a third option: recognize things that quality implementations should do *when practical*. Many execution environments do have native concepts of streams that operate much like those used by fread, fopen, etc. and C implementations for such environments often try to minimize the amount of buffering between the C streams and those of the environment. – supercat Apr 04 '19 at 20:19

score -2 · Answer 7 · answered Apr 18 '12 at 11:20

I think it is because C lacks function overloading. If it had overloading, size would be redundant. But in C a function can't determine the size of an array element from a pointer; you have to specify it.

Consider this:

int intArray[10];
fwrite(intArray, sizeof(int), 10, fd);

If fwrite accepted number of bytes, you could write the following:

int intArray[10];
fwrite(intArray, sizeof(int)*10, fd);

But it is just inefficient. You will have sizeof(int) times more system calls.

Another point that should be taken into consideration is that you usually don't want part of an array element to be written to a file. You want the whole integer or nothing. fwrite returns the number of elements successfully written. So if you discovered that only the 2 low bytes of an element had been written, what would you do?

On some systems (due to alignment) you can't access one byte of an integer without creating a copy and shifting.
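
As an illustration of working in whole elements, this sketch writes an array of int to a temporary file and reads it back, checking the element counts returned by fwrite and fread. The name round_trip_ints is hypothetical, and the file is only meaningful on the machine that wrote it, since the raw bytes depend on int size and byte order.

```c
#include <stdio.h>

/* Hypothetical helper: writes `count` ints from `src` to a temporary file,
 * then reads them back into `dst`. Returns the number of whole ints read
 * back, or 0 if the file could not be created or the write was short. */
size_t round_trip_ints(const int *src, int *dst, size_t count)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return 0;

    size_t read_back = 0;
    /* fwrite reports complete elements, so a short count means failure. */
    if (fwrite(src, sizeof *src, count, f) == count) {
        rewind(f);
        read_back = fread(dst, sizeof *dst, count, f);
    }
    fclose(f);
    return read_back;
}
```

Because both calls count elements, the caller compares the result directly with the array length instead of dividing a byte count by sizeof(int).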
