
I have a question about the SO question "What is the difference between read() and fread()?"

How do read() and fread() work? And what do unbuffered read and buffered read mean?

If fread() is implemented by calling read(), why do some people say fread() is faster than read() in the SO question "C — fopen() vs open()"?

Jonathan Leffler
甚麼甚
  • "Buffered" means that there is a memory buffer between the actual "slow" IO device and your requests. So when you are writing to it, it is not writing right away to the device, but "caching" it in the buffer till some threshold is reached. Then it will flush the whole packet to the device in *one* physical access instead of many, which is usually faster. – Eugene Sh. Feb 02 '22 at 17:56
  • "how do read() and fread() work?" It's hard to answer a question like this, because we don't know what level of detail you are looking for. "what do unbuffered read and buffered read mean?" Well, do you understand what a *buffer* is? If you don't, please try an English dictionary - the programming meaning is really the same. Then, a buffered read is a read that uses a buffer; an unbuffered read is one that doesn't. – Karl Knechtel Feb 02 '22 at 17:56
  • @Eugene Sh. What about `fread()`? When I call `fread()`, `read()` is actually called to get the data and put it into an internal buffer. After all the data has been fetched, it is copied from the internal buffer to the user-space buffer. Is that right? – 甚麼甚 Feb 02 '22 at 18:25

2 Answers


What's the difference between read and fread?

fread is a function from the standard C library (ISO C) that works with a FILE pointer.

read is a C function available on POSIX-compatible systems that works with a file descriptor.

how do read() and fread() work?

fread calls some unknown system-defined API to read from a file. To know how a specific fread works, you have to look at the implementation of fread for your specific system; Windows fread is very different from Linux fread. Here is the implementation of fread in the glibc library: https://github.com/lattera/glibc/blob/master/libio/iofread.c#L30

read() invokes the read system call: https://man7.org/linux/man-pages/man2/syscalls.2.html

what do unbuffered read and buffered read mean?

fread reads data into an intermediate buffer, then copies it from that buffer into the memory you passed as an argument.

read makes the kernel copy the data straight into the buffer you passed as an argument. There is no intermediate buffer in user space.

why do some say fread() is faster than read() in this topic?

The assumption is that system calls are costly: both the work the kernel does inside the system call and the transition into the system call itself.

Reading the data once in a big chunk (say, 4 kilobytes) into an intermediate buffer and then consuming that buffer in small pieces within your program is faster than switching into the kernel every time you need a small piece of data, which makes the kernel repeat a small input/output operation over and over.

KamilCuk
  • I'd say the bottleneck is not context switching but physical medium access, at least in the case of "real" files. – Eugene Sh. Feb 02 '22 at 18:00
  • @EugeneSh. — if the input is from a (disk) file, then `fread()` reads big chunks of data into a buffer and doles it out piecemeal without always making a system call. By contrast, using `read()` means that the system makes a system call each time, which incurs the cost of a 'context switch' between user-code and system-code. The underlying disk driver may well keep a disk block in memory, so that there isn't a physical access to the disk on each call to `read()`, but there is still the context switch. And, while system calls aren't all that expensive, they aren't all that cheap either. – Jonathan Leffler Feb 02 '22 at 18:22
  • It's not the case that Linux hits the disk for every `read()` call. The physical medium is already somewhat well buffered via the VFS and readahead. However, you can do (order of magnitude) 1M syscalls/sec, so a program using `read()` for single bytes might read 1MB/s at 100% CPU, while a program using `getc()` will saturate the physical medium at much lower CPU. – that other guy Feb 02 '22 at 18:25
  • _"fread calls some unknown system defined API"_: not _unknown_ but rather _unspecified_. – Jabberwocky Feb 02 '22 at 18:29
  • @JonathanLeffler `fread()` has [lock overhead](https://port70.net/~nsz/c/c11/n1570.html#7.21.2p7) that `read()` doesn't: "Each stream has an associated lock that is used to prevent data races when multiple threads of execution access a stream ...". Toss in read amplification if you use `fseek()`, and it's almost impossible for `fread()` to beat the performance of a decent use of `read()`. And if you really want to do multithreaded random access, there's `pread()` that doesn't even need seeking and therefore no locking. – Andrew Henle Feb 02 '22 at 18:34
  • That depends on your definition of “decent use”. If you are reading 4 bytes at a time to get integers from a binary file, `fread()` will win hands down. The skew will be bigger if you read single bytes. If you read 4 KiB at a time, it isn't so clear cut. If you read some number of bytes that isn't a power of two, you may well find `fread()` wins. It depends on the context. – Jonathan Leffler Feb 02 '22 at 18:40

fread cannot be faster than read, provided (!) you use the same buffer size for read as fread does internally.

However, every disk access comes with considerable overhead, so you improve performance by minimising the number of accesses.

If you read small chunks of data, then every read accesses the disk directly, which makes it slow. In contrast, fread profits from its buffer as long as there is still data in it; once the buffer has been used up, the next large chunk is read into it at once, so that subsequent calls can again be served small chunks from the buffer.

Aconcagua
  • *If you read small chunks of data then every `read` accesses the disk directly*: the page cache will come into play and is quite likely to result in many `read()` calls not having to actually perform a disk access. – Andrew Henle Feb 02 '22 at 18:37