If you read bytes from an unbuffered stream using the read() method, the JVM ends up making a separate read syscall to the OS for every single byte of the file. (Under the hood, the JVM is probably calling read(addr, offset, count) with a count of 1.)
The cost of making a syscall is large: at least a couple of orders of magnitude more than a regular method call. This is because there are significant overheads in:
- Switching contexts between the application (unprivileged) security domain and the system (privileged) security domain. The register set needs to be saved, virtual memory mappings need to be changed, TLB entries need to be flushed, etc.
- Validating the request. The OS has to do various extra things to ensure that what the syscall is requesting is legitimate. In this case, it has to check whether the requested offset and count are OK given the current file position and size, and whether the address is within the application's address space and mapped as writable. And so on.
By contrast, if you use a buffered stream, the stream will try to read the file from the OS in large chunks (BufferedInputStream uses an 8192-byte buffer by default). That typically results in a many-thousand-fold reduction in the number of syscalls.
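To make the difference concrete, here is a minimal sketch that counts how many read requests actually reach the underlying stream. It does not measure real syscalls: the source is an in-memory ByteArrayInputStream, and the count is only a stand-in on the assumption that, with a FileInputStream, each request reaching the underlying stream would cost roughly one syscall. The names ReadCallDemo, CountingInputStream, drain, unbufferedCalls and bufferedCalls are hypothetical and exist only for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallDemo {

    /** Hypothetical helper: counts read requests that reach the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;                       // one request for a single byte
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;                       // one request for a whole chunk
            return super.read(b, off, len);
        }
    }

    /** Reads one byte at a time until end of stream; returns the bytes read. */
    static long drain(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Requests reaching the source when read() is called on it directly. */
    public static long unbufferedCalls(int size) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drain(source);
        return source.calls;
    }

    /** Requests reaching the source when it is wrapped in a BufferedInputStream. */
    public static long bufferedCalls(int size) throws IOException {
        CountingInputStream source =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        drain(new BufferedInputStream(source));
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        int size = 1 << 20; // 1 MiB of dummy data
        System.out.println("unbuffered requests: " + unbufferedCalls(size));
        System.out.println("buffered requests:   " + bufferedCalls(size));
    }
}
```

In the unbuffered case, every byte (plus the final end-of-stream check) is a separate request to the source. In the buffered case, the application code still calls read() once per byte, but nearly all of those calls are served from the in-memory buffer, so only about one request per buffer-sized chunk reaches the source.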
In fact, this is NOT about how files are stored on disk. It is true that data ultimately has to be read from disk a block at a time, etc. However, the OS is smart enough to do its own buffering. It can even read ahead parts of the file so that they are in (kernel) memory ready for the application when it makes the syscall to read them.
It is extremely unlikely that multiple one-byte read() calls will result in extra disk traffic. The only scenario where this is plausible is if you wait a long time between each read() call and, in the meantime, the OS reuses the space where it was caching the disk block.