
I'm working on a project where I'm using the filesystem as something of a database. I do a single batch job each day that will write tens of thousands of small files, and later read from those files.

These files can fit fully into the RAM of the machine, though they still total tens of GBs.

So various questions:

  • Does a single large read tend to be faster or slower than many small reads of the same total size?
  • Could I improve performance by first writing all files to an "in-memory" FS within my language and writing to disk later in a single batch?
  • Same question for reads. Is it faster to load the whole folder into an in-memory FS vs dispatching many small reads interspersed with processing code?
Adam Arthur

2 Answers


Does a single large read tend to be faster or slower than many small reads of the same total size?

It depends, but a single large read is generally faster.

On one hand, the number of IO requests an SSD can perform per second (see IOPS) is bounded. While this limit is much higher for SSDs than for HDDs (especially recent SSDs), it is often what limits reading many small files efficiently. Current high-performance NVMe SSDs can reach about 300K IOPS, but the file system needs to perform several IO requests per file, so keep in mind that the achievable rate of small-file reads is generally much lower than the raw IOPS figure.
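As a rough illustration of why the per-file request count matters, here is a back-of-envelope calculation; the IOPS figure and the requests-per-file count are illustrative assumptions, not measurements of any particular drive:

```python
# Back-of-envelope estimate of the effective small-file read rate.
# Both numbers below are illustrative assumptions.
drive_iops = 300_000   # nominal random-read IOPS of a fast NVMe SSD
ios_per_file = 3       # e.g. directory lookup + inode fetch + data read

files_per_second = drive_iops // ios_per_file
print(files_per_second)  # 100000 files/s at best, before any other overhead
```

Even under these optimistic assumptions, the drive's headline IOPS figure overstates the small-file throughput by a factor of the per-file request count.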

On the other hand, reading huge buffers can be slower because the OS usually needs to perform internal copies of the buffer, and this introduces a significant overhead when the buffer does not fit in CPU caches on high-performance SSDs. Note that this is very dependent on the API used to read the target files (see the second part of this related answer). Most standard libraries for reading/writing files are actually buffered, so reading very small chunks is not so slow (though still slower than medium-sized ones because of the additional calls/operations). Buffering is only possible for sequential reads/writes, though.
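As a sketch of how chunk size affects a buffered sequential read, the following times the same file read with different chunk sizes; the file size and chunk sizes are arbitrary choices for illustration:

```python
import os
import tempfile
import time

# Create a throwaway test file (8 MiB of random bytes).
path = os.path.join(tempfile.mkdtemp(), "blob.bin")
with open(path, "wb") as f:
    f.write(os.urandom(8 * 1024 * 1024))

def read_in_chunks(chunk_size):
    """Read the whole file sequentially in chunks, return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    return time.perf_counter() - start

for size in (64, 4096, 1 << 20):  # tiny, page-sized, 1 MiB
    print(f"{size:>8} B chunks: {read_in_chunks(size):.4f} s")
```

On most systems the 64-byte case is the slowest purely from Python call overhead, even though the standard library buffers the underlying IO; the gap between 4 KiB and 1 MiB chunks is usually much smaller.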

Could I improve performance by first writing all files to an "in-memory" FS within my language and writing to disk later in a single batch?

Probably not. This is highly dependent on the OS implementation and the access pattern of your application. If the writes are contiguous, I do not expect a big speed-up on a fast SSD, since writing and then reading the files back from RAM adds overhead and the cost of managing files is paid twice. Furthermore, note that some OSs restrict the size of an in-RAM FS, and this space may be reserved (it needs to be tuned and often requires elevated privileges). Besides, most OSs tend to cache read/written files in RAM anyway (and in a significantly more efficient way than an in-RAM FS); this is the case on both Windows and Linux by default.
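For reference, a minimal sketch of the plain batched-write pattern that relies on the OS page cache instead of an explicit in-RAM FS; the directory, file count, and payload size are illustrative:

```python
import os
import tempfile

# Write many small files; the OS page cache batches the physical writes.
out_dir = tempfile.mkdtemp()
payload = b"x" * 1024  # 1 KiB per file, illustrative

for i in range(1000):
    with open(os.path.join(out_dir, f"rec_{i:05d}.bin"), "wb") as f:
        f.write(payload)  # lands in the page cache first

# The data is durable only once the OS flushes it; on POSIX systems you
# can call os.sync() at the end of the batch if you need to force this.
print(len(os.listdir(out_dir)))  # 1000
```

Because the page cache already absorbs the writes in memory, an extra in-RAM staging layer mostly duplicates work the OS would do anyway.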

Same question for reads. Is it faster to load the whole folder into an in-memory FS vs dispatching many small reads interspersed with processing code?

Most OSs also cache read/written files in memory so they can be read faster later. The OS is responsible for freeing the cache when the memory is requested by processes, and you can hardly control its behaviour. If the files have already been written/read, you have enough free memory, and the caching buffers are large enough, then there is no need for an in-RAM FS (it will likely be slower than the cache). If you read files non-sequentially and they tend to be evicted from the cache, then implementing a prefetching strategy using dedicated threads may help speed up reads. Memory-mapped files and asynchronous low-level APIs may also help implement this more efficiently in such cases.
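A minimal sketch of such a thread-based prefetcher, assuming a simple read-then-process loop; the function names, pool size, and the stand-in processing step are all illustrative:

```python
import concurrent.futures
import os

def _read_file(path):
    """Read one file fully into memory."""
    with open(path, "rb") as f:
        return f.read()

def prefetch_and_process(paths, workers=4):
    """Read files ahead on a small thread pool while processing in order."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit all reads up front so the pool keeps the disk queue full.
        futures = [pool.submit(_read_file, p) for p in paths]
        for fut in futures:
            data = fut.result()        # usually already read by now
            results.append(len(data))  # stand-in for real processing
    return results
```

Threads work here because file reads release Python's GIL while blocked on IO; an async API or `mmap` would be alternative designs for the same goal.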


Note that compacting many small files into one big file can significantly improve performance, since it strongly reduces the overhead of the FS subsystem (open system calls, close system calls, recursive directory fetches causing many additional IOPS in critical cases, etc.).
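A minimal sketch of this compaction idea: pack the records into one file and keep a small JSON index of (offset, length) per key. All names and the index format here are illustrative, not a standard:

```python
import json

def pack(records, pack_path, index_path):
    """Append all records to one file; record (offset, length) per key."""
    index = {}
    with open(pack_path, "wb") as out:
        for key, data in records.items():
            index[key] = (out.tell(), len(data))
            out.write(data)
    with open(index_path, "w") as f:
        json.dump(index, f)

def read_record(key, pack_path, index_path):
    """Fetch one record: a single open + seek + read on the pack file."""
    with open(index_path) as f:
        offset, length = json.load(f)[key]
    with open(pack_path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

Reading a record now costs one open, one seek, and one read on a file the OS has likely already cached, instead of a directory lookup and open/close per small file.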

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59

Does a single large read tend to be faster or slower than many small reads of the same total size?

It tends to be more efficient, since there are fewer round trips between memory and disk. There is a point of diminishing returns, however, and a tool like iometer or fio (depending on your OS; there are many tools like this) can be used to determine a good aggregation size. Pay particular attention to direct mode (which bypasses the OS FS cache) and to write-back vs. write-through mode. Consider whether low latency or high throughput is most important, as the two are usually at odds with each other.
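For example, an fio invocation along these lines measures random small direct reads; rerun it with larger `--bs` values to find where throughput plateaus. The filename, size, and runtime below are placeholders to adapt to your setup:

```shell
# Random 4 KiB reads, bypassing the OS page cache (--direct=1).
fio --name=smallread --filename=/tmp/fio.test --size=1G \
    --rw=randread --bs=4k --direct=1 --ioengine=psync \
    --runtime=30 --time_based
```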

Can anyone recommend disk I/O benchmarking software for Windows?

For your specific device, there are usually storage-review sites that show the maximum read/write speeds, which can provide valuable performance targets. The bus interface you are using (such as SATA or PCIe) can play a major role as well.