Memory mapping a file avoids copying data from the kernel's page cache into a user-space buffer and is therefore faster for large files.
Reading a file with read() avoids the MMU manipulation (setting up page tables and taking page faults) that mmap requires and is therefore faster for small files.
When reading a large number of files, choosing the best method per file may make a difference.
Do I need to hard-code the file-size limit for this decision, or is there
a best-practice heuristic to derive the threshold size from
some system variables at run time, when running on Linux on Intel?
The files in question will be read linearly (no random access) and are of very different sizes.
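To make the question concrete, this is the shape of the code I have in mind: a minimal sketch in which MMAP_THRESHOLD is an arbitrary hard-coded guess (exactly the number I don't know how to choose) and consume() is a placeholder for the real per-buffer work.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MMAP_THRESHOLD (16L * 4096)   /* arbitrary guess: the number in question */

static void consume(const char *buf, size_t len)
{
    (void)buf; (void)len;             /* placeholder for the actual linear scan */
}

/* Read one file front to back, choosing mmap or read by file size. */
static int process_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    if (st.st_size >= MMAP_THRESHOLD) {
        /* Large file: map it and walk the mapping once. */
        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        consume(p, st.st_size);
        munmap(p, st.st_size);
    } else {
        /* Small file: plain read() into a fixed buffer. */
        char buf[1 << 16];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            consume(buf, (size_t)n);
    }
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        process_file(argv[i]);
    return 0;
}
```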
Edit: I'm not willing to implement a benchmarking step, because the difference between mmap
and read
is small and does not justify that overhead, even with a large number of files to process.
2nd edit: This is a general question about good coding habits, not one tied to a particular set of files on a particular machine.
Imagine I wanted to improve the performance of grep
(which is not actually my goal):
how would one efficiently implement sequential reads of many files that are not known in advance?
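For reference, the plain-read baseline I would compare against looks something like the sketch below. The POSIX_FADV_SEQUENTIAL hint is my guess at the right way to tell the kernel about the linear access pattern, and scan() is again just a placeholder for the per-buffer work.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static void scan(const char *path, const char *buf, size_t len)
{
    (void)buf;                                   /* placeholder for the real work */
    fprintf(stderr, "%s: scanned %zu bytes\n", path, len);
}

int main(int argc, char **argv)
{
    static char buf[1 << 16];                    /* one fixed buffer, reused for every file */

    for (int i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);
        if (fd < 0)
            continue;

        /* Hint that the whole file will be read front to back,
           so the kernel can read ahead more aggressively. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            scan(argv[i], buf, (size_t)n);

        close(fd);
    }
    return 0;
}
```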