Access time is not constant; it gets slower as your dataset gets larger. I suggest reading *Latency Numbers Every Programmer Should Know*.
If you run a benchmark and adjust the dataset size, you will see several distinct ranges where the performance changes (a sketch of such a benchmark is at the end of this section).
Performance is fastest when the dataset fits in L1 cache. L1 cache is small, say 64 KiB per core, but it is fast: a few cycles of access time, nearly as fast as registers.
Performance suddenly drops when you spill into L2 cache, which is larger but slower than L1. The drop is something like 10x.
Performance drops again when your dataset is too large for L2 cache but fits in RAM: roughly another 10x for the cache misses.
Performance drops through the floor when your dataset is too large for RAM but fits on disk. The hit is something like 1,000x per miss if you have a fast SSD, and maybe 100,000x if you have a spinning hard drive.
Your 880 MB dataset fits neatly in 8 GiB of RAM, but the 8,800 MB dataset does not; it cannot all be resident at the same time. Random access patterns are somewhat pessimal, but even with linear access your pages will all get evicted from the page cache and the kernel will have to read them back from disk over and over again.
It's nice to pretend that you have an infinite amount of storage that is all the same speed, but that is not even remotely true.
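If you want to see these cliffs for yourself, here's a minimal pointer-chasing benchmark sketch (my illustration, not code from the question; the working-set sizes and iteration count are arbitrary). It chases a single random cycle through an array so that every load depends on the previous one, which defeats prefetching; the nanoseconds-per-access figure jumps each time the working set outgrows a level of the hierarchy.

    #define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        /* Working sets from 4 KiB up to 256 MiB. */
        for (size_t size = (size_t)1 << 12; size <= (size_t)1 << 28; size <<= 2) {
            size_t n = size / sizeof(size_t);
            size_t *a = malloc(n * sizeof(size_t));
            if (a == NULL)
                return 1;
            /* Sattolo's algorithm: build one random cycle over the whole array. */
            for (size_t i = 0; i < n; i++)
                a[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % i;
                size_t t = a[i]; a[i] = a[j]; a[j] = t;
            }
            const size_t iters = 20 * 1000 * 1000;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            size_t idx = 0;
            for (size_t i = 0; i < iters; i++)
                idx = a[idx];             /* each load depends on the last */
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double ns = ((t1.tv_sec - t0.tv_sec) * 1e9
                         + (t1.tv_nsec - t0.tv_nsec)) / iters;
            /* Print idx too, so the compiler can't discard the loop. */
            printf("%8zu KiB: %6.2f ns/access (%zu)\n", size / 1024, ns, idx);
            free(a);
        }
        return 0;
    }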
Red herrings
Practically speaking, the only two ways to get the file into memory are `read()` and `mmap()`; other options are just layers on top of those two. For sequential access to data that's not in the page cache, the difference between `read()` and `mmap()` is not relevant; see *mmap() vs. reading blocks*.
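For concreteness, here's roughly what the `read()` version of the same byte-summing task looks like (a sketch along the lines of the `mmap()` code further down, with an arbitrary 64 KiB buffer); streamed sequentially from a cold cache, the two perform about the same.

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        int fd = open("filename", O_RDONLY);
        if (fd == -1)
            err(1, "open");
        char buf[1 << 16];
        unsigned counter = 0;
        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n == -1)
                err(1, "read");
            if (n == 0)
                break;                    /* end of file */
            for (ssize_t i = 0; i < n; i++)
                counter += buf[i];
        }
        printf("%u\n", counter);
        return 0;
    }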
Access patterns change how much the performance drops as the dataset grows, but they don't change the fact that a dataset too large to stay resident can't be accessed faster than the disk can serve it.
Minor notes
If you're going to `mmap()`, use `open()` rather than `fopen()`; the `fopen()` is unnecessary.
The "m"
flag for fopen
does not do what you think it does, it serves no purpose here.
Don't use `open64()`, `fopen64()`, `mmap64()`, or any of that nonsense. Just use `#define _FILE_OFFSET_BITS 64`. This is the modern way to do things, but of course it's only relevant on 32-bit systems, and since you're using `mmap()` at offset zero, there's no point.
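If you ever do need it (a 32-bit build that has to handle files over 2 GiB), the define has to appear before the first system header, roughly like this:

    /* Must be defined before any header is included, so that off_t,
       st_size, mmap offsets, etc. are 64-bit even on a 32-bit system. */
    #define _FILE_OFFSET_BITS 64

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <sys/mman.h>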
Calling `perror()` but then continuing is a mistake. The `err()` function is not universally available, but it does what you want.
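If you have to build somewhere without `<err.h>`, a rough stand-in is a few lines (the name `my_err` is mine, not a standard function, and unlike the real `err(3)` it doesn't prepend the program name or take a format string):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>

    /* Print the message and the errno string to stderr, then exit. */
    static void my_err(int status, const char *msg)
    {
        fprintf(stderr, "%s: %s\n", msg, strerror(errno));
        exit(status);
    }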
No good reason not to use `MAP_SHARED` here, but it won't change anything.
Here's how the code would look with more consistent error checking:
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <err.h>
    #include <stdio.h>

    int main(void)
    {
        int fp = open("filename", O_RDONLY);
        if (fp == -1)
            err(1, "open");

        struct stat st;
        int r = fstat(fp, &st);
        if (r == -1)
            err(1, "stat");

        // Compiler warning on 64-bit (the comparison is always false there), but correct
        if (st.st_size > (size_t)-1)
            errx(1, "file too large");
        size_t sz = st.st_size;

        void *data = mmap(NULL, sz, PROT_READ, MAP_SHARED, fp, 0);
        if (data == MAP_FAILED)
            err(1, "mmap");

        // Sum every byte in the mapping.
        unsigned counter = 0;
        for (char *ptr = data, *end = ptr + sz; ptr != end; ptr++)
            counter += *ptr;
        printf("%u\n", counter);
        return 0;
    }