
I am using mmap to read a file. When the file size is around 880 MB, it takes around 0.5 seconds to iterate through the file.

Then I increased the file size 10 times by replicating the file's content. Iterating with mmap now took around 1 minute.

I expected the iteration time to increase linearly with the file size.

Here is the simple test code:

FILE *fp = fopen64("filename", "rm");
if (fp == NULL)
{
    perror(NULL);
}
struct stat st;
fstat(fileno(fp), &st);
off_t fileSize = st.st_size;
char *data = (char *)mmap64(NULL, fileSize, PROT_READ, MAP_PRIVATE, fileno(fp), 0);
off_t ptr = 0;
char c = (char)0;
while (ptr < fileSize) { c += data[ptr++]; }
std::cout << c << "\n";

Here are the results:

For file size 880 MB:

real    0m0.501s
user    0m0.447s
sys     0m0.053s

For file size 8.8 GB:

real    0m57.685s
user    0m10.773s
sys     0m3.690s
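
In the 8.8 GB run, user+sys account for only about 14 s of the 57 s of wall time, so most of the time is spent blocked on disk I/O. A minimal sketch (not part of the original test) for confirming this from inside the program, using POSIX getrusage to count page faults; major faults are the ones that needed an actual disk read:

#include <stdio.h>
#include <sys/resource.h>

/* Call after the traversal: major faults required real disk I/O,
   minor faults were satisfied from memory / the page cache. */
static void print_fault_counts(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("major faults: %ld, minor faults: %ld\n",
               ru.ru_majflt, ru.ru_minflt);
}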
  • How much RAM do you have? Odds are most of the smaller file fits in the system file cache (so paging in a page just means mapping in part of the cache), while the larger file does not (so paging in part of the file means actually reading it). You might try using `posix_madvise` (*NIX) or `PrefetchVirtualMemory` (Windows) to see if bulk prefetch helps when you do have enough RAM. – ShadowRanger Oct 26 '16 at 21:33
  • @ShadowRanger 8gb – Anshuman Oct 26 '16 at 21:34
  • Also, when I just read the top 880 MB of the second file, the time to read was the same as for the first. – Anshuman Oct 26 '16 at 21:35
  • @ShadowRanger I tried `posix_madvise` but there was no improvement – Anshuman Oct 26 '16 at 21:45
  • What code did you use to "iterate through the file"? – R.. GitHub STOP HELPING ICE Oct 26 '16 at 21:47
  • BTW you should expect time for "linearly" processing large data to increase proportional to `n*sqrt(n)`, not `n`. See http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-part-i And 30 is pretty close to `10*sqrt(10)` – R.. GitHub STOP HELPING ICE Oct 26 '16 at 21:49
  • @R.. I am just using `char c = arr[ptr++]`, where arr is the mmap ptr. – Anshuman Oct 26 '16 at 21:51
  • @Anshuman: If you don't *do anything* with that, you should expect the compiler to optimize it out and for the entire operation to take zero time. – R.. GitHub STOP HELPING ICE Oct 26 '16 at 21:52
  • Anyway the complete test code would be needed to have a good chance at getting a correct answer. – R.. GitHub STOP HELPING ICE Oct 26 '16 at 21:55
  • @R.. After that I am making a string from it, and then indexing it using `unordered_map`. But I don't think the problem is in that part; it is something related to mmap only. – Anshuman Oct 26 '16 at 21:57
  • @R.. I have updated the test code – Anshuman Oct 26 '16 at 22:08
  • 3
    So you're not just doing linear processing of the file, but storing data read from it in a custom `Indexer` object using `unordered_map`, and you just omitted that from the question until now? Most likely something in your indexing brings in a `sqrt(n)` or even just `log(n)` factor to the big-O. – R.. GitHub STOP HELPING ICE Oct 26 '16 at 22:14
  • @R.. But as I am replicating the content of the file, the Indexer is effectively just returning the value. Also, there shouldn't be such a jump from 3 times to 4 times, as I have mentioned. – Anshuman Oct 26 '16 at 22:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/126768/discussion-between-anshuman-and-r). – Anshuman Oct 26 '16 at 22:18
  • @R.. I don't think there was any effect of `unordered_map` – Anshuman Oct 26 '16 at 22:56
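
For reference, the `posix_madvise` hint suggested in the comments above would look something like this sketch, applied to the `data`/`fileSize` mapping from the test code (assuming a POSIX system; `POSIX_MADV_SEQUENTIAL` asks the kernel to read ahead aggressively):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hint that the mapping will be scanned front to back.
   posix_madvise returns an error number directly, not -1/errno. */
int rc = posix_madvise(data, fileSize, POSIX_MADV_SEQUENTIAL);
if (rc != 0)
    fprintf(stderr, "posix_madvise: %s\n", strerror(rc));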

1 Answer


Access time is not constant; it gets slower as your dataset grows. I suggest reading Latency Numbers Every Programmer Should Know.

If you run a benchmark and adjust the dataset size, you will see several ranges where the performance changes; a sketch of such a benchmark follows the list below.

  1. Performance is fastest when the dataset fits in L1 cache. L1 cache is small, say 64 KiB per core, but it is fast (~1 cycle access time, almost as fast as registers).

  2. Performance suddenly drops when you need L2 cache. L2 cache is larger and slower than L1 cache. The drop in performance is something like 10x.

  3. Performance drops again when your dataset is too large for L2 cache but fits in RAM. Another 10x performance drop or so for cache misses.

  4. Performance drops through the floor when your dataset is too large for RAM but fits on disk. The performance hit is like 1000x for cache misses, assuming you have a fast SSD, and maybe 100,000x if you have a spinning hard drive.
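
A minimal sketch of such a benchmark (a hypothetical illustration, not from the answer): sweep a buffer of a given size repeatedly and report throughput; plotting the result against the buffer size exposes the L1/L2/RAM plateaus described above.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Sweep a `size`-byte buffer `passes` times; return bytes per second. */
static double sweep(size_t size, int passes)
{
    volatile unsigned char sink = 0;   /* volatile defeats dead-code elimination */
    unsigned char *buf = malloc(size);
    if (!buf) { perror("malloc"); exit(1); }
    for (size_t i = 0; i < size; i++) buf[i] = (unsigned char)i;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < size; i++)
            sink = sink + buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(buf);
    return (double)size * passes / secs;
}

int main(void)
{
    /* Buffer sizes from 4 KiB up to 256 MiB, multiplying by 4 each step. */
    for (size_t size = 4096; size <= (size_t)1 << 28; size <<= 2)
        printf("%10zu bytes: %8.1f MB/s\n", size, sweep(size, 8) / 1e6);
    return 0;
}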

Your 880 MB dataset fits neatly in 8 GiB of RAM, but the 8,800 MB dataset does not; it cannot all be resident at the same time. Random access patterns are somewhat pessimal, but even with linear access patterns your pages will all get evicted from cache and the kernel will have to read them from disk over and over again.
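
One way to see this directly (a sketch, assuming Linux, where `mincore` reports per-page residency) is to ask the kernel how much of the mapping is actually in RAM before and after the scan:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Count how many pages of the mapping data..data+sz are resident.
   Bit 0 of each vec[] byte is set if that page is currently in RAM. */
static void report_residency(void *data, size_t sz)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (sz + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (!vec || mincore(data, sz, vec) == -1) {
        perror("mincore");
        free(vec);
        return;
    }
    size_t resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;
    printf("%zu of %zu pages resident\n", resident, npages);
    free(vec);
}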

It's nice to pretend that you have an infinite amount of storage that is all the same speed, but that is not even remotely true.

Red herrings

  • Practically speaking, the only two ways to get the file into memory are `read` and `mmap`. Other options are just layers on top of those two. For sequential access to data that's not in the page cache, the difference between `read` and `mmap` is not relevant; see mmap() vs. reading blocks. (A `read`-based version of the test loop is sketched after this list.)

  • Access patterns will change how much the performance drops when your dataset gets larger, but they won't change the fact that datasets too large to be resident can't be faster than disk.
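
For comparison, a `read`-based version of the same traversal might look like the sketch below (the 64 KiB buffer size is an arbitrary choice); with a cold page cache it is bottlenecked by the same disk:

#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("filename", O_RDONLY);
    if (fd == -1)
        err(1, "open");
    char buf[1 << 16];            /* 64 KiB read buffer */
    unsigned counter = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        for (ssize_t i = 0; i < n; i++)
            counter += (unsigned char)buf[i];
    if (n == -1)
        err(1, "read");
    printf("%u\n", counter);
    return 0;
}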

Minor notes

  • If you're going to mmap, then use open, not fopen; the fopen is unnecessary.

  • The "m" flag for fopen does not do what you think it does, it serves no purpose here.

  • Don't use open64, fopen64, mmap64 or any of that nonsense. Just use #define _FILE_OFFSET_BITS 64. This is the modern way to do things, but of course, it's only relevant on 32-bit systems, and since you're using mmap at offset zero, there's no point.

  • Calling perror but continuing is a mistake. The err() function is not universally available but does what you want.

  • There's no good reason not to use MAP_SHARED here, but it won't change anything.

Here's how the code would look with more consistent error checking:

#define _FILE_OFFSET_BITS 64  /* see the note above; only matters on 32-bit */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int fd = open("filename", O_RDONLY);
if (fd == -1)
    err(1, "open");
struct stat st;
int r = fstat(fd, &st);
if (r == -1)
    err(1, "stat");
// Compiler warning on 64-bit, but is correct
if (st.st_size > (size_t)-1)
    errx(1, "file too large");
size_t sz = st.st_size;
void *data = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
if (data == MAP_FAILED)
    err(1, "mmap");
unsigned counter = 0;
for (char *ptr = data, *end = ptr + sz; ptr != end; ptr++)
    counter += *ptr;
printf("%u\n", counter);
  • I don't think there is any effect of `unordered_map`. See the updated question. – Anshuman Oct 26 '16 at 22:57
  • Sorry, but the question is still the same. I just removed the unwanted code, which was of less significance. I added that code because R.. asked me to. – Anshuman Oct 26 '16 at 23:03
  • @Anshuman: If you need to expand a question to clarify things or remove unnecessary detail, do so. But don't just change the question. You get mostly the same answer either way. – Dietrich Epp Oct 26 '16 at 23:05