
Possible Duplicate:
mmap() vs. reading blocks

I heard (read it on the internet somewhere) that mmap() is faster than sequential IO. Is this correct? If so, why is it faster?

  • mmap() is not reading sequentially.
  • mmap() has to fetch from the disk itself, the same as read() does
  • The mapped area is not sequential - so no DMA (?).

So mmap() should actually be slower than read() from a file? Which of my assumptions above are wrong?

Lunar Mushrooms
  • @Mehrdad I saw some comments on the internet that mmap is faster – Lunar Mushrooms Mar 22 '12 at 06:06
  • 3
    http://stackoverflow.com/questions/258091/when-should-i-use-mmap-for-file-access – Kumar Alok Mar 22 '12 at 06:10
  • 2
    `mmap` is probably faster that `fread` at least because less buffering is involved. But I am not sure that it is as fast as you believe. (Anyway, when you really have to do disk IO, the disk is the bottleneck). – Basile Starynkevitch Mar 22 '12 at 06:13
  • 5
    You should set up an experiment and time it to yourself, to make sure those comments on the internet are true. I have read many comments on the internet that are not true. – Crashworks Mar 22 '12 at 06:21
  • Ben Collins has already posted a [detailed answer](http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks/151819#151819) about this matter. It's a duplicate for me. – Coren Mar 22 '12 at 06:55
  • 1
    @Coren - I disagree. He just shows emphericaly that it is faster, which is the *starting point* for this question. The question here is **why**, and Ben's answer does not address that question at all. – T.E.D. Dec 30 '13 at 22:39

3 Answers


I heard (read it on the internet somewhere) that mmap() is faster than sequential IO. Is this correct? If so, why is it faster?

It can be - there are pros and cons, listed below. When you really have reason to care, always benchmark both.

Quite apart from the raw I/O efficiency, memory mapping changes the way the application code tracks when it needs to do I/O and interleaves data processing/generation, which can sometimes impact performance quite dramatically.

  1. mmap() is not reading sequentially.
  2. mmap() has to fetch from the disk itself, the same as read() does.
  3. The mapped area is not sequential - so no DMA (?).

So mmap() should actually be slower than read() from a file? Which of my assumptions above are wrong?

  1. is wrong... mmap() assigns a region of virtual address space corresponding to the file content... whenever a page in that address space is accessed, physical RAM is found to back the virtual addresses and the corresponding disk content is faulted into that RAM. So, the order in which reads are done from the disk matches the order of access - it's a "lazy" I/O mechanism. If, for example, you needed to index into a huge hash table that was to be read from disk, then mmap()ing the file and starting to access it means the disk I/O is not done sequentially, and may therefore mean a longer elapsed time until the entire file is in memory - but while that's happening, lookups are already succeeding and dependent work can be undertaken, and parts of the file that are never actually needed may never be read at all (allowing for the granularity of disk and memory pages). Further, even when memory mapping, many OSes let you give hints about your planned access patterns so they can read ahead proactively or release memory more aggressively, knowing you're unlikely to return to it (see the madvise() sketch after this list).

  2. absolutely true

  3. "The mapped area is not sequential" is vague. Memory mapped regions are "contiguous" (sequential) in virtual address space. We've discussed disk I/O being sequential above. Or, are you thinking of something else? Anyway, while pages are being faulted in, they may indeed be transferred using DMA.

Further, there are other reasons why memory mapping may outperform usual I/O:

  • there's less copying (see the sketch after this list):
    • OS and library-level routines often pass data through one or more buffers before it reaches an application-specified buffer; the application then typically allocates its own storage and copies from the I/O buffer into it, so the data remains usable after the file reading completes
    • memory mapping allows (but doesn't force) in-place usage (you can just record a pointer and possibly a length)
      • continuing to access data in-place does risk increased cache misses and/or swapping later: the file/memory-map could be more verbose than the data structures it would otherwise be parsed into, so access patterns on that data could incur more delays faulting in more memory pages
  • memory mapping can simplify the application's parsing job by letting it treat the entire file content as accessible, rather than worrying about when to read another buffer-full
  • the application defers more to the OS's wisdom regarding the number of pages kept in physical RAM at any single point in time, effectively sharing a direct-access disk cache with the application
  • as well-wisher notes in the comments below, with memory mapping you typically use fewer system calls
  • if multiple processes are accessing the same file, they should be able to share the physical backing pages
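
As a hedged sketch of the copy-count difference above (the `record` struct and offsets are invented, and error handling is omitted for brevity):

```c
#include <sys/types.h>
#include <unistd.h>

/* A hypothetical fixed-size record, purely for illustration. */
struct record { char key[16]; long value; };

/* read()-style access: the kernel copies from the page cache into our
   buffer, and the caller typically copies again into its own storage. */
struct record load_with_read(int fd, off_t offset)
{
    struct record r;
    (void)pread(fd, &r, sizeof r, offset);  /* copy #1: cache -> buffer */
    return r;                               /* copy #2: buffer -> caller */
}

/* mmap()-style access: no copy at all - just return a pointer into the
   mapped pages, which are backed directly by the page cache. */
const struct record *load_with_mmap(const char *base, off_t offset)
{
    return (const struct record *)(base + offset);
}
```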

There are also reasons why mmap may be slower - do read Linus Torvalds' post here, which says of mmap:

...page table games along with the fault (and even just TLB miss) overhead is easily more than the cost of copying a page in a nice streaming manner...

And from another of his posts:

  • quite noticeable setup and teardown costs. And I mean noticeable. It's things like following the page tables to unmap everything cleanly. It's the book-keeping for maintaining a list of all the mappings. It's the TLB flush needed after unmapping stuff.
  • page faulting is expensive. That's how the mapping gets populated, and it's quite slow.

Linux does have "hugepages" (one TLB entry per 2MB, instead of one per 4KB) and even Transparent Huge Pages, where the OS attempts to use them even if the application code wasn't written to explicitly utilise them.
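
As a rough, Linux-only sketch of both approaches (MAP_HUGETLB requires the administrator to have reserved a huge-page pool, e.g. via /proc/sys/vm/nr_hugepages, or that mmap fails):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define TWO_MB (2 * 1024 * 1024)

int main(void)
{
    /* Explicit huge page: one TLB entry covers the whole 2MB region. */
    void *h = mmap(NULL, TWO_MB, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (h == MAP_FAILED)
        perror("mmap(MAP_HUGETLB)");   /* pool not configured or empty */
    else
        munmap(h, TWO_MB);

    /* Transparent Huge Pages: map normally and merely hint - the
       kernel may then back the region with huge pages on its own. */
    void *t = mmap(NULL, TWO_MB, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (t != MAP_FAILED) {
        madvise(t, TWO_MB, MADV_HUGEPAGE);
        munmap(t, TWO_MB);
    }
    return 0;
}
```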

FWIW, the last time this arose for me at work, memory mapped input was 80% faster than fread et al for reading binary database records into a proprietary database, on 64 bit Linux with ~170GB files.

Tony Delroy
  • Nice answer. Also, using memory mapping you typically use fewer system calls. This could result in a significant speedup for random-access reads (i.e. `lseek` before every `read`). – well-wisher Mar 22 '12 at 10:32
  • @well-wisher: good point, I'll add that to the list above... cheers – Tony Delroy Mar 23 '12 at 00:54
  • 1
    Although this is a duplicated question, this answer seems to be more clearer than other answers in [this post](http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks)... – sleepsort Aug 20 '14 at 06:13
  • 1
    Great answer, but it would be much better if it compared the performance of `mmap()` to, say, `pread()` on a file descriptor opened with `O_DIRECT`. `fread()` is buffered, and it will use an unknown number of system calls to actually read the data. "80% faster than `fread`" has more than a bit of, "Who cares?" about it. Without more data, there's just too much *unknown* going on under the hood of any `stdio`-based operation for it to be a definitive benchmark value. – Andrew Henle Jun 17 '17 at 13:56
  • 1
    The main thing this answer is lacking is that a modern OS cache file data; so "reading from a file" is more like "reading cached data from RAM with possible cache misses that involve disk IO, and possible prefetch/read-ahead". The organization of that cache (called "page cache" in Linux) almost always means that pages are the fundamental unit of (disk) IO regardless of what is on top (`mmap()` or `read()` or `fread()`). – Brendan Apr 04 '22 at 14:15
  • Also makes sense to mention https://en.wikipedia.org/wiki/Io_uring which might be even faster than those two. – 0andriy Jan 09 '23 at 16:11
  1. mmap() can be shared between processes (see the sketch after this list).
  2. DMA will be used whenever possible. DMA does not require contiguous memory - many high-end cards support scatter-gather DMA.
  3. The memory area may be shared with the kernel block cache if possible, so there is less copying.
  4. Memory for mmap is allocated by the kernel; it is always page-aligned.
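
A minimal sketch of point 1 between related processes (MAP_ANONYMOUS is a near-universal extension rather than strict POSIX; unrelated processes would map the same file or use shm_open() instead):

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Shared anonymous mapping: parent and child see the very same
       physical pages - no copying between the processes. */
    char *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                 /* child */
        strcpy(shared, "hello from the child");
        _exit(0);
    }
    wait(NULL);                        /* parent */
    printf("%s\n", shared);            /* prints the child's write */
    munmap(shared, 4096);
    return 0;
}
```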
J-16 SDiZ

"Faster" in absolute terms doesn't exist. You'd have to specify constraints and circumstances.

mmap() is not reading sequentially.

What makes you think that? If you really access the mapped memory sequentially, the system will usually fetch the pages in that order.

mmap() has to fetch from the disk itself same as read() does

Sure, but the OS determines the timing and the buffer size.

The mapped area is not sequential - so no DMA (?).

See above.

What mmap helps with is that there is no extra user-space buffer involved; the "read" takes place where the OS kernel sees fit, in chunks that can be optimized. This may be an advantage in speed, but first of all it is just an interface that is easier to use.

If you want to know about speed for a particular setup (hardware, OS, use pattern) you'd have to measure.
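
For instance, a crude harness along these lines (a sketch only - the page cache flatters whichever pass runs second, so drop caches between runs or swap the order and compare):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s FILE\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }
    struct stat sb;
    fstat(fd, &sb);

    /* Pass 1: plain read() in 64KB chunks, summing every byte. */
    double t0 = now();
    unsigned long sum = 0;
    static char buf[65536];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += (unsigned char)buf[i];
    printf("read(): %.3fs (sum=%lu)\n", now() - t0, sum);

    /* Pass 2: mmap() the whole file and touch every byte. */
    t0 = now();
    sum = 0;
    unsigned char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    for (off_t i = 0; i < sb.st_size; i++)
        sum += p[i];
    printf("mmap(): %.3fs (sum=%lu)\n", now() - t0, sum);

    munmap(p, sb.st_size);
    close(fd);
    return 0;
}
```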

Jens Gustedt