
The page size is 4096 bytes. Assume you want a buffer twice that size, i.e. 8192 bytes.

If you use mmap, you will map 8192 bytes without anything else happening: no actual data is read from the disk yet.

Then, when you access the first byte, a page fault will occur and you will do one I/O to read the first page from the disk. Only after that page has been read do you get the first byte.

Then, when you access the 4097th byte, another page fault will occur and you will do a second I/O to read the second page from the disk to get that byte.
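
To make the mmap case concrete, here is a minimal sketch of the access pattern I am describing (the file name `data.bin` is just a placeholder, and error handling is kept to a minimum):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    /* No data is read from disk here; the kernel only sets up the mapping. */
    char *buf = mmap(NULL, 8192, PROT_READ, MAP_PRIVATE, fd, 0);
    if (buf == MAP_FAILED)
        return 1;

    char a = buf[0];     /* page fault -> one I/O for the first page  */
    char b = buf[4096];  /* page fault -> one I/O for the second page */
    printf("%d %d\n", a, b);

    munmap(buf, 8192);
    close(fd);
    return 0;
}
```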

However, if you use read, you only have to do one I/O to read all 8192 bytes, and you can then pick out the two bytes you want.
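
And here is the equivalent sketch using read, with the same placeholder file name:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[8192];

    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    /* A single read call requests the whole 8192-byte range at once. */
    ssize_t n = read(fd, buf, sizeof buf);
    if (n < (ssize_t)sizeof buf) {
        close(fd);
        return 1;
    }

    printf("%d %d\n", buf[0], buf[4096]);

    close(fd);
    return 0;
}
```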

This is a very small example, but what if the buffer size is a few KB or MB? It looks like mmap with a 4096-byte page size will generate a lot of I/Os that could be avoided by just using the POSIX read call instead, which makes me wonder: why use mmap in the first place?

ksm001
  • some platforms might page in several pages in the background, not causing as many page faults (and if you intend to re-process data, i.e. "read" it more than once, the situation is naturally different). There's no real answer to this question without writing a benchmark for your problem, on your platform. – nos Jan 20 '15 at 12:43
  • 1
    There are several questions on this site along this subject matter, for example: http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks/6383253#6383253 – Nim Jan 20 '15 at 12:46
  • 1
    Your assumptions are adventurous. In practice, the OS will start reading ahead before you touch the first page, and it will normally not read fewer than 32 pages at a time (of course there is no guarantee, but that's what happens in practice). Comparing performance for 8kB of data is somewhat silly, too. `mmap` has its merits, but optimizing a 8kB load is not one of them. Think about `mmap` when you're loading at least a megabyte. – Damon Jan 20 '15 at 13:54

0 Answers