I am implementing a cp (file copy) command using mmap(). I map the source file with MAP_PRIVATE (since I only need to read it) and the destination file with MAP_SHARED (since the changes to the destination have to be written back to the file).
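A minimal sketch of the setup described above, assuming Linux/POSIX. The function name `mmap_copy` and the error handling are illustrative assumptions, not the asker's actual code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path to dst_path through two mappings.
 * Returns 0 on success, -1 on error. */
int mmap_copy(const char *src_path, const char *dst_path)
{
    int rc = -1;
    struct stat st;
    size_t len;
    void *src, *dst;

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;

    /* A writable MAP_SHARED mapping needs the file opened O_RDWR. */
    int out = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (fstat(in, &st) < 0)
        goto done;
    len = (size_t)st.st_size;
    if (len == 0) {             /* mmap rejects zero-length mappings */
        rc = 0;
        goto done;
    }

    /* Grow the destination first; touching pages past EOF raises SIGBUS. */
    if (ftruncate(out, st.st_size) < 0)
        goto done;

    src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
    if (src == MAP_FAILED)
        goto done;
    dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
    if (dst == MAP_FAILED) {
        munmap(src, len);
        goto done;
    }

    memcpy(dst, src, len);      /* every page touched here can fault once */
    rc = 0;

    munmap(dst, len);
    munmap(src, len);
done:
    close(out);
    close(in);
    return rc;
}
```

The `memcpy` is where the page faults are taken: each source page and each destination page is mapped in on first access.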

While doing this I observed a performance penalty from a large number of minor page faults, which seem to occur for two reasons: 1) zero-fill-on-demand on the MAP_PRIVATE mapping of the source file, and 2) copy-on-write on the MAP_SHARED mapping of the destination file.
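One way to observe these faults is getrusage(2), which reports the minor and major fault counts of the calling process in `ru_minflt` and `ru_majflt`. The sketch below is only an illustration; `struct fault_counts` and `read_fault_counts` are hypothetical names.

```c
#include <sys/resource.h>

/* Hypothetical container for the two counters of interest. */
struct fault_counts {
    long minor;   /* faults served without I/O */
    long major;   /* faults that required I/O */
};

/* Fill *out with the fault counts of the calling process so far.
 * Returns 0 on success, -1 if getrusage() fails. */
int read_fault_counts(struct fault_counts *out)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minor = ru.ru_minflt;
    out->major = ru.ru_majflt;
    return 0;
}
```

Calling it once before and once after the copy and subtracting the two results gives the number of faults attributable to the copy.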

Is there any way to disable zero-fill-on-demand and copy-on-write?

Thanks, Harish

  • I am surprised that you see a performance penalty for zero fill, how are you measuring it? You don't want to disable COW, it is fundamental to the way virtual memory works, and improves performance. Have you considered that using `write(2)` might be more efficient for the copy? Specify the private map as the buffer to write. It also avoids the step of expanding the new file, since `write(2)` will do it for you. – cdarke Jun 21 '12 at 07:34
  • I am measuring the minor page faults with getrusage(). It shows nearly 50000 minor page faults when copying a 1 GB file with mmap() (nearly 25000 for the read mapping, mmap(MAP_PRIVATE), and the same for the write mapping, mmap(MAP_SHARED)). Yes, I have checked that write(2) is more efficient than mmap() for copying, but I think mmap() can be efficient if we disable zero-fill-on-demand and copy-on-write. – Harish Jun 21 '12 at 08:55
  • Harish, check the `madvise()` and `mlock()` syscalls. They may affect the number of page faults. And for fast file copy, check the `sendfile()` syscall. – osgx Jun 21 '12 at 10:49
  • @osgx, I have control over the major page faults, but the problem is with the minor page faults. – Harish Jun 21 '12 at 12:09
  • 1) does not happen except for the last partial page (if there is one), and I don't understand 2). Why use copy-on-write on the _destination_? Also, trying to improve `cp` performance under Linux is probably the best case in point for `splice`. It saves the round trip to user space altogether. – Damon Jun 22 '12 at 15:57
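As a rough illustration of the in-kernel copy suggested in the comments, here is a sketch using sendfile(2). It assumes Linux 2.6.33 or later, where the output descriptor may be a regular file; `sendfile_copy` is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path to dst_path without mapping either
 * file into user space. Returns 0 on success, -1 on error. */
int sendfile_copy(const char *src_path, const char *dst_path)
{
    struct stat st;
    int rc = -1;

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (fstat(in, &st) == 0) {
        off_t remaining = st.st_size;

        rc = 0;
        /* One call may transfer less than requested, so loop. With a NULL
         * offset the kernel advances the file position of `in` itself. */
        while (remaining > 0) {
            ssize_t n = sendfile(out, in, NULL, (size_t)remaining);
            if (n <= 0) {       /* error, or source shrank underneath us */
                rc = -1;
                break;
            }
            remaining -= n;
        }
    }

    close(out);
    close(in);
    return rc;
}
```

Since the data never passes through a user-space mapping, the copy itself does not take the per-page faults discussed in the question.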

1 Answer

There is the MAP_POPULATE flag of mmap(2):

http://linux.die.net/man/2/mmap

MAP_POPULATE (since Linux 2.5.46) Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. Later accesses to the mapping will not be blocked by page faults. MAP_POPULATE is only supported for private mappings since Linux 2.6.23.

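It should pre-fault all pages in the mapped region, so it should help with question (1). It may not help with question (2) (shared). Note that the last sentence of the quote is ambiguous: current man pages word it as "MAP_POPULATE is supported for private mappings only since Linux 2.6.23", i.e. support for private mappings was added in that release, so the sentence does not exclude shared mappings. Whether the flag removes the faults on the destination mapping would need to be measured.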
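A minimal sketch of using the flag on the source mapping, assuming Linux and glibc. `map_prefaulted` is a hypothetical name, and the `#ifndef` fallback is only there so the sketch still compiles where the flag is not exposed.

```c
#define _GNU_SOURCE             /* MAP_POPULATE is a Linux extension */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0          /* assumption: degrade to a plain mapping */
#endif

/* Hypothetical helper: map `path` read-only and ask the kernel to
 * pre-fault it. On success returns the mapping and stores its length in
 * *len_out; returns NULL on error or for an empty file. */
void *map_prefaulted(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* The mmap call itself may block while the file is read in. */
    p = mmap(NULL, (size_t)st.st_size, PROT_READ,
             MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with `munmap(p, len)`. mmap does not fail when the populate step cannot complete, so comparing fault counts with and without the flag is the way to see whether it helped.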

osgx
  • Note: `MAP_POPULATE` means no delays when you're using the mapping (unless it gets paged out by memory pressure), but it also means the `mmap` call itself blocks until the whole file is read in. It's often better to avoid `MAP_POPULATE` in favor of [`posix_madvise`](http://linux.die.net/man/3/posix_madvise) (or non-standardized [`madvise`](http://linux.die.net/man/2/madvise)) using `POSIX_MADV_WILLNEED`, which requests similar read-ahead to `MAP_POPULATE`, but doesn't block. You can open/map the source file, advise it to load, and the OS will background read in bulk, rather than demand faulting. – ShadowRanger Sep 16 '16 at 18:11
  • You might block on reading from the `mmap`, but because the whole read-in is scheduled up front, the read will already be in progress when you hit the unpopulated page; you won't be dispatching new I/O requests live. – ShadowRanger Sep 16 '16 at 18:12