
I want to write a log file in an unstructured format (one line at a time), using mmap (for speed). What is the best procedure? Do I open an empty file, truncate it to one page size (write an empty string to resize the file?), then mmap, and repeat when the mmapped area is full?

I usually use mmap for writing fixed-size structures, usually just one page at a time. However, this is for writing log files (anywhere from 0.5 to 10 GB) using mmap, and I'm not sure what the best practice is once the first mmapped area is filled: munmap, resize the file with ftruncate, and mmap the next page?

While writing logs to the memory area, I would track the size and msync. What is the proper handling once I get to the end of the mapped memory area?

Let's say I never need to go back or overwrite existing data, so I only write new data to file.

Q1: When I get to the end of the mapped area, do I munmap, ftruncate the file to grow it by another page size, and mmap the next page?

Q2: Is there a standard way to pre-empt this and have the next page ready in memory for the next write? Should this be done on another thread when we get close to the end of the mapped area?

Q3: Do I madvise for sequential access?

This is for real-time data processing with a requirement to keep a log file; currently I just write to a file. The log file is unstructured, text format, line based.

This is for Linux/C++/C, optionally testing on Mac (so no mremap).

Any links/pointers to best practices appreciated.

stefanB

2 Answers


I wrote my bachelor thesis on the comparison of fwrite vs. mmap ("An Experiment to Measure the Performance Trade-off between Traditional I/O and Memory-mapped Files"). First of all: for writing, you don't have to go for memory-mapped files, especially for large files. fwrite is totally fine and will nearly always outperform approaches using mmap. mmap gives you the biggest performance boost for parallel data reading; for sequential data writing, your real limitation with fwrite is your hardware.


In my examples, remapSize is the initial size of the file and the size by which the file grows on each remapping. fileSize keeps track of the size of the file, mappedSpace is the size (length) of the current mapping, and alreadyWrittenBytes is the number of bytes that have already been written to the file.

Here is the example initialization:

void init() {
  fileDescriptor = open(outputPath, O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600); // Open file
  result = ftruncate(fileDescriptor, remapSize); // Set initial size
  fsync(fileDescriptor); // Flush
  memoryMappedFile = (char*) mmap64(0, remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create mmap
  fileSize = remapSize; // Store file size
  mappedSpace = remapSize; // Store mapped size
}

Ad Q1:

I used an "Unmap-Remap"-mechanism.

Unmap

  • first flushes (msync)
  • and then unmaps the memory-mapped file.

This could look like the following:

void unmap() {
  msync(memoryMappedFile, mappedSpace, MS_SYNC); // Flush
  munmap(memoryMappedFile, mappedSpace); // Unmap
}

For Remap, you have the choice to remap the whole file or only the newly appended part.

Remap basically

  • increases the file size
  • creates the new memory map

Example implementation for a full remap:

void fullRemap() {
  ftruncate(fileDescriptor, mappedSpace + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  memoryMappedFile = (char*) mmap64(0, mappedSpace + remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create new mapping on the bigger file
  fileSize += remapSize;
  mappedSpace += remapSize; // Set mappedSpace to new size
}

Example implementation for the small remap:

void smallRemap() {
  ftruncate(fileDescriptor, fileSize + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  remapAt = alreadyWrittenBytes % pageSize == 0 
            ? alreadyWrittenBytes 
            : alreadyWrittenBytes - (alreadyWrittenBytes % pageSize); // Adjust remap location to pagesize
  memoryMappedFile = (char*) mmap64(0, fileSize + remapSize - remapAt, PROT_WRITE, MAP_SHARED, fileDescriptor, remapAt); // Create memory-map
  fileSize += remapSize;
  mappedSpace = fileSize - remapAt;
}

There is a mremap function out there, yet its man page states

This call is Linux-specific, and should not be used in programs intended to be portable.

Ad Q2:

I'm not sure if I understood that point right. If you want to tell the kernel "and now load the next page", then no, this is not possible (at least to my knowledge). But see Ad Q3 on how to advise the kernel.

Ad Q3:

You can use madvise with the MADV_SEQUENTIAL flag, yet keep in mind that this does not force the kernel to read ahead; it only advises it.

Excerpt from the man page:

This may cause the kernel to aggressively read-ahead

Personal conclusion:

Do not use mmap for sequential data writing. It will just cause much more overhead and lead to much more "unnatural" code than a simple writing algorithm using fwrite.

Use mmap for random access reads to large files.

These are also the results obtained in my thesis. I was not able to achieve any speedup by using mmap for sequential writing; in fact, it was always slower for this purpose.

Markus Weninger
  • `mmap` indeed has no intermediate buffer and writes data directly as requested (same for `write` and friends), something user-space APIs generally avoid. This also allows fewer system calls, which are expensive to make. – edmz Mar 09 '16 at 13:13
  • 1
    Can you provide benchmark data for `mmap` vs `fwrite`, along with `write`? – Andrew Henle Mar 09 '16 at 13:22
  • 1
    I only compared `mmap` to `fwrite`, with further parameters like parallelization and side load, yet the thesis is currently not completely finnished and published, so I'm not sure if I'm allowed to publish the results at the moment. – Markus Weninger Mar 09 '16 at 13:26
  • Markus some time has gone by. Are you now allowed to share your results? Specifically I'm curious HOW fwrite is faster. WHY mmap is slower. My only guess is that I've never used msync(). I just mmap(), write data, munmap(), and the data appears on disk in its own good time. Is fwrite() faster only by also not doing an msync() or similar? – Swiss Frank Oct 21 '19 at 06:01
  • There may be other reasons than just IO speed for using mmap. For example, if your (possibly multithreaded) program crashes while in mid-writeout, log messages are lost. If OOM kills your process, log messages are lost. If you forgot to use append-only, the whole file is possibly garbled. Plus, you must sync. With memory mapping, you do not need to worry (the many syncs in your example are not needed). You have "some memory area" that you write to, and what happens after that is no longer your problem. Sure, pages may not be immediately written out, but that's fine (even desirable). – Damon Jan 12 '20 at 11:35

using mmap (for speed). What is the best procedure?

Don't use mmap, use write. Seriously. Why do people always seem to think that mmap would somehow magically speed things up?

Creating a mmap is not cheap; those page tables are not going to populate by themselves. When you want to append to a file, you have to

  • truncate to new size (with modern file systems that's quite cheap actually)
  • unmap the old mapping (leaving around dirty pages that may or may not have to be written out)
  • mmap the new mapping, which requires populating the page tables. Also, every time you write to a previously unfaulted page, you invoke the page fault handler.

There are a few good uses for mmap, for example when doing random access reads in a large data set or recurrent reads from the same dataset.

For further elaboration I'll refer to Linus Torvalds himself:

http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0728.html

In article <200004042249.SAA06325@op.net>, Paul Barton-Davis wrote:

I was very disheartened to find that on my system the mmap/mlock approach took 3 TIMES as long as the read solution. It seemed to me that mmap/mlock should be at least as fast as read. Comments are invited.

People love mmap() and other ways to play with the page tables to optimize away a copy operation, and sometimes it is worth it.

HOWEVER, playing games with the virtual memory mapping is very expensive in itself. It has a number of quite real disadvantages that people tend to ignore because memory copying is seen as something very slow, and sometimes optimizing that copy away is seen as an obvious improvement.

Downsides to mmap:

  • quite noticeable setup and teardown costs. And I mean noticeable. It's things like following the page tables to unmap everything cleanly. It's the book-keeping for maintaining a list of all the mappings. It's the TLB flush needed after unmapping stuff.

  • page faulting is expensive. That's how the mapping gets populated, and it's quite slow.

Upsides of mmap:

  • if the data gets re-used over and over again (within a single map operation), or if you can avoid a lot of other logic by just mapping something in, mmap() is just the greatest thing since sliced bread. This may be a file that you go over many times (the binary image of an executable is the obvious case here - the code jumps all around the place), or a setup where it's just so convenient to map the whole thing in without regard of the actual usage patterns that mmap() just wins. You may have random access patterns, and use mmap() as a way of keeping track of what data you actually needed.

  • if the data is large, mmap() is a great way to let the system know what it can do with the data-set. The kernel can forget pages as memory pressure forces the system to page stuff out, and then just automatically re-fetch them again.

And the automatic sharing is obviously a case of this..

But your test-suite (just copying the data once) is probably pessimal for mmap().

Linus

datenwolf
  • strangely recent results see mmap() _reads_ as either [matching](https://jvns.ca/blog/2014/05/12/computers-are-fast/) or [surpassing](https://web.archive.org/web/20170711083542/http://blog.burntsushi.net/ripgrep) read() performance _on large files_. So there are cases where it may help. – sourcejedi Aug 11 '17 at 07:45
  • @sourcejedi: Link? Which block size did they use? Also see https://eklitzke.org/efficient-file-copying-on-linux – datenwolf Aug 11 '17 at 12:07
  • [64K](https://github.com/jvns/howcomputer/blob/master/bytesum.c) in link 1. ripgrep (link 2) currently appears to use [8K](https://github.com/BurntSushi/ripgrep/blob/master/src/search_stream.rs#L20). So good point, thanks. Your link is confused, in that it explains the graph in terms of readahead, but the graph data is for the completely virtual (in-ram) device /dev/zero. – sourcejedi Aug 11 '17 at 12:33