
I'm trying to see whether I can have the OS (Linux) persist memory changes to disk for me. I would map certain sections of a file into memory; the file would be, say, a circular queue. I figured it would be more efficient to let the OS handle writing the changed pages to disk.

I started looking into mmap(), msync() and munmap(). I found the following article:

c linux msync(MS_ASYNC) flush order

in which one of the posts indicates that msync() with MS_ASYNC is a no-op, since the OS already tracks dirty pages and flushes them to storage when necessary. It would be nice to know exactly what that means. I also found this:

msync() behaviour broken for MS_ASYNC, revert patch?

I didn't understand much of that conversation. I guess I was looking for an efficient way for changes I make to an in memory representation to be persisted to disk, even in the event of a crash.

I wrote the small sample app below. It seems that, even when I introduce a crash, the latest data I've written to the mapped memory is stored to disk.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char* argv[])
{
    int result;
    int fd = -1;
    int status = 1;                     /* assume failure until the end */
    size_t size = 8 * 1024L * 1024L;    /* 8 MiB */
    void* addr;

    if (argc != 2)
        {
        printf("Error, missing file name argument\n");
        goto done;
        }

    fd = open(argv[1], O_RDWR | O_CREAT, S_IWUSR | S_IRUSR);
    if (fd == -1)
        {
        printf("Failed opening file %s: %d\n", argv[1], errno);
        goto done;
        }

    /* Grow the file so the whole mapping is backed by it. */
    result = ftruncate(fd, (off_t) size);
    if (result != 0)
        {
        printf("Failed setting file size: %d\n", errno);
        goto done;
        }

    /* Request PROT_READ explicitly rather than relying on PROT_WRITE
       implying it. */
    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_FILE | MAP_SHARED,
            fd, 0);
    if (addr == MAP_FAILED)
        {
        printf("Failed to map memory: %d\n", errno);
        goto done;
        }
    memset(addr, 'r', size);
    result = msync(addr, size, MS_ASYNC);
    /* Check right away, before another call can overwrite errno. */
    if (result != 0)
        {
        printf("Failed syncing mapped memory: %d\n", errno);
        goto done;
        }
    getchar();
    memset(addr, 'p', size);
    getchar();

    memset(addr, 'm', size);

    // crash.

    *((int*) 0) = 0;

    status = 0;     /* not reached if the store above faults */

done:
    printf("done\n");
    if (fd != -1)
        close(fd);
    printf("closed file\n");
    return status;
}

So is using mmap(), msync(MS_ASYNC) a reasonable way to have the OS persist my in-memory changes to disk?

Thanks, Nick


1 Answer


So is using mmap(), msync(MS_ASYNC) a reasonable way to have the OS persist my in-memory changes to disk?

No, it's not.

Your test case doesn't prove that in all cases your data will be persisted to stable storage - only that it happened to be visible in your narrow scenario. Further, when people talk about data written to disk being persisted in the face of a "crash" they usually mean a crash of the operating system or hardware power loss (e.g. kernel panic, abrupt power off etc) - a userland program just segfaulting doesn't stop the running kernel being able to access (and even sync) the dirty data rolling around in memory. Unfortunately this means your test was demonstrating something different to what you needed it to.
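To make that distinction concrete, here is a minimal sketch (the helper name `visible_without_sync` and the file path you pass it are made up for illustration). It writes through a shared mapping and then reads the same bytes back through the file descriptor without calling any sync function. On Linux both views go through the same page cache, so the read should see the new data even though nothing here has forced it onto the disk.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if bytes stored through a MAP_SHARED
 * mapping can be read back with pread() when no msync()/fsync() has been
 * called, 0 if they cannot, and -1 on error. */
int visible_without_sync(const char *path)
{
    const size_t size = 4096;
    char expect[16], buf[16];
    int rc = -1;
    void *addr;
    int fd;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)size) == 0) {
        addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr != MAP_FAILED) {
            memset(addr, 'm', size);        /* dirty the page, no sync call */
            memset(expect, 'm', sizeof expect);

            /* Read through the descriptor, not through the mapping. */
            if (pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf)
                rc = (memcmp(buf, expect, sizeof buf) == 0);

            munmap(addr, size);
        }
    }
    close(fd);
    return rc;
}
```

Seeing a result of 1 here only shows that the kernel's in-memory copy is up to date. It says nothing about what would survive a kernel panic or a power cut, which is the same limitation your segfault test has.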

As mentioned here, to know if the data truly made it to stable storage you have to use (and check the result of) one of the following:

  • msync(MS_SYNC)
  • fsync
  • sync_file_range + fsync (for metadata)
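As a rough sketch of the first option (the helper names `persist_range` and `write_and_persist`, the 8192-byte file size, and the 5000-byte offset are all assumptions for illustration), the idea is to call msync() with MS_SYNC on the pages you changed and treat anything other than a zero return as "not persisted". msync() wants a page-aligned start address, so the helper rounds down to the page boundary first.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: synchronously write back the pages covering
 * [p, p + len). Returns 0 on success, -1 with errno set on failure. */
int persist_range(void *p, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)p & ~(page - 1);   /* round down to a page */
    size_t span = len + (size_t)((uintptr_t)p - start);

    return msync((void *)start, span, MS_SYNC);
}

/* Hypothetical usage: store a string in a shared mapping and block until
 * msync(MS_SYNC) reports completion. Returns 0 on success, -1 otherwise. */
int write_and_persist(const char *path, const char *msg)
{
    const size_t size = 8192;
    const size_t offset = 5000;          /* deliberately not page-aligned */
    size_t n = strlen(msg) + 1;
    int rc = -1;
    void *addr;
    int fd;

    if (n > size - offset)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, (off_t)size) == 0) {
        addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr != MAP_FAILED) {
            memcpy((char *)addr + offset, msg, n);
            rc = persist_range((char *)addr + offset, n);  /* check this! */
            munmap(addr, size);
        }
    }
    close(fd);
    return rc;
}
```

The important part is that the return value is checked and acted upon. Calling fsync() on the descriptor instead would follow the same pattern.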

You were never in a position to use msync(MS_ASYNC) because it doesn't tell you when the data has been successfully persisted (and on Linux it doesn't even force writeback to start, which is what the posts you link to are warning about). Either:

  • You care about persistence and you need to know when that data has finished being written to persistent stable storage (e.g. you want to postpone some other action until it is).
  • Or you don't care and you're fine with that data only being readable while the system continues operating correctly, and thus you're OK with that data being indeterminate in other scenarios.
Anon