
A similar post asks whether changes to a memory-mapped file are flushed to disk after a SIGKILL, but what happens if the process is SIGKILLed in the middle of performing a change (e.g. a write or delete) to the memory buffer, before it is flushed to disk?

Does the underlying file get updated and corrupted? Is the write/delete operation finished before the process is killed? Are there any safeguards against this?

ajoseps
  • I think you need to think more about exactly what you mean in order for the question to be meaningful. What kind of order relationship do you expect to have between the stores and the signal? Certainly you're not going to get any stronger guarantees than what you'd get about what you would see from a signal handler interrupting the operations on the memory mapped file. – R.. GitHub STOP HELPING ICE Jul 19 '18 at 22:55
  • I don't fully understand your comment. Can you clarify? Are you implying that the SIGKILL operation would wait for the memory-mapped operation to finish? – ajoseps Jul 19 '18 at 23:03
  • The process which mmap'ed the file is not the process which is actually writing memory buffers to a physical disk. The kernel does that, at its own rhythm, and the SIGKILL does not get directed to the kernel. – rici Jul 19 '18 at 23:05
  • Sorry, I should clarify: the post I linked talks about the buffer being flushed to disk. What I am asking about is the write to the memory buffer itself. – ajoseps Jul 19 '18 at 23:10
  • There is no write to the memory buffer. The memory buffer is directly mapped to the process address space. Whatever is in the address space when the process is killed is what is there. (That's what the `map` in `memory mapped` is all about.) – rici Jul 19 '18 at 23:56
  • I think my terminology is faulty here. My intent was about what is in the address space being changed, e.g. if a 64-bit int is written 32 bits at a time but a SIGKILL occurs between the first 32 bits and the second, then the resulting memory-mapped file would only contain the first 32 bits. To guarantee that the full int is written, it would need to be declared atomic. Is that correct? – ajoseps Jul 20 '18 at 00:12
  • @ajoseps: In Linux, the best you could do then is use `__atomic_store_n((uint64_t *)ptr, (uint64_t)value, __ATOMIC_SEQ_CST)`. (On architectures without a 64-bit atomic type, that could use an internal mutex, I believe. But on architectures with a 64-bit atomic store, you should only see either the old 64-bit value or the new 64-bit value, and never a mix, even if the process was killed right during that call; see the sketch after these comments.) – Nominal Animal Jul 20 '18 at 00:15
  • If you need that level of persistent data coherence, mmap is probably not appropriate. It will be difficult to provide data coherence guarantees without a lot more control over data persistence. A write() operation is not 100% atomic either -- the disk could fail in the middle of a write, or mains power could be lost, etc. -- but a good file system will try to give you an all-or-nothing guarantee on the write of a block. Mmap does not even attempt this. – rici Jul 20 '18 at 00:30
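
To make the atomic-store suggestion in the comments concrete, here is a minimal sketch, assuming Linux and a GCC-compatible compiler; the file name `data.bin`, the function name `store_value`, and the assumption that the file is already sized to hold a `uint64_t` are all hypothetical:

#include <stdint.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: atomically publish a 64-bit value through a shared mapping.
   On architectures with a native 64-bit atomic store, the mapping (and
   hence the file) holds either the old value or the new value, never a
   torn mix, even if the process is killed mid-call. */
int store_value(uint64_t value)
{
    int fd = open("data.bin", O_RDWR);  /* hypothetical, pre-sized file */
    if (fd == -1)
        return -1;

    uint64_t *p = mmap(NULL, sizeof *p, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after close() */
    if (p == MAP_FAILED)
        return -1;

    __atomic_store_n(p, value, __ATOMIC_SEQ_CST);

    munmap(p, sizeof *p);
    return 0;
}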

1 Answer


Let's say you have something like

volatile unsigned char  *map; /* memory-mapped file */
size_t                   i;

for (i = 0; i < 1000; i++)
    map[i] = slow_calculation(i);

and for some reason, the process gets killed when i = 502.

In such a case, the contents of the file will indeed reflect the content of the mapping at that point.

No, there is no way to avoid this (with regards to the KILL signal), because KILL is unblockable and uncatchable.

You can minimize the window by using a temporary buffer as a "transactional" buffer, calculating the new values to that buffer, and then just copy the values over. It is no guarantee, but it does mean there is a much higher probability that the file contents are intact even if the process is killed. (Furthermore, it means that if you use e.g. mutexes to synchronize access to the mapping, you only need to hold the mutex for the minimum amount of time.)

Killing a process via the KILL signal is very abnormal termination, and having memory-mapped files garbled because of that is, in my opinion, expected. It is not something that should be done during normal operation at all; the TERM signal is used for that.

What you should worry about is that your process responds to a TERM signal in a timely fashion. TERM is catchable and blockable, and is basically a way for an external supervisor process (or the user the process belongs to, or the superuser) to request that the process exit cleanly as soon as possible. However, the process should not dally around, because it is quite common to send the process a KILL signal if it does not exit within a few seconds after receiving a TERM signal.

In my own daemons, I strive for them to respond to a TERM within a second or so, unless the system is under a heavy load. It is, of course, a very subjective measurement since the speed of different systems varies, but there are no hard and fast rules here.

One way to handle this is to install a TERM signal handler that, in normal operation, terminates the process immediately. During critical sections, the exit is postponed:

#include <signal.h>  /* sigaction(), sigemptyset() */
#include <stdlib.h>  /* exit() */
#include <string.h>  /* memset() */
#include <errno.h>   /* errno */

static volatile int  in_critical = 0;   /* nonzero while inside a critical section */
static volatile int  need_to_exit = 0;  /* set when an exit signal has been received */

static void handle_exit_signal(int signum)
{
    (void)signum;  /* unused; the same handler can serve TERM, INT, HUP, etc. */
    __atomic_store_n(&need_to_exit, 1, __ATOMIC_SEQ_CST);
    if (!__atomic_load_n(&in_critical, __ATOMIC_SEQ_CST))
        exit(126);
}

/* Install handle_exit_signal() for the given signal.
   Returns 0 if successful, an errno error code otherwise. */
static int install_exit(int signum)
{
    struct sigaction  act;
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_exit_signal;
    act.sa_flags = SA_RESTART;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;
    return 0;
}
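
A hypothetical caller would install the handler early, before entering any critical sections; a sketch of that usage (the error message is illustrative, and `<stdio.h>` is assumed to be included):

/* Hypothetical usage early in main(): */
if (install_exit(SIGTERM)) {
    fprintf(stderr, "Cannot install TERM signal handler.\n");
    return EXIT_FAILURE;
}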

To enter and exit critical sections (say, when you hold a mutex within the shared memory region):

static inline void critical_begin(void)
{
    /* Enter a critical section; nesting just increments the count. */
    __atomic_add_fetch(&in_critical, 1, __ATOMIC_SEQ_CST);
}

static inline void critical_end(void)
{
    /* Leave a critical section. When the outermost one ends (the count
       drops to zero) and an exit signal arrived meanwhile, exit now. */
    if (!__atomic_sub_fetch(&in_critical, 1, __ATOMIC_SEQ_CST))
        if (__atomic_load_n(&need_to_exit, __ATOMIC_SEQ_CST))
            exit(126);
}

So, if a TERM signal is received while you are in a critical section (and critical_begin() and critical_end() do nest), the final call to critical_end() exits the process.
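
For illustration, nesting behaves like this (the two update helpers are hypothetical):

critical_begin();        /* outermost: in_critical becomes 1 */
update_header(map);      /* hypothetical helper */
critical_begin();        /* nested: in_critical becomes 2 */
update_payload(map);     /* hypothetical helper */
critical_end();          /* back to 1: no exit check fires */
critical_end();          /* reaches 0: exits here if TERM arrived */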

Note that I used the GCC atomic built-ins for managing the flags atomically, without data races, even if the signal handler is executed in a different thread. I've found this the cleanest solution in Linux, although it should work on other OSes too. (Other C compilers you can use in Linux, like clang and Intel CC, support these built-ins as well.)

So, in pseudocode, the slow 1000-element calculation shown at the beginning would then become

volatile unsigned char  *map;
unsigned char            cache[1000];
size_t                   i;

/* Nothing critical yet, we're just calculating new values... */
for (i = 0; i < 1000; i++)
    cache[i] = slow_calculation(i);

/* Update shared memory map. */
critical_begin();
/* pthread_mutex_lock() */
memcpy((void *)map, cache, 1000); /* cast drops the volatile qualifier for memcpy() */
/* pthread_mutex_unlock() */
critical_end();

If a TERM signal is delivered before the critical_begin(), the process is terminated then and there. If a TERM signal is delivered after that, but before the critical_end(), the call to critical_end() will terminate the process.

This is just one pattern that can solve the underlying problem; there are others. The one with a single volatile sig_atomic_t done = 0; that the signal handler sets to nonzero, and the main processing loops check regularly, is even more common.
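
A minimal sketch of that more common pattern, for comparison (the one-second sleep stands in for a unit of real work):

#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t  done = 0;

static void handle_done(int signum)
{
    (void)signum;
    done = 1;  /* just set the flag; the work loop checks it */
}

int main(void)
{
    struct sigaction  act;
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_done;
    /* No SA_RESTART: blocking calls return early, so the flag is seen sooner. */
    if (sigaction(SIGTERM, &act, NULL) == -1)
        return 1;

    while (!done)
        sleep(1);  /* stands in for one unit of real work */

    /* Clean exit path: flush and release resources here. */
    return 0;
}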

As pointed out by R.. in a comment, the pointer used to refer to the memory map should be a pointer to volatile (i.e., volatile some_type *map) to stop the compiler from reordering the stores to the memory map.

Nominal Animal
  • In your first example at the top of this answer, `map` is not a pointer-to-volatile, so there is no ordering between the stores. There's no reason to expect that, when `map[502]` is assigned, the stores for any of `map[0]` to `map[501]` already occurred. – R.. GitHub STOP HELPING ICE Jul 20 '18 at 01:29
  • @R..: True; edited and added a note. I wonder if I should expand a bit, and explain that one does not see that often in existing code, especially code using POSIX threads, since the locking primitives (`pthread_mutex_lock()`, `pthread_mutex_unlock()`) or compiler-provided atomic accessor built-in functions (when using e.g. generation counters) provide the ordering? – Nominal Animal Jul 20 '18 at 03:29
  • Yeah. I think OP is missing that the entire concern here is not the kernel failing to commit writes via mmap (a non-issue) but inherent concurrency/memory order issues with shared memory. – R.. GitHub STOP HELPING ICE Jul 20 '18 at 05:04