Summary

I tried to code a Monte Carlo simulation that forks into up to as many processes as there are cores. After a certain amount of time the parent sends SIGUSR1 to all children, which should then stop calculating and send their results back to the parent.

When I compile without any optimization (clang thread_stop.c) the behavior is as expected. When I try to optimize the code (clang -O1 thread_stop.c) the signals are caught, but the children do not stop.

Code

I cut the code down to the smallest piece which behaves the same:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>  /* pid_t */
#include <sys/mman.h>   /* mmap */

#define MAX 1           /* Max time to run */

static int a=0; /* int to be changed when signal arrives */

void sig_handler(int signo) {
    if (signo == SIGUSR1){
        a=1;
        printf("signal caught\n");
    }
}

int main(void){

    int * comm;
    pid_t pid;

    /* map to allow child processes access same array */
    comm = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *comm = 0;
    pid=fork();
    if(pid == 0){ /* child process */
        signal(SIGUSR1, sig_handler); /* catch signal */ 

        do {
            /* do things */
        } while(a == 0);

        printf("Child exit(0)\n");
        *comm = 2;
        exit(0); /* exit for child process */
    } /* if(pid == 0) - code below is parent only */

    printf("Started child process, sleeping %d seconds\n", MAX);
    sleep(MAX);
    printf("Send signal to child\n");
    kill(pid, SIGUSR1); /* send SIGUSR1 */
    while(*comm != 2) usleep(10000);
    printf("Child process ended\n");

/* clean up */

    munmap(comm, sizeof(int));
    return 0;
}

System

The problem shows up with clang on Termux (clang 9.0.1) and on Lubuntu (clang 6.0.0-lubuntu2).

Florian
  • I think you need to use a [`volatile sig_atomic_t`](https://stackoverflow.com/q/8488791/10077) instead of a `static int`. – Fred Larson Jan 31 '20 at 21:44
  • Also note that `printf` is not a [signal safe function](http://man7.org/linux/man-pages/man7/signal-safety.7.html). – Fred Larson Jan 31 '20 at 21:50
  • If your code is correct, optimization won't alter the results, but may alter the speed with which the results are obtained. If your code is incorrect, the optimizer may make decisions based on undefined behaviour that change the result, but there is no problem as far as the compiler is concerned because with undefined behaviour, any result is valid. I've not looked at your code to spot where there is undefined behaviour, but if optimization changes the results, it is probable that there is a problem with undefined behaviour. – Jonathan Leffler Jan 31 '20 at 22:44
  • https://en.cppreference.com/w/c/program/sig_atomic_t – jxh Jan 31 '20 at 22:53
  • You should enable some warnings while you're at it (`clang -Wall -Wextra -std=c11 -pedantic -D_POSIX_C_SOURCE=200809L`) – S.S. Anne Jan 31 '20 at 23:28
  • @JonathanLeffler if the code relies on unspecified behaviour then optimization may alter the results – M.M Feb 01 '20 at 00:24
  • @M.M — isn't that what I said? – Jonathan Leffler Feb 01 '20 at 00:25
  • @JonathanLeffler You seemed to be saying that altered results implied undefined behaviour – M.M Feb 01 '20 at 00:26
  • @M.M: Oh — you're drawing deeper significance to 'undefined behaviour' than I intended (but my bad; I wasn't thinking hard enough). I meant "if the standard doesn't specify what the behaviour should be", so 'unspecified behaviour' and 'undefined behaviour' are both included — and if the implementation defines different behaviour depending on what optimization levels are specified, I suppose 'implementation-defined' behaviour also affects the results. I don't think 'locale-specific behaviour' could legitimately change under optimization. – Jonathan Leffler Feb 01 '20 at 00:29

2 Answers

There are restrictions on what you can do in a signal handler that is called asynchronously. In your code this happens because kill is called from a separate process.

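In ISO C the only permitted observable action is, essentially, to assign a value to an object declared as volatile sig_atomic_t. (C11 additionally permits access to lock-free atomic objects and calls to abort, _Exit, quick_exit, and signal for the signal being handled.)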

In POSIX there is a bit more leniency:

the behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or if the signal handler calls any function defined in this standard other than one of the functions listed in the following table.

The following table defines a set of functions that shall be async-signal-safe. Therefore, applications can call them, without restriction, from signal-catching functions. Note that, although there is no restriction on the calls themselves, for certain functions there are restrictions on subsequent behavior after the function is called from a signal-catching function (see longjmp).

The printf function is not in the table, so your program causes undefined behaviour when the signal handler is executed (which means unexpected results may follow).


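So you will need to stop calling printf in the signal handler, and also change a to have type volatile sig_atomic_t. Without volatile, the optimizer may assume that a cannot change inside the loop and read it only once, which would explain why the child never leaves the loop at -O1.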
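
As a rough sketch of those two changes, the handler below only assigns to a volatile sig_atomic_t flag and, if a message is still wanted, emits it with write(), which is on the POSIX async-signal-safe list. The names stop_requested and run_until_signalled are made up for illustration and do not appear in the question; the loop body is a placeholder for the real simulation step.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Flag shared between the handler and the worker loop (hypothetical name).
   volatile forces a fresh read on every loop iteration. */
static volatile sig_atomic_t stop_requested = 0;

static void sig_handler(int signo) {
    if (signo == SIGUSR1) {
        const char msg[] = "signal caught\n";  /* automatic storage, not static */
        int saved_errno = errno;               /* write() may change errno */

        stop_requested = 1;
        /* write() is async-signal-safe; printf() is not. */
        if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0) {
            /* nothing safe to do about a failed write here */
        }
        errno = saved_errno;
    }
}

/* Hypothetical worker: spins until the flag is set and returns the
   number of iterations it completed. */
static unsigned long run_until_signalled(void) {
    unsigned long iterations = 0;
    while (!stop_requested) {
        iterations++;  /* placeholder for one simulation step */
    }
    return iterations;
}
```

In the child, signal(SIGUSR1, sig_handler) would be installed as before and the do/while loop replaced by a call such as run_until_signalled().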

There is also a race condition on the memory location *comm. One process reads it while another may simultaneously write it, with no synchronization. However, I haven't been able to find in the POSIX documentation what the consequences of this are.

M.M

Changing a to volatile sig_atomic_t cured the problem. Thanks for the fast help.

Florian