I'm looking into some code that essentially implements a multithreaded daemon that also calls fork, and I'm convinced that it's not doing so safely. A rewrite is the ideal scenario, but I'm also investigating the best way to modify it to make it safe. The idea is pretty simple:

  • Create a read-write lock
  • Grab the write lock before forking
  • Grab the read lock before doing anything "unsafe" (which might grab a lock etc...)

I can find out what's unsafe in our own code, but I have no clue what's unsafe in system code. To that end, I'm wondering if there is an exhaustive list somewhere of standard libc and syscalls which implicitly grab mutexes under the hood.

There is a list of system calls in the signal(2) manpage which are safe to call in signal handlers:

       _Exit() _exit() abort() accept() access() aio_error() aio_return()
       aio_suspend() alarm() bind() cfgetispeed() cfgetospeed() cfsetispeed()
       cfsetospeed() chdir() chmod() chown() clock_gettime() close() connect()
       creat() dup() dup2() execle() execve() fchmod() fchown() fcntl()
       fdatasync() fork() fpathconf() fstat() fsync() ftruncate() getegid()
       geteuid() getgid() getgroups() getpeername() getpgrp() getpid()
       getppid() getsockname() getsockopt() getuid() kill() link() listen()
       lseek() lstat() mkdir() mkfifo() open() pathconf() pause() pipe()
       poll() posix_trace_event() pselect() raise() read() readlink() recv()
       recvfrom() recvmsg() rename() rmdir() select() sem_post() send()
       sendmsg() sendto() setgid() setpgid() setsid() setsockopt() setuid()
       shutdown() sigaction() sigaddset() sigdelset() sigemptyset()
       sigfillset() sigismember() signal() sigpause() sigpending() sigprocmask()
       sigqueue() sigset() sigsuspend() sleep() socket() socketpair() stat()
       symlink() sysconf() tcdrain() tcflow() tcflush() tcgetattr()
       tcgetpgrp() tcsendbreak() tcsetattr() tcsetpgrp() time() timer_getoverrun()
       timer_gettime() timer_settime() times() umask() uname() unlink()
       utime() wait() waitpid() write()

And I presume that since these are safe to call in a signal handler, they won't try to grab any mutexes or do something non-reentrant.

Am I just to assume that all other system calls are unsafe? What about libc? I know from other threads that malloc, for example, does some locking under the hood. Is there a definitive list somewhere?

Edit: Provided some background on why I'm asking this question.

Squirrel
  • Why do you care if a function uses a mutex internally? To give an example, that doesn't automatically make a function non-reentrant. This makes me think that there might be an [XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) going on here. – NPE Jan 14 '15 at 15:58
  • Fair enough, I can see how it can look that way, I'll add some background to the question. The short answer is that I'm investigating interactions between forking and threading, typically a no-no. In this case it's about making some legacy code safer. – Squirrel Jan 14 '15 at 16:54
  • Without even pseudo-code we cannot tell you what's going on... Be careful about locks: some locks are "shared" among threads, others among processes. Grabbing a lock before a fork is somewhat strange... You should lock before any operation that needs some grant before running, but forking is probably not such an operation... – Jean-Baptiste Yunès Jan 14 '15 at 17:53
  • It does matter if functions use locks internally. http://brooker.co.za/blog/2014/12/06/random.html – i_am_jorf Jan 14 '15 at 18:24
  • Not only that, but if you're calling `fork()` from Thread1 while `rand()` is holding a lock in Thread2, then in some implementations the lock will be held in the child process with no way of releasing it. If you then call `rand()` in the child your child will freeze -- hence the question. – Squirrel Jan 14 '15 at 19:03

1 Answer

I take it that your concern is that if you call fork() from thread A, then thread B may be in a library function which holds a mutex. The new process will then have only a single thread running (a clone of thread A), and the mutex will never get dropped. If you call exec() immediately afterwards, you are safe, as the mutex (along with the rest of memory) will get wiped out. If you don't call exec() afterwards, then this is a valid concern. However, the library author should be aware of it, and should have coded around it by using pthread_atfork or similar.
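
As a minimal sketch of the "exec immediately" case, assuming POSIX: the child below calls nothing except `execve()` and `_exit()`, both of which appear in the async-signal-safe list quoted in the question. The helper name `run_child` is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a child process and return its
 * exit status, or -1 if it could not be started or did not exit normally. */
int run_child(char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no malloc, no stdio, nothing that might need a lock
         * that another thread held at the time of the fork. */
        execve(argv[0], argv, envp);
        _exit(127);   /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Once `execve()` succeeds, the child's copy of the parent's memory, including any locked mutexes, is replaced by the new program image.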

From the documentation of pthread_atfork:

To understand the purpose of pthread_atfork, recall that fork(2) duplicates the whole memory space, including mutexes in their current locking state, but only the calling thread: other threads are not running in the child process. The mutexes are not usable after the fork and must be initialized with pthread_mutex_init in the child process. This is a limitation of the current implementation and might or might not be present in future versions.

So, provided the library has been properly written, you shouldn't need to worry about whether it has mutexes or not. Indeed some code that uses libraries doesn't even know that the library uses threads, and isn't itself linked to a threading library, so that has to be the case.

abligh
  • Thanks, indeed calling fork without an `exec()` is the exact concern. I've looked at `pthread_atfork`, and it's how I'd implement the "fork wrapper" I described: essentially I'd need to make sure no other thread is in a library call in the prepare handler. For this, the only solution I can think of is simply guarding all other threads' bodies with a read lock, which is fine, but will affect `fork()` performance since I'll have to wait for the other threads to release their locks. If I knew which libraries held mutexes I could just guard them. – Squirrel Jan 14 '15 at 22:09
  • But the libraries *themselves* should be implementing `pthread_atfork`. – abligh Jan 14 '15 at 22:11
  • aaaaaah, I see what you mean now. So would you know if I can trust that everything in the GNU C Library (say) has done this properly? – Squirrel Jan 14 '15 at 22:52
  • ... to clarify, I think it works properly as I'd be surprised if the non-pthread related stuff in glibc used mutexes to any great extent if at all. – abligh Jan 15 '15 at 12:39
  • Thanks @abligh, since your answer did educate me about the subject, I've altered the phrasing so that I can accept your answer as a valid one. – Squirrel Jan 18 '15 at 22:42
  • @abligh glibc takes no measures to protect you from this. (which e.g. leads to bug reports such as [this](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=657835) and many like it.) – nos Jan 18 '15 at 22:52