59

Where can I find a well-respected reference that details the proper handling of PID files on Unix?

On Unix operating systems, it is common practice to “lock” a program (often a daemon) by use of a special lock file: the PID file.

This is a file in a predictable location, often ‘/var/run/foo.pid’. The program is supposed to check when it starts up whether the PID file exists and, if the file does exist, exit with an error. So it's a kind of advisory, collaborative locking mechanism.

The file contains a single line of text, being the numeric process ID (hence the name “PID file”) of the process that currently holds the lock; this allows an easy way to automate sending a signal to the process that holds the lock.
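As an illustration of the signalling use described above, here is a minimal sketch. The function name `signal_from_pidfile` is hypothetical, and the path is whatever the daemon in question uses.

```python
import os
import signal


def signal_from_pidfile(path, sig=signal.SIGTERM):
    """Read the PID stored in `path` and send `sig` to that process.

    Returns the PID that was signalled. Raises ValueError if the file does
    not contain a positive integer, and OSError if the signal cannot be sent.
    """
    with open(path) as f:
        pid = int(f.readline().strip())
    if pid <= 0:
        # PIDs <= 0 address process groups in kill(2); refuse them here.
        raise ValueError(f"refusing to signal pid {pid}")
    os.kill(pid, sig)
    return pid
```

Passing `0` as the signal only checks that the process exists and can be signalled, without delivering anything.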

What I can't find is a good reference on expected or “best practice” behaviour for handling PID files. There are various nuances: how to actually lock the file (don't bother? use the kernel? what about platform incompatibilities?), handling stale locks (silently delete them? when to check?), when exactly to acquire and release the lock, and so forth.

Where can I find a respected, most-authoritative reference (ideally on the level of W. Richard Stevens) for this small topic?

bignose

5 Answers

23

First off, on all modern UNIXes /var/run does not persist across reboots.

The general method of handling the PID file is to create it during initialization and delete it on every exit path, whether a normal exit or one triggered from a signal handler.

There are two canonical ways to atomically create/check for the file. The main one these days is to open it with the O_EXCL flag (together with O_CREAT): if the file already exists, the call fails. The old way (mandatory on systems without O_EXCL) is to create a file with a random name and then link() it to the PID file name. The link will fail if the target name already exists.
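A minimal sketch of both approaches follows, using Python's thin wrappers over open(2) and link(2). The function names are hypothetical, and neither function deals with stale files left behind by a crash.

```python
import os
import tempfile


def create_pidfile_excl(path):
    """Create `path` atomically; raises FileExistsError if it already exists."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")


def create_pidfile_link(path):
    """Write the PID to a uniquely named file, then link() it into place.

    link(2) fails with EEXIST (FileExistsError) if `path` already exists.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".pid-")
    try:
        os.fchmod(fd, 0o644)  # mkstemp creates the file with mode 0600
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        os.link(tmp, path)
    finally:
        os.unlink(tmp)  # the temporary name is no longer needed either way
```

In both cases the existence check and the creation are a single system call, so two processes starting at the same moment cannot both succeed.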

bignose
Joshua
  • “There are two canonical ways to atomically create/check for the file.” That's exactly the kind of thing my question is about: where is this canon recorded canonically, and what makes it authoritative compared to conflicting advice from others? – bignose May 09 '10 at 06:05
  • Unfortunately much of UNIX operation methods is handed down in the culture. Reading the man pages for the system calls described in POSIX.1 (these are, confusingly enough, in man section 2) reveals only a few things that are suitable for locking. Since flock() isn't trusted this leaves only these two and one involving mkdir. – Joshua May 09 '10 at 23:37
  • A fair question is "Why isn't flock() trusted." The answer is there have been too many broken systems over the years, and it never works correctly over nfs anyway (the nfslock protocol itself is subject to the split mind problem). – Joshua May 09 '10 at 23:39
18

As far as I know, PID files are a convention rather than something that you can find a respected, mostly authoritative source for. The closest I could find is this section of the Filesystem Hierarchy Standard.

This Perl library might be helpful, since it looks like the author has at least given thought to some issues that can arise.

I believe that files under /var/run are often handled by the distro maintainers rather than daemons' authors, since it's the distro maintainers' responsibility to make sure that all of the init scripts play nice together. I checked Debian's and Fedora's developer documentation and couldn't find any detailed guidelines, but you might be able to get more info on their developers' mailing lists.

Josh Kelley
  • Thanks. The consensus in other forums also seems to be that there's no canonical reference for this. (The FHS mentions the file location and content briefly, and says nothing about behaviour.) – bignose Apr 27 '09 at 03:37
12

See Kerrisk's The Linux Programming Interface, section 55.6 "Running Just One Instance of a Program" which is based on the pidfile implementation in Stevens' Unix Network Programming, v2.

Note also that the location of the pidfile is usually something handled by the distro (via an init script), so a well-written daemon will take a command-line argument to specify the pidfile and not allow this to be accidentally overridden by a configuration file. It should also gracefully handle a stale pidfile by itself, which is why O_EXCL should not be used: a file left behind by a crash would block every later start. fcntl() file locking should be used instead; you may assume that a daemon's pidfile is located on a local (non-NFS) filesystem.

John Hammond
  • If there's information online for this, please turn “The Linux Programming Interface” and/or “section 55.6” into anchor text linking to where we can read more. – bignose Dec 11 '12 at 03:34
7

Depending on the distribution, it's actually the init script that handles the pidfile. It checks for existence when starting, removes the file when stopping, etc. I don't like doing it that way. I write my own init scripts and don't typically use the standard init functions.

A well-written program (daemon) will have some kind of configuration file saying where this pidfile (if any) should be written. It will also take care to establish signal handlers so that the PID file is cleaned up on normal or abnormal exit, whenever the signal can be handled. The PID file then gives the init script the correct PID so the daemon can be stopped.
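A minimal sketch of that cleanup logic is shown here. The name `install_pidfile_cleanup`, the choice of signals, and the exit status are assumptions; SIGKILL cannot be caught, so a file can still be left behind.

```python
import atexit
import os
import signal
import sys


def install_pidfile_cleanup(path):
    """Arrange for `path` to be removed on normal exit and on common signals.

    Returns the cleanup function so callers can also invoke it directly.
    """

    def remove():
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already gone; nothing to clean up

    def on_signal(signum, frame):
        # sys.exit raises SystemExit, so atexit handlers (including
        # `remove`) run as the interpreter shuts down.
        sys.exit(128 + signum)

    atexit.register(remove)
    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, on_signal)
    return remove
```

Routing the signals through a normal interpreter exit keeps the cleanup in one place instead of duplicating it in each handler.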

Therefore, if the pidfile already exists when starting, it's a very good indicator to the program that it previously crashed and should do some kind of recovery effort (if applicable). You kind of shoot that logic in the foot if you have the init script itself checking for the existence of the PID file, or unlinking it.

As for naming, the file should follow the program name. If you are starting 'foo-daemon', it would be foo-daemon.pid.

You should also explore /var/lock/subsys; however, that's used mostly on Red Hat flavors.

Magne
Tim Post
1

The systemd package on Red Hat 7 provides a man page daemon(7) with the header line "Writing and packaging system daemons."

This man page discusses both "old style" (SysV) and "new style" (systemd) daemonization. In new style, systemd itself handles the PID files for you (if configured to do so). However, in old style, the man page has this to say:

  1. In the daemon process, write the daemon PID (as returned by getpid()) to a PID file, for example /run/foobar.pid (for a hypothetical daemon "foobar") to ensure that the daemon cannot be started more than once. This must be implemented in race-free fashion so that the PID file is only updated when it is verified at the same time that the PID previously stored in the PID file no longer exists or belongs to a foreign process.

You can also read this man page online.
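For the "PID previously stored in the PID file no longer exists" part of the step quoted above, a common check is to send signal 0 to the recorded PID. The sketch below uses hypothetical function names. It does not detect a PID that has been reused by a foreign process, and it is not race-free on its own; it would need to be combined with a lock or an atomic create.

```python
import os


def read_pidfile(path):
    """Return the PID recorded in `path`, or None if missing or malformed."""
    try:
        with open(path) as f:
            return int(f.readline().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_is_running(pid):
    """Report whether a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pidfile_owner_alive(path):
    """True if `path` names a process that currently exists."""
    pid = read_pidfile(path)
    return pid is not None and pid > 0 and pid_is_running(pid)
```

A daemon starting up could treat a `False` result as a stale file that is safe to replace, subject to the caveats above.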

Wildcard