
Context:

I have a Linux[1] system that manages a series of third-party daemons. Interaction with them is limited to shell[2] init scripts, i.e. only {start|restart|stop|status} are available.

Problem:

A process can be assigned the PID of a previously running process, but the status of a daemon is checked only by testing for the presence of a running process with its PID.

Example:

Process A runs with PID 123 and subsequently dies. Process B then starts with PID 123, and the status command responds with an erroneous "OK". In other words, to validate that the daemon is running we only check for the presence of a process with the recorded PID, and we assume that any process with this PID is the process in question.

Proposed solutions:

  1. Interrogate the process, using the PID, to ensure that the command/daemon running as that PID is as expected. The problem with this solution is that both the command and the PID need to match. Multiple pieces of information therefore need to be maintained and kept in sync, which adds complexity to error/edge-case handling.
  2. Correlate the creation time of the PID file with the start time of the process: if the process started within a certain delta of the PID file creation time, we can be fairly certain that the command/daemon running is as expected.
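A minimal sketch of proposed solution 2 in Python, assuming a Linux /proc filesystem. The process start time is rebuilt from field 22 of `/proc/PID/stat` (`starttime`, in clock ticks since boot) plus the `btime` line of `/proc/stat`, then compared with the PID file's modification time. All function names are hypothetical. The `delta` argument is deliberately left to the caller because choosing it is exactly the trade-off asked about in the question. Note that `btime` is a wall-clock value and may move if the system clock is stepped.

```python
import os


def parse_starttime(stat_line: str) -> int:
    """Return field 22 (starttime, clock ticks since boot) of a /proc/PID/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so only split what follows the last ')'. rest[0] is field 3 (state).
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    return int(rest[22 - 3])


def boot_time() -> int:
    """Return the boot time in seconds since the epoch, from /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def process_start_epoch(pid: int) -> float:
    """Approximate wall-clock start time of pid; raises OSError if it is gone."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_starttime(f.read())
    return boot_time() + ticks / os.sysconf("SC_CLK_TCK")


def within_delta(pidfile_mtime: float, start_epoch: float, delta: float) -> bool:
    """True if the PID file was written within delta seconds of the process start."""
    return abs(pidfile_mtime - start_epoch) <= delta


def pidfile_matches(pidfile: str, delta: float) -> bool:
    """Hypothetical status check: does the PID in pidfile look like its writer?"""
    with open(pidfile) as f:
        pid = int(f.read().split()[0])
    try:
        start = process_start_epoch(pid)
    except OSError:
        return False  # no process with that PID
    return within_delta(os.stat(pidfile).st_mtime, start, delta)
```

Because the init script writes the PID file after the daemon has started, the file time is normally slightly later than the process start. The absolute difference is used here so that a small amount of clock movement in either direction is tolerated.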

Is there a standard way to verify the authenticity of a process/PID file beyond the presence of a process running with that PID? I.e. I (as the system) want to know whether you (the process) are running and whether you are who I think you are (A and not B).

Assuming we have elected to implement the second solution proposed above, what confidence interval/delta between the PID file creation time and the process start time is reasonable? Here, reasonable means an acceptable compromise between type I and type II errors.

[1] CentOS/RHEL
[2] Bash

Gary
  • Shouldn't this be on [ServerFault](http://serverfault.com/)? – Graham Sep 07 '12 at 13:13
  • Can you make any changes to the third party daemons themselves? If so, you can use `flock` to create some file system locks for the daemons. – Grisha Levit Sep 07 '12 at 15:58
  • Are you sure that process IDs are reused at once? I know that is the case on Windows, but I have not observed that on Linux or UNIX. See http://stackoverflow.com/questions/3446727/how-does-linux-determine-the-next-pid – cdarke Sep 07 '12 at 16:56
  • @cdarke there are never multiple instances of the same PID; the issue is that once a process dies, its PID may be reused. At that point, the existence of a PID file, orphaned by the exceptional circumstance that killed the process, is used to determine whether the process is still running. Everything seems peachy (the process is running), but it's not actually the process we were hoping to find. – Gary Sep 12 '12 at 12:14
  • @Gary: yes, but my point was the PID is not reused at once (except on Windows). It is possible that an old PID file could be left from a previous run if there is no tidy-up operation. Obviously using the PID file to determine if the process is still running is flawed design. – cdarke Sep 12 '12 at 13:21
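The `flock` suggestion in the comments sidesteps PID reuse entirely, provided the daemon (or a wrapper you control) can hold a lock: the kernel releases a `flock(2)` lock when the last descriptor holding it is closed, which happens when the process exits, so a stale file can never appear locked. A minimal sketch using Python's standard `fcntl` module; the function names and the lock-file path are hypothetical.

```python
import fcntl
import os


def hold_lock(path: str) -> int:
    """Daemon side: take an exclusive lock and keep the fd open while running."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    # Raises BlockingIOError if another live instance already holds the lock.
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return fd


def is_running(path: str) -> bool:
    """Status side: the daemon is alive if and only if someone holds the lock."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False  # the lock file was never created
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except BlockingIOError:
        return True  # the lock is held by a live process
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False  # nobody holds it, so the daemon is gone
    finally:
        os.close(fd)
```

Because `flock` locks belong to the open file description rather than to a PID, no recycled PID can inherit the lock. If a wrapper takes the lock and then execs the daemon, the descriptor must be made inheritable first (`os.set_inheritable(fd, True)`), since Python opens files as non-inheritable by default.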

3 Answers


The content of the file:

/proc/{PID}/cmdline

is the command line used to start the process. Is that what you need?
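Note that `/proc/PID/cmdline` stores the arguments separated (and terminated) by NUL bytes, not spaces, so a check should split on `\0`. A small Python sketch of proposed solution 1 under that assumption; `expected_argv` is a hypothetical value you would have recorded when starting the daemon. Programs that rewrite their own title (as in the setproctitle approach discussed in the answers) may no longer follow this layout.

```python
from __future__ import annotations


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the raw contents of /proc/PID/cmdline into an argv list."""
    if not raw:
        return []  # kernel threads and zombies have an empty cmdline
    if raw.endswith(b"\0"):
        raw = raw[:-1]  # drop the terminating NUL, keep legitimate empty args
    return [arg.decode(errors="surrogateescape") for arg in raw.split(b"\0")]


def cmdline_matches(pid: int, expected_argv: list[str]) -> bool:
    """True if pid exists and was started with exactly expected_argv."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return parse_cmdline(f.read()) == expected_argv
    except OSError:
        return False  # no such process (or it vanished while reading)
```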

Benoit Thiery
  • This was considered in proposed solution 1: it still requires me to keep a duplicate of the pid & command when ratifying the process. Keeping both bits of information, while plausible, adds additional complexity. – Gary Sep 07 '12 at 13:05
  • Gary, do you want "fairly certain" or "certain" results? If estimations and approximate results are good enough (and only YOU can be the judge of that), then try implementing your second solution, and if you have problems with your code, post them to StackOverflow. This is a Q&A site for programming, not system administration best practices. In the meantime, consider switching to [Daemontools](http://cr.yp.to/daemontools.html) instead of launching things using init scripts. – ghoti Sep 07 '12 at 13:12
  • Thank you for the suggestion, ghoti. I have functioning renditions of both proposed solutions; I am trying to determine if there exists a recommended/standard approach to solving this issue. – Gary Sep 07 '12 at 13:21

My solution was to capture the command (via /proc/PID/cmdline) along with the relative start time (seconds since boot). Using the absolute start time (via ps -p PID -o lstart=) might appear to work, but you'll get confusing results if your system clock changes (e.g. from an NTP update or a daylight saving time change).

Here's my implementation:

# Prints enough detail to confirm a PID still refers to the same process.
# In other words, even if a PID is recycled by a new instance of the same
# program, the output of this command should still be different. This is not
# guaranteed across reboots.
proc_detail() {
  local pid=${1:?Must specify PID}
  # the process' command line, if it's running;
  # ensures a non-existent PID will never have the same output as a running
  # process, and helps debugging
  cat "/proc/$pid/cmdline" 2> /dev/null && echo
  # this is the number of seconds after boot that the process started
  # https://unix.stackexchange.com/a/274722/19157
  # in theory this could collide if the same process were restarted in the same
  # second and assigned the same PID, but PIDs are assigned in order so this
  # seems acceptably unlikely for now.
  echo "$(($(cut -d. -f1 < /proc/uptime) - \
           $(ps -p "$pid" -o etimes= 2> /dev/null || echo "0")))"
}

I also decided to store this output in /dev/shm so that it's cleared automatically for me on shutdown. There are other viable options (such as a @reboot cronjob) but for my use case writing to a tmpfs was easy and clean.
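A variant of the same fingerprint, swapping in a different source for the start time: field 22 of `/proc/PID/stat` (`starttime`, in clock ticks since boot) is fixed for the life of the process. Subtracting `etimes` from a truncated uptime, by contrast, can come out one second apart on different calls, because the two values are truncated independently. A minimal Python sketch assuming Linux; the `fingerprint` name is hypothetical.

```python
def fingerprint(pid: int) -> str:
    """Return '<starttime> <cmdline>' for pid, or '' if it cannot be read."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read()
    except OSError:
        return ""  # process gone (or unreadable)
    # Field 22 (starttime) is the 20th field after the parenthesised comm,
    # which may itself contain spaces or ')'.
    starttime = stat[stat.rindex(")") + 2:].split()[19]
    # Turn the NUL separators into spaces so the result is one printable line.
    args = cmdline.replace(b"\0", b" ").decode(errors="replace").strip()
    return f"{starttime} {args}"
```

For example, a start script could write `fingerprint(pid)` to a hypothetical file under /dev/shm, and `status` could compare it with a freshly computed `fingerprint(pid)`. Like the shell version, this is only meaningful within a single boot.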

dimo414

I was looking for an answer to the question "How do I ensure that a process is still the same process?", and the two solutions from the question came to mind: identifying a process uniquely by the tuple (pid, command) or by (pid, process start time). Sadly, neither option seems to suffice.

  1. (pid, command) does not suffice because of pid reuse: e.g., the original process might already have been killed, and with the pid free for reuse, another process with the same command line might have been started with that pid.

  2. (pid, process start time) seems to have problems because the reported start time sometimes changes by small amounts.

Another option is to change the process title: put a random number (a nonce) into the process title and store it together with the pid in a pidfile. Then, when we want to check whether the process is still the same one (for example, before killing it), we check whether the title of the process with the pid from the pidfile still starts with the nonce stored in the pidfile.

For illustration, consider this short Python snippet, which uses the third-party setproctitle package. Similar functionality should be available via libraries for other languages:

#!/usr/bin/env python3
import os
import setproctitle  # third-party package: pip install setproctitle

nonce = os.urandom(8).hex()  # create hex nonce
setproctitle.setproctitle(nonce + " " + setproctitle.getproctitle())  # set title
with open("run.pid", "w") as f:  # store pid and nonce in pidfile
    f.write(f"{os.getpid()} {nonce}\n")

Use it together with this shell script, which kills the process only if it is still the same one:

#!/bin/sh
read -r PID NONCE1 < run.pid           # get pid and nonce from pidfile
NONCE2="$(ps -p "$PID" -o command= 2> /dev/null | cut -f1 -d" ")" # get nonce from process title
if [ "$NONCE1" = "$NONCE2" ]; then     # if nonces equal
  kill "$PID"                          # kill process
  echo "killed"
else                                   # otherwise the process you wanted to kill
  echo "was already dead"              # has been dead anyway
fi
drcicero