Context:
I'm academically interested in tracking/identifying UNIX processes in a way that is proof against PID wraparound. To start tracking a process by PID, I need to be able to conclusively identify it on the system.
Thus, I need a function, get_identity, that takes a PID, and only returns once it has determined a system-wide unique identity for that PID. The function should work on all or most POSIX-compliant systems.
The only immutable values in the process table that I know of are PID and start time. However, the following scenario poses a problem:
- User calls get_identity(pid).
- get_identity reads the start time, in seconds since the epoch, of pid, if it exists, and returns the hopefully-unique tuple [pid, starttime] (this is what the excellent psutil Python library considers "unique enough", so it should be pretty robust).
- Within a second of that call, PID wraparound occurs on the system, and pid is recycled.
- The [pid, starttime] tuple now refers to a different process than was present at the call to get_identity.
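To make the scenario concrete, here is a minimal sketch of the collision. The helper name coarse_identity, the PID 4242, and both start times are hypothetical values chosen for illustration; the only assumption taken from the scenario is that the start time is truncated to whole seconds.

```python
from typing import Tuple


def coarse_identity(pid: int, start_time: float) -> Tuple[int, int]:
    """Build a [pid, starttime] identity with whole-second granularity."""
    # Truncating to whole seconds is the assumption that makes collisions possible.
    return (pid, int(start_time))


# Hypothetical: the original process started 0.20 s into some second...
first = coarse_identity(4242, 1_700_000_000.20)
# ...and a different process reused the same PID 0.65 s later, in the same second.
second = coarse_identity(4242, 1_700_000_000.85)

# Two distinct processes, one indistinguishable identity.
print(first == second)  # True
```

Any start-time source with finer granularity than the shortest possible PID reuse interval would avoid this, which is what the questions below are really asking about.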
While it is extremely improbable for PID wraparound to occur and re-use the selected PID within a second of its being identified, it is not impossible . . . right?
Questions:
- Is there a guarantee on UNIX/POSIX-compliant systems that the start time of a PID will be different between wraparound-caused re-uses of that same PID value?
- If not, how can I uniquely identify a process on a wraparound-prone system?
What I've Tried:
- I can simply sleep for a second after examining the target process. If the start-time-in-seconds is the same after the sleep, then it's either the same process that I started watching, or the PID has wrapped around to a different one but the system cannot tell the difference. If the start time has changed, I can return an error, or start over. However, this requires my identification function to wait for up to 1 second before returning, which is not ideal.
- times() returns values in clock ticks, which I can convert to seconds. Assuming that the starttime-in-seconds of a process is based on the same clock that times uses, and assuming that all UNIXes use the same rounding logic to convert from clock ticks -> fractional seconds -> whole seconds, I could theoretically use this information to reduce the duration of the sleep in the above workaround to the time until the next "full second boundary according to the process table". However, the worst-case sleep time would still be nearly 1 second, so this is not ideal.
- On Linux, I can get the starttime in clock ticks (or jiffies, on pre-2.6 kernels) from the /proc/$pid/stat file. With that information, my program could wait one tick, check the starttime again, and, if it was the same, determine identity. This correctly solves my problem (1 tick + overhead is a fast enough runtime), but only on Linux; other UNIX platforms may not have /proc. On BSD, that information is available via the kvm subsystem or via sysctls. On other Unixes . . . who knows? I'd need to develop multiple platform-specific implementations to gather this data, something I'd prefer to avoid.
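For reference, the Linux-only workaround in the last bullet might look roughly like the sketch below. It is not a portable solution. The function names, the injectable read_start parameter, and the tick_seconds override are all my own illustrative choices. It assumes that starttime is field 22 of /proc/<pid>/stat and that waiting one clock tick (as reported by sysconf) is enough for a recycled PID to show a different value.

```python
import os
import time
from typing import Callable, Optional, Tuple


def parse_starttime_ticks(stat_line: str) -> int:
    """Extract field 22 (starttime) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the LAST ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 sits at index 19.
    return int(rest[19])


def read_starttime_ticks(pid: int) -> Optional[int]:
    """Return the start time of pid in clock ticks, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime_ticks(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def get_identity(
    pid: int,
    read_start: Callable[[int], Optional[int]] = read_starttime_ticks,
    tick_seconds: Optional[float] = None,
) -> Optional[Tuple[int, int]]:
    """Return (pid, starttime) if starttime is stable across one tick."""
    first = read_start(pid)
    if first is None:
        return None
    if tick_seconds is None:
        # Assumption: one clock tick is the granularity of starttime.
        tick_seconds = 1.0 / os.sysconf("SC_CLK_TCK")
    time.sleep(tick_seconds)
    # A different (or missing) value means the PID changed hands mid-check.
    if read_start(pid) != first:
        return None
    return (pid, first)
```

On a Linux machine, get_identity(os.getpid()) should return a tuple for the calling process. Returning None on a mismatch corresponds to the "return an error, or start over" option from the first bullet.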