Context:

I'm academically interested in tracking/identifying UNIX processes in a way that is proof against PID wraparound. To start tracking a process by PID, I need to be able to conclusively identify it on the system.

Thus, I need a function, get_identity, that takes a PID, and only returns once it has determined a system-wide unique identity for that PID. The function should work on all or most POSIX-compliant systems.

The only immutable values in the process table that I know of are PID and start time. However, the following scenario poses a problem:

  1. User calls get_identity(pid)
  2. get_identity reads the start time in seconds-since-the-epoch of pid, if it exists, and returns the hopefully-unique tuple [pid, starttime] (this is what the excellent psutil Python library considers "unique enough", so it should be pretty robust).
  3. Within a second of that call, PID wraparound occurs on the system, and pid is recycled.
  4. The [pid, starttime] tuple now refers to a different process than was present at the call to get_identity.

While it is extremely improbable for PID wraparound to occur and re-use the selected PID within a second of its being identified, it is not impossible . . . right?
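To make the scenario concrete, here is a small arithmetic illustration. The tick rate of 100 per second and the tick values are made up for the example; the point is only that two distinct start times can truncate to the same whole second.

```python
# Illustration only: CLK_TCK = 100 is an assumed tick rate, and the pid and
# tick values below are invented for the example.
CLK_TCK = 100


def start_second(start_ticks, clk_tck=CLK_TCK):
    """Truncate a tick-resolution start time to whole seconds."""
    return start_ticks // clk_tck


pid = 4242
old_process = (pid, 123_456)  # (pid, start time in ticks)
new_process = (pid, 123_499)  # same pid value, reused 43 ticks later

old_identity = (old_process[0], start_second(old_process[1]))
new_identity = (new_process[0], start_second(new_process[1]))

# At one-second resolution the two processes are indistinguishable;
# at tick resolution they are not.
collides_at_second_resolution = old_identity == new_identity
collides_at_tick_resolution = old_process == new_process
```

So whether the tuple is ambiguous depends entirely on the resolution of the start time that the platform exposes.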

Questions:

  • Is there a guarantee on UNIX/POSIX-compliant systems that the start time of a PID will be different between wraparound-caused re-uses of that same PID value?
  • If not, how can I uniquely identify a process on a wraparound-prone system?

What I've Tried:

  • I can simply sleep for a second after examining the target process. If the start-time-in-seconds is the same after the sleep, then it's either the same process that I started watching, or the PID has wrapped around to a different one but the system cannot tell the difference. If the start time has changed, I can return an error, or start over. However, this requires my identification function to wait for up to 1 second before returning, which is not ideal.
  • times() returns values in clock ticks, which I can convert to seconds. Assuming that the start-time-in-seconds of a process is based on the same clock that times() uses, and assuming that all UNIXes use the same rounding logic to convert from clock ticks -> fractional seconds -> whole seconds, I could theoretically use this information to shorten the sleep in the above workaround to the time remaining until the next "full second boundary according to the process table". However, the worst-case sleep time would still be nearly 1 second, so this is not ideal either.
  • On Linux, I can get the starttime in jiffies (or CPU ticks, for old Linuxes) from the /proc/$pid/stat file. With that information, my program could wait one jiffy(ie?), check the starttime again, and, if it was the same, determine identity. This correctly solves my problem (1 jiffy + overhead is a fast enough runtime), but only on Linux; other UNIX platforms may not have /proc. On BSD, that information is available via the kvm subsystem or via sysctls. On other Unixes . . . who knows? I'd need to develop multiple platform-specific implementations to gather this data--something I'd prefer to avoid.
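A minimal sketch of the Linux-only variant from the last bullet, for reference. It assumes the starttime value is field 22 of /proc/<pid>/stat, as documented in proc(5), and that one tick lasts 1/SC_CLK_TCK seconds. The function names and the injectable `read`, `tick` and `sleep` parameters are hypothetical, added only so the logic can be exercised without a real process.

```python
import os
import time


def parse_starttime(stat_line):
    """Return field 22 (starttime, in clock ticks since boot) of a stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only what follows the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_starttime(pid):
    """Linux-only: start time of pid in ticks, or None if it does not exist."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def get_identity(pid, read=read_starttime, tick=None, sleep=time.sleep):
    """Return (pid, starttime) once one tick has passed without a change."""
    if tick is None:
        tick = 1.0 / os.sysconf("SC_CLK_TCK")  # assumed tick length
    first = read(pid)
    if first is None:
        raise ProcessLookupError(pid)
    sleep(tick)
    if read(pid) != first:
        raise RuntimeError(f"pid {pid} was recycled during identification")
    return (pid, first)
```

On a Linux machine, `get_identity(os.getpid())` would be one way to try it. A BSD version would need a different `read` function (kvm or sysctl based), which is exactly the per-platform work I was hoping to avoid.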
Zac B
  • I think you're hosed, frankly. If processes are being spawned at a rate of more than 2^16 per second, I would absolutely expect that PID+starttime collisions are possible. The kernel people would have to have had a concrete reason to distort starttime values to make PID+starttime unique, and I would expect that starttime accuracy was felt to be more important. – zwol Sep 14 '16 at 16:04
  • @zwol Not disagreeing with your overall conclusion, but keep in mind that PID-wraparound can occur far more frequently than 2^16 per second. Granted, it's still quite unlikely, but consider a system that already has 2^16-4 long-running processes - then PID-wraparound will occur every 4 forks... – twalberg Sep 14 '16 at 18:55
  • @twalberg Oh, good point, that didn't even occur to me. – zwol Sep 14 '16 at 18:56
  • You should look into the [Proc connector](http://netsplit.com/the-proc-connector-and-socket-filters). With that you can subscribe to all process events happening on the system, including `fork()` and `exec()`. – NovaDenizen Sep 14 '16 at 19:22
  • The proc connector looks super-useful, thanks! It's unfortunately Linux-only, though. Know of anything cross-platform like that? – Zac B Sep 15 '16 at 03:31

2 Answers

1

Since the assignment of PIDs, and process table management in general, are not defined by any standard, it's literally impossible to do what you want in a portable way.

You will need to do as you say and develop multiple platform-specific implementations to gather enough information about a process to determine a unique identity for every process.

On the other hand, if you don't need this information in real time (as the processes are started and while they are still running), you can, on most unix-y systems, simply turn on process accounting and have a guaranteed unique and complete record of every process that has been run by the system. Process accounting files are not standardized either, but there will be header files defining their record format, and there should be tools on each type of system which can process and summarize accounting files in various ways.
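As an illustration of what reading such a file can look like, here is a rough sketch for one specific format: the 64-byte `struct acct_v3` record declared in Linux's <linux/acct.h>, assumed here to be little-endian. The function name `parse_acct_v3` is hypothetical, and other systems (and other Linux accounting versions) use different layouts, so check the local header before relying on it.

```python
import struct

# Assumed layout: Linux `struct acct_v3` from <linux/acct.h>, little-endian.
# flag, version, tty, exitcode, uid, gid, pid, ppid, btime, etime,
# eight 16-bit comp_t counters, then a 16-byte command name.
ACCT_V3 = struct.Struct("<bBHIIIIIIf8H16s")


def parse_acct_v3(data):
    """Return a list of (pid, ppid, btime, comm) for each complete record."""
    records = []
    for offset in range(0, len(data) - ACCT_V3.size + 1, ACCT_V3.size):
        fields = ACCT_V3.unpack_from(data, offset)
        if fields[1] != 3:
            # Not a little-endian v3 record; this sketch handles nothing else.
            raise ValueError(f"unexpected ac_version {fields[1]}")
        pid, ppid, btime = fields[6], fields[7], fields[8]
        # ac_comm is NUL-padded to 16 bytes.
        comm = fields[18].split(b"\0", 1)[0].decode(errors="replace")
        records.append((pid, ppid, btime, comm))
    return records
```

The bytes would come from the system's accounting file, whose location varies by system. Here btime is the process creation time in seconds since the epoch, so each record carries a PID together with a start time.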

Greg A. Woods
-1

PID uniqueness is guaranteed, even across wraparound. You'll never get two processes with the same PID at the same time.

Luis Colorado
  • A comment for the downvote would be appreciated, mostly because different unices have different ways to ensure it, and it's something that not only must be ensured, but ensured efficiently. – Luis Colorado Oct 03 '16 at 11:09
  • This does not answer the question, which is about the effects of PID wraparound on process start time. PID uniqueness is not in question; *process* uniqueness is what I'm trying to get better at identifying. – Zac B Oct 18 '16 at 17:07