
What is the best approach to implementing filtering by process name in a user-mode application under Linux?

All methods that I am aware of rely on reading proc_fs:

  1. readlink on /proc/$PID/exe
  2. reading from /proc/$PID/cmdline, until the first null character
  3. parsing the Name field in /proc/$PID/status
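For reference, here is a minimal C sketch that reads all three sources for a given PID. The helper names (`name_from_exe`, `name_from_cmdline`, `name_from_status`) are hypothetical and error handling is reduced to returning -1. Note that readlink on another user's /proc/$PID/exe normally fails with EACCES unless you have ptrace-level access to that process, and it fails for kernel threads, which have no executable.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Method 1: target of the /proc/PID/exe symlink (may end in " (deleted)"). */
static int name_from_exe(pid_t pid, char *buf, size_t len)
{
    char link[64];
    snprintf(link, sizeof link, "/proc/%d/exe", (int)pid);
    ssize_t n = readlink(link, buf, len - 1);   /* readlink adds no NUL */
    if (n < 0 || (size_t)n >= len - 1)
        return -1;   /* error (kernel thread, EACCES, ...) or possibly truncated */
    buf[n] = '\0';
    return 0;
}

/* Method 2: argv[0], i.e. /proc/PID/cmdline up to the first NUL byte. */
static int name_from_cmdline(pid_t pid, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cmdline", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);
    fclose(f);
    if (n == 0)
        return -1;   /* kernel threads and zombies have an empty cmdline */
    buf[n] = '\0';   /* as a C string, buf now ends at the first NUL: argv[0] */
    return 0;
}

/* Method 3: the Name field (the kernel's truncated comm) of /proc/PID/status. */
static int name_from_status(pid_t pid, char *buf, size_t len)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "Name:", 5) == 0) {
            char *p = line + 5;
            p += strspn(p, " \t");          /* skip the separator */
            p[strcspn(p, "\n")] = '\0';     /* drop the trailing newline */
            snprintf(buf, len, "%s", p);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```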

The first method seems to be reliable if combined with method #3. Unfortunately, the path gets a " (deleted)" suffix when the executable is removed from the system, and that same string can also be the tail of an ordinary file name. The filter cannot be robust if executables are allowed to have such names.

The second method depends on the shell that started the process. The value is just the first (position 0) argument of the process, and IIUC, shells are free to set it any way they see fit. For example, bash prepends a dash for login shells.
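To illustrate that argv[0] is simply whatever the caller of execve passes, here is a sketch that starts /bin/sleep with a made-up argv[0] and reads it back from /proc/$PID/cmdline. It assumes a GNU-style /bin/sleep that ignores its argv[0]; `spawn_and_read_argv0` is a hypothetical helper name. The close-on-exec pipe is only there to wait until the exec has actually happened before reading cmdline.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Start `exe` with a caller-chosen argv[0] and the single argument "5"
   (suitable for sleep), read argv[0] back from /proc/PID/cmdline, then kill
   the child. Returns 0 on success, -1 on failure. */
static int spawn_and_read_argv0(const char *exe, const char *argv0,
                                char *buf, size_t len)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    /* Both ends are closed automatically when the child execs. */
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char *const args[] = { (char *)argv0, "5", NULL };
        execv(exe, args);
        int err = errno;                 /* only reached if execv failed */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);
    int err;
    /* EOF (0 bytes) means the write end was closed by a successful exec. */
    ssize_t r = read(fds[0], &err, sizeof err);
    close(fds[0]);

    char path[64];
    snprintf(path, sizeof path, "/proc/%d/cmdline", (int)pid);
    int rc = -1;
    /* cmdline may still read as empty for an instant right after the exec,
       so retry briefly (up to roughly 100 ms). */
    for (int tries = 0; r == 0 && rc != 0 && tries < 100; tries++) {
        FILE *f = fopen(path, "r");
        if (!f)
            break;
        size_t n = fread(buf, 1, len - 1, f);
        fclose(f);
        buf[n] = '\0';                   /* C string ends at argv[0]'s NUL */
        if (n > 0)
            rc = 0;
        else
            nanosleep(&(struct timespec){ .tv_sec = 0, .tv_nsec = 1000000 }, NULL);
    }
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return rc;
}
```

For example, `spawn_and_read_argv0("/bin/sleep", "-sleep", buf, sizeof buf)` should leave "-sleep" in `buf`, mimicking the login-shell dash, even though /proc/$PID/exe of the child still resolves to the sleep binary.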

The third method relies on a name truncated to 15 characters, taken directly from the comm field of the kernel's task_struct. This is obviously not robust, but it is the only name available for kernel threads, and so it must supplement the other two. (Apparently, if the name contains non-ASCII characters, they appear as escape sequences, so the method is at least reliable in that respect.)
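The 15-character limit is easy to observe: the kernel's comm buffer (TASK_COMM_LEN) is 16 bytes including the terminating NUL, and prctl(PR_SET_NAME) silently truncates longer names. The sketch below sets a long name on the calling thread, reads it back from /proc/self/comm (the same value the Name field shows), and restores the old name. It assumes a single-threaded caller, because /proc/self refers to the thread-group leader; `set_and_read_comm` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Set the calling thread's comm to `name`, read it back from /proc/self/comm
   into `out`, then restore the previous name. Returns 0 on success. */
static int set_and_read_comm(const char *name, char *out, size_t len)
{
    char saved[16];                                /* TASK_COMM_LEN bytes */
    if (prctl(PR_GET_NAME, saved, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_NAME, name, 0, 0, 0) != 0)    /* truncates silently */
        return -1;

    int rc = -1;
    FILE *f = fopen("/proc/self/comm", "r");
    if (f) {
        if (fgets(out, (int)len, f)) {
            out[strcspn(out, "\n")] = '\0';        /* drop the newline */
            rc = 0;
        }
        fclose(f);
    }
    prctl(PR_SET_NAME, saved, 0, 0, 0);            /* restore the old name */
    return rc;
}
```

With `"a-rather-long-executable-name"` as input, `out` becomes `"a-rather-long-e"`, the first 15 characters.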

Altogether, I cannot come up with a robust, shell-independent way to support filtering by process executable name (or, ideally, path) that allows arbitrary file names. I will probably resort to the leading command-line argument in cmdline, since it may fit my purposes, but I would like to make sure that I understand the available options.

Note: security, although an issue, is a separate point. If security is necessary, the user identity of the process will be checked. What I need from the name filter is just correctness: the aim is to implement quality of service or per-process configuration reliably, and process name filtering will be involved.

simeonz
  • Duplicate of http://stackoverflow.com/questions/15545341/process-name-from-its-pid-in-linux ? – Mali Dec 02 '13 at 13:37
  • There are a few questions on the same general topic (like http://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe), but the issue of stringent robustness has not been addressed, at least not that I know of. – simeonz Dec 02 '13 at 13:42

1 Answer


The robustness of the first method (readlink /proc/$PID/exe) can be improved by calling stat on both the link itself and the result of the readlink. If you get a matching st_dev and st_ino, they're the same file. If you don't get a match, or get ENOENT, check for " (deleted)" at the end of the string, strip it off, and try again. Repeat until you get a match or run out of " (deleted)" instances.
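The stat-and-strip loop described above might look like this in C. It is a sketch: `verified_exe_path` is a hypothetical name, the suffix is assumed to be exactly " (deleted)", and it still has the race between the two stat calls. It relies on the fact that stat on /proc/$PID/exe follows the link to the inode the process is actually running, even after that file has been unlinked.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define DELETED_SUFFIX " (deleted)"

/* Returns 1 and leaves a verified path in `out` if some candidate path names
   the same inode the process is executing; 0 if no candidate matched (the
   file was deleted, renamed or replaced); -1 on error. */
static int verified_exe_path(pid_t pid, char *out, size_t len)
{
    char link[64];
    struct stat target, cand;

    snprintf(link, sizeof link, "/proc/%d/exe", (int)pid);
    /* stat() on the magic link reaches the executable's inode itself,
       even if that file has since been unlinked. */
    if (stat(link, &target) != 0)
        return -1;
    ssize_t n = readlink(link, out, len - 1);
    if (n < 0 || (size_t)n >= len - 1)
        return -1;                      /* error or possibly truncated */
    out[n] = '\0';

    const size_t sfx = strlen(DELETED_SUFFIX);
    for (;;) {
        if (stat(out, &cand) == 0 &&
            cand.st_dev == target.st_dev && cand.st_ino == target.st_ino)
            return 1;                   /* same device + inode: same file */
        size_t l = strlen(out);
        if (l < sfx || strcmp(out + l - sfx, DELETED_SUFFIX) != 0)
            return 0;                   /* no " (deleted)" left to strip */
        out[l - sfx] = '\0';            /* strip one suffix and try again */
    }
}
```

A return value of 0 is the "really deleted" case discussed next, and it is also what you get if the file was renamed rather than deleted.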

If you don't get a match after all that, the executable file really has been deleted. (And you haven't really specified what you want to do in that case, which you should definitely think about. When you insist on robustness, you can't just ignore the fact that deleted files can be in use!)

There's still a race condition between the stats, so you might want to open both files and fstat them instead. Then if you get a device+inode match, you have a file descriptor that can be used with confidence that it actually belongs to the file that was exec'd in the target process, not some other file with a similar name.

The next difficulty is if the process itself goes away during your test, and the PID gets reused. If you care about that, you can read the process start time from /proc/$PID/stat at the beginning and end of the operation, to make sure you were dealing with the same process the whole way through. (Also, there's a way to keep a process from going away: attach to it as a debugger with ptrace.)
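Reading the start time is mostly a parsing exercise. It is field 22 of /proc/$PID/stat, in clock ticks since boot, and because field 2 (the command name in parentheses) may itself contain spaces or ')', parsing should start after the last ')'. A sketch, with `proc_start_time` as a hypothetical name:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Store the start time of `pid` (field 22 of /proc/PID/stat, in clock ticks
   since boot) in *out. Returns 0 on success, -1 on failure. */
static int proc_start_time(pid_t pid, unsigned long long *out)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is "(comm)" and comm may contain spaces or ')' itself,
       so start after the LAST ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;
    if (*p != ' ')
        return -1;                 /* p must be the space before field 3 */
    /* Each step moves p to the space before the next field; after the
       iteration for field 21, p sits just before field 22. */
    for (int field = 3; field <= 21; field++) {
        p = strchr(p + 1, ' ');
        if (!p)
            return -1;
    }
    return sscanf(p, " %llu", out) == 1 ? 0 : -1;
}
```

Read it once before the checks and once after. If either read fails or the two values differ, the process exited or the PID was recycled in between, and the result should be discarded.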

Then there's the question of what you want to do if the process execs a different program while you're looking at it: /proc/$PID/exe will change. If that happens right after your final consistency check, you will return a value that was correct but isn't anymore. You can't do much about that except use ptrace, and that's more intrusive than you probably want.

  • Amazing. I confess that I will have to rethink the performance hit of doing all of the above on a regular basis, but I could not have hoped for such an exhaustive answer. A slight remark about the small probability of the inode being reused by a file with almost the same name, but with a "(deleted)" suffix. Still, a good answer, and a very complicated procedure for getting a very generic piece of information. – simeonz Dec 02 '13 at 14:52
  • I think you're making a mistake in thinking that a process has a "name" as a fundamental property that you should be able to easily get. The name of the last file `execve`'d by the process, at the time of the `execve`, is good enough for human consumption, but not something programs should rely on. –  Dec 02 '13 at 15:24
  • Yes, you're right. I understand that I will have to think further about solutions. Still, I would argue that, as a property, the executable filename of the application currently hosted in a process is something interesting to obtain programmatically. The filename does not identify the application, but it is some information for the user or administrator to filter on. Better than nothing. At least, I cannot think of another user-friendly way for users to filter by identity. (Say, hashing or ELF versioning is better, but the administrator would have to keep track of those between updates.) – simeonz Dec 02 '13 at 15:52
  • Note that stripping " (deleted)" is not sufficient if the file is renamed first (as rpm does, for example). – pixelbeat Aug 07 '15 at 15:26