If you can move the privileged part into a separate process, I warmly recommend doing so. The parent process will construct at least one Unix Domain socket pair, keeping one end for itself, and put the other end as the child process standard input or output.
The reason for using an Unix domain socket pair is that such a pair is not only bidirectional, but also supports identifying the process at the other end, and passing open file descriptors from one process to another.
For example, if your main process needs superuser access to read a file, perhaps in a specific directory, or otherwise identifiable, you can move the opening of such files into a separate helper program. By using an Unix domain socket pair for communication between the two, the helper program can use getsockopt(ufd, SOL_SOCKET, SO_PEERCRED, &ucred, &ucred_size) to obtain the peer credentials: process ID, effective user ID, and effective group ID. Using readlink() on the pseudofile /proc/PID/exe
(where PID
is the process ID as a positive decimal number) you can obtain the executable the other end is currently running.
If the target file/device can be opened, then the helper can pass the open file descriptor back to the parent process. (Access checks in Linux are only done when the file descriptor is opened. Read accesses will only be blocked later if the descriptor was opened write-only or the socket read end has been shut down, and write accesses only blocked if the descriptor was opened read-only or the socket write end has been shut down.)
I recommend passing an int
as the data, which is 0
if successful with the descriptor as an ancillary message, and an errno
error code otherwise (without ancillary data).
However, it is important to consider the possible ways how such helpers might be exploited. Limiting to a specific directory, or perhaps having a system-wide configuration file that specifies allowed path glob patterns (and not writable by everyone), and using e.g. fnmatch() to check if the passed path is listed, are good approaches.
The helper process can gain privileges either by being setuid
, or via Linux filesystem capabilities. For example, giving the helper only the CAP_DAC_OVERRIDE
capability would let it bypass file read, write, and execute checks. In Debian derivatives, the command-line tool to manipulate filesystem capabilities, setcap
, is in the libcap2-bin package.
If you cannot move the privileged part into a separate process, you can use the interface supported in Linux, BSDs, and HP-UX systems: setresuid(), which sets the real, effective, and saved user IDs in a single call. (There is a corresponding setresgid() call for the real, effective, and saved group IDs, but when using that one, remember that the supplementary group list is not modified; you need to explicitly call setgroups() or initgroups() to modify the supplementary group list.)
There are also filesystem user ID and filesystem group ID, but the C library will set these to match the effective ones whenever effective user and/or group ID is set.
If the process is started with superuser privileges, then the effective user ID will be zero. If you first use getresuid(&ruid, &euid, &suid)
and getresgid(&rgid, &egid, &sgid)
, you can use setresgid(rgid, rgid, rgid)
to ensure only the real group identity remains, and temporarily drop superuser privileges by calling setresuid(ruid, ruid, 0)
. To re-gain superuser privileges, use setresuid(0, ruid, 0)
, and to permanently drop superuser privileges, use setresuid(ruid, ruid, ruid)
.
This works, because a process is allowed to switch between real, effective, and saved identities. Effective is the one that governs access to resources.
There is a way to restrict the privilege to a dedicated thread within the process, but it is hacky and fragile, and I don't recommend it.
To keep the privilege restricted to within a single thread, you create custom wrappers around the SYS_setresuid
/SYS_setresuid32
, SYS_setresgid
/SYS_setresgid32
, SYS_getresuid
/SYS_getresuid32
, SYS_getresgid
/SYS_getresgid32
, SYS_setfsuid
/SYS_setfsuid32
, and SYS_setfsgid
/SYS_setfsgid32
syscalls. (Have the wrapper call the 32-bit version, and if it returns -ENOSYS, fall back to the 16-bit version.)
In Linux, the user and group identities are actually per-thread, not per-process. The standard C library used will use e.g. realtime POSIX signals and an internal handler to signal other threads to switch identity, as part of the library functions that manipulate these identities.
Early in your process, create a privileged thread, which will keep root (0) as the saved user identity, but otherwise copy real identity to effective and saved identities. For the main process, copy real identity to effective and saved identities. When the privileged thread needs to do something, it first sets the effective user identity to root, does the thing, then resets effective user identity to the real user identity. This way the privileged part is limited to this one thread, and is only applied for the sections when it is necessary, so that most common signal etc. exploits will not have a chance of working unless they occur just during such a privileged section.
The downside to this is that it is imperative that none of the identity-changing C library functions (setuid(), seteuid(), setgid(), setegid(), setfsuid(), setfsgid(), setreuid(), setregid(), setresuid(), setresgid()) must be used by any code within the process. Because in Linux C library functions are weak, you can ensure that by replacing them with your own versions: define those functions yourself, with the correct name (both as shown and with two underscores) and parameters.
Of all the various methods, I do believe the separate process with identity verification through an Unix domain socket pair, is the most sensible.
It is the easiest to make robust, and can be ported between POSIX and BSD systems at least.