Setting root-only permissions on /dev files and build binary

Question

As part of a build process, I want to run the following two commands:

sudo chmod a+r /dev/cpu/*/msr
sudo setcap cap_sys_rawio=ep ./bench

This sets the /dev/cpu/*/msr files exposed by the msr kernel module to world-readable, and sets additional permissions on the ./bench binary (produced as part of the build) that it needs to actually read those files.

The problem is this requires root permissions, hence the sudo.

I'd like something like a setuid root script that does these two specific things, but setuid root scripts are not recommended and disabled on modern Linux.

What are my options here for a straightforward solution?

A solution which works only for the second line (the setcap) is also interesting, because I need this one to run every build, while the chmod only needs to run once per boot.

Tinkerer · Answer 1 · 2021-04-10T22:35:28.757

Achieving the same thing via libcap is actually not that much code:

#include <stdio.h>
#include <sys/capability.h>

int main(int argc, char *argv[]) {
    cap_t c = cap_from_text("cap_sys_rawio=ep");
    int status = cap_set_file("./bench", c);
    cap_free(c);
    if (status)
        perror("attempt failed");
    return status != 0;
}

To compile this, you need the libcap development headers (on Debian, sudo apt-get install libcap-dev; on Fedora, sudo dnf install libcap-devel):

$ gcc -o mkcap mkcap.c -lcap

If you just run it as is, it will fail since the program needs to have sufficient privilege to actually add the capability to ./bench:

$ ./mkcap
attempt failed: Operation not permitted

So, you need to make it sufficiently capable itself:

$ sudo /sbin/setcap cap_setfcap=ep ./mkcap
$ ./mkcap
$ echo $?
0

Consider hardcoding a full pathname instead of "./bench". With a relative path, anyone who can run mkcap can run it from another directory and give cap_sys_rawio to whatever file named bench is there.
You could also chmod go-x ./mkcap to limit who can run it.
Another option is to use inheritable capabilities for all this:

basic $ sudo setcap cap_setfcap=ei ./mkcap
basic $ ./mkcap
attempt failed: Operation not permitted
basic $ sudo capsh --inh=cap_setfcap --user=$(whoami) --
enhanced $ ./mkcap
enhanced $ echo $?
0

In the enhanced (capsh shell) layer, you can raise that capability on binaries that have their file inheritable bit set. This way, the default basic layer shells can't get any privilege out of mkcap. In all other ways, the enhanced shell layer is identical to the basic layer: you can run builds and do pretty much everything else as normal. (Use exit to leave the enhanced shell.)

There is also a pam_cap module that can add an inheritable bit to all shells of specific users at login time.

`cap_free` can set errno; perhaps better to check `status` and run `perror` right after `cap_set_file`. — Peter Cordes, Apr 10 '21 at 18:20
In this specific case I think it is actually safe to not do this. If `cap_free()` is operating on a valid pointer, then it won't set `errno`. If it is operating on something else, because the `cap_from_text()` function failed, then the printed `perror()` will be more informative. — Tinkerer, Apr 10 '21 at 22:27

Joseph Sible-Reinstate Monica · Answer 2 · 2020-01-13T22:40:41.480

You can build a simple C program to use in place of the shell script:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *const envp[] = {NULL};
    execle("/sbin/setcap", "setcap", "cap_sys_rawio=ep", "./bench", NULL, envp);
    perror("execle");
    return 1;
}

Notes:

That's secure in that it ignores its environment (including PATH) and doesn't call the shell, but it can still be run from anywhere, so there's no guarantee of exactly what ./bench is. You may want to hardcode the absolute path.
You can use the same trick to run multiple commands, but then you have to get into fork and wait. (Don't use system, as this invokes a shell and defeats the purpose of disallowing setuid scripts!)
Instead of calling the setcap binary, you could use the libcap functions instead, but that would be a bit more complicated.
You can use glob to expand /dev/cpu/*/msr like the shell does if you want to do the first part, and then stat and chmod to avoid having to exec.

Thanks, the trick I was missing was an easy way to run shell commands from C. I knew you could write a C program, but I didn't want to painstakingly convert the shell script into system calls (which sometimes is very complicated, when the underlying process does a lot of work). — BeeOnRope, Jan 13 '20 at 22:39

score 1 · Answer 3 · answered Jul 09 '21 at 08:45

As already suggested, libcap is a good way to set the capabilities. To access MSRs without sudo rights, I suggest using msr-safe. It was implemented for Intel, but I have successfully tested it on AMD too. Loading the msr_safe kernel module creates a few new files. Instead of accessing MSRs via /dev/cpu/*/msr, you use /dev/cpu/*/msr_safe, which exposes the registers listed in /dev/cpu/msr_allowlist (previously msr_whitelist). A mask in that file also controls which bits of each register are writable.

Moreover, it is possible to prepare a batch (/dev/cpu/msr_batch), a list of registers that you will access repeatedly, which reduces overhead. This feature is implemented in libmsr.
