Why is unprivileged recursive unshare(CLONE_NEWUSER) not permitted?

Question

I'm on Ubuntu 17.04.

A single unprivileged unshare of the mount namespace works. You can try it with the unshare(1) command:

$ unshare -m -U /bin/sh
#

However, an unshare within an unshare is not permitted:

$ unshare -m -U /bin/sh
# unshare -m -U /bin/sh
unshare: Operation not permitted
#

Here is a C program that does essentially the same thing:

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <sys/mount.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    if(unshare(CLONE_NEWUSER|CLONE_NEWNS) == -1) {
        perror("unshare");
        return -1;
    }
    if(unshare(CLONE_NEWUSER|CLONE_NEWNS) == -1) {
        perror("unshare2");
        return -1;
    }
    return 0;
}

Why is it not permitted? Where can I find documentation about this? I could not find this information in the unshare(2) or clone(2) man pages, nor in the kernel's unshare documentation.

Is there a system setting that would allow this?

What I want to achieve:

First unshare: I want to mask a few binaries on the system with my own versions.

Second unshare: unprivileged chroot.

score 4 · Accepted Answer · answered Sep 11 '17 at 11:34


I'm somewhat guessing here, but I think the reason is the UID mapping. For a mapping to be written, certain conditions must be met (from the user_namespaces(7) man page):

   In order for a process to write to the /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, all of the following requirements must be met:

   1. The writing process must have the CAP_SETUID (CAP_SETGID) capability in the user namespace of the process pid.

   2. The writing process must either be in the user namespace of the process pid or be in the parent user namespace of the process pid.

   3. The mapped user IDs (group IDs) must in turn have a mapping in the parent user namespace.

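I believe what happens is this. The first time, your effective UID and GID have a mapping in the parent (initial) user namespace, so the call succeeds. Nothing then writes a uid_map or gid_map for the new namespace, however, so inside it your IDs are unmapped (they show up as the overflow ID, usually 65534). On the second call that new namespace is the parent, your IDs have no mapping in it, and the system call fails.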

From the unshare(2) manual page:

   EPERM  CLONE_NEWUSER was specified in flags, but either the effective user ID or the effective group ID of the caller does not have a mapping in the parent namespace (see user_namespaces(7)).


Shachar Shemesh


Would that make unprivileged recursive unshare generally impossible? The first one is allowed because of CLONE_NEWUSER, but then, as you wrote, you can't do the second. – Hadrian Węgrzynowski Sep 11 '17 at 11:53
I will check whether I can do the same thing nsenter(1) does. I call unshare in two different processes that spawn workers, so maybe I can get around it. Maybe I could first go back to the parent namespace and then unshare again, because the masking I do after the first unshare is no longer necessary for the chroot. – Hadrian Węgrzynowski Sep 11 '17 at 11:58
It is probably not possible. As the setns(2) man page says: _A process reassociating itself with a user namespace must have the CAP_SYS_ADMIN capability in the target user namespace._ – Hadrian Węgrzynowski Sep 11 '17 at 12:01
I believe it is possible. You just need to make sure you create a compatible mapping each time. There is no problem with CAP_SYS_ADMIN, because you get the full capability set immediately after unshare(CLONE_NEWUSER). – Shachar Shemesh Sep 11 '17 at 12:37

Thank you. I managed to do it by putting `fprintf(/proc/self/uid_map, "%d %d 1\n", uid, uid); fputs("deny", /proc/self/setgroups); fprintf(/proc/self/gid_map, "%d %d 1\n", gid, gid);` after the first unshare and getting uid and gid before it. – Hadrian Węgrzynowski Sep 11 '17 at 21:43

