
Docker supports user namespace remapping, so that the user namespace is completely separated from the host.

The current default behavior gives containers their own user and group management, i.e. their own version of /etc/passwd and /etc/group, but container processes run under the very same UIDs on the host system. This means that if your container runs with UID 0 (root), it also runs as root on the host. By the same token, if your container has a user "john" with UID 1001 and starts its main process as that user, on the host it also runs with UID 1001, which might belong to user "Will" and could also have admin rights.

To make user namespace isolation complete, one needs to enable remapping, which maps the UIDs in the container to different UIDs on the host. So UID 0 in the container would be mapped to a non-privileged UID on the host.
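To make the mapping concrete: with Docker's remapping, the host-side range comes from the subordinate ID files (/etc/subuid and /etc/subgid), and the kernel exposes the resulting mapping in /proc/<pid>/uid_map as lines of the form "inside-start outside-start length". The Python sketch below translates a container UID into a host UID under such a map. The range 100000/65536 is only an assumed example; the actual range on your host may differ.

```python
from typing import List, Optional, Tuple

# One uid_map line: (first UID inside the namespace,
#                    first UID on the host, number of UIDs in the range)
UidMapEntry = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse uid_map-style text such as '0 100000 65536'."""
    entries: List[UidMapEntry] = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def container_to_host_uid(uid: int, entries: List[UidMapEntry]) -> Optional[int]:
    """Return the host UID that a container UID maps to, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None


if __name__ == "__main__":
    remap = parse_uid_map("0 100000 65536")  # assumed example range
    print(container_to_host_uid(0, remap))     # root in the container
    print(container_to_host_uid(1001, remap))  # "john" in the container
```

With this map, root in the container (UID 0) becomes UID 100000 on the host and "john" (UID 1001) becomes 101001, so neither collides with a real host account such as root or "Will".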

Is there any support in Kubernetes for this feature to be enabled on the underlying Container Runtime? Will it work out of the box without issues?

Fritz Duchardt
Ijaz Ahmad
  • What's a use case where you might need this? (On the Kubernetes clusters I've used I've never had access to the host's filesystem and had no access to host uids in any form.) – David Maze Oct 27 '18 at 18:39
  • If user namespaces are not enabled, a user can run pods/containers as UID 0 (root), which is the same as root on the host and opens up all kinds of possibilities for doing damage. Without user namespaces enabled on the Docker engine, root in the container is root on the host. – Ijaz Ahmad Oct 27 '18 at 18:57
  • https://github.com/kubernetes/enhancements/issues/127 – johnharris85 Oct 27 '18 at 18:59
  • looks like work in progress https://github.com/kubernetes/enhancements/issues/127 – Shane Warne Oct 27 '18 at 18:59
  • @johnharris85 seems like it will become available in 1.14 or 1.15 :) – Ijaz Ahmad Oct 27 '18 at 19:02
  • @johnharris85 by the way, we are using Docker EE Kubernetes. Are they doing something about it? – Ijaz Ahmad Oct 27 '18 at 19:05

1 Answer


So it's not supported yet the way it is in Docker, as per this (as alluded to in the comments) and this.

However, if you are looking to isolate your workloads, there are other alternatives (they are not the same, but the options are pretty good):

You can use Pod Security Policies, specifically the runAsUser rule together with allowPrivilegeEscalation: false. Pod Security Policies can be tied to RBAC, so you can restrict how users run their pods.
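A rough sketch of such a setup, written as a Python script that prints the manifests as JSON (kubectl accepts JSON as well as YAML). The policy name "restricted-uid", the UID range and the volume list are placeholders. The policy/v1beta1 API shown here is the one from the PodSecurityPolicy era; PodSecurityPolicy was later removed in Kubernetes 1.25.

```python
import json


def restricted_psp(name: str, uid_min: int, uid_max: int) -> dict:
    """PodSecurityPolicy that pins pods to a UID range and disables
    privileged mode and privilege escalation."""
    if uid_min <= 0 or uid_max < uid_min:
        raise ValueError("UID range must be non-empty and exclude 0 (root)")
    return {
        "apiVersion": "policy/v1beta1",
        "kind": "PodSecurityPolicy",
        "metadata": {"name": name},
        "spec": {
            "privileged": False,
            "allowPrivilegeEscalation": False,
            # MustRunAs: containers may only run with a UID from these ranges.
            "runAsUser": {
                "rule": "MustRunAs",
                "ranges": [{"min": uid_min, "max": uid_max}],
            },
            # These three strategies are required fields of the spec.
            "seLinux": {"rule": "RunAsAny"},
            "supplementalGroups": {"rule": "RunAsAny"},
            "fsGroup": {"rule": "RunAsAny"},
            # Allowed volume types (placeholder list).
            "volumes": ["configMap", "secret", "emptyDir", "projected",
                        "downwardAPI", "persistentVolumeClaim"],
        },
    }


def psp_user_role(psp_name: str) -> dict:
    """ClusterRole granting the 'use' verb on one policy; whoever is bound
    to this role (users, service accounts) is subject to the policy."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "use-" + psp_name},
        "rules": [{
            "apiGroups": ["policy"],
            "resources": ["podsecuritypolicies"],
            "verbs": ["use"],
            "resourceNames": [psp_name],
        }],
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [restricted_psp("restricted-uid", 1000, 1000),
                  psp_user_role("restricted-uid")],
    }
    print(json.dumps(manifests, indent=2))
```

The ClusterRole still needs a RoleBinding or ClusterRoleBinding to the relevant users or service accounts, and the PodSecurityPolicy admission controller must be enabled on the API server for the policy to be enforced.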

With such policies, you can force your users to run pods only as 'youruser' and disable the privileged flag in the container securityContext. You can also disable sudo in your container images.
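On the workload side, a pod that satisfies such a policy sets these fields itself. The sketch below builds a minimal Pod manifest in Python; the pod name, the image "busybox" and UID 1000 are hypothetical placeholders.

```python
import json


def non_root_pod(name: str, image: str, uid: int) -> dict:
    """Pod that runs as a fixed non-root UID with escalation disabled."""
    if uid == 0:
        raise ValueError("uid must be non-zero (0 is root)")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod level: containers run as this UID, and the kubelet refuses
            # to start a container that would run as UID 0.
            "securityContext": {"runAsUser": uid, "runAsNonRoot": True},
            "containers": [{
                "name": name,
                "image": image,
                # Container level: no privileged mode, and no_new_privs is set,
                # so setuid binaries such as sudo cannot raise privileges.
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(non_root_pod("demo", "busybox", 1000), indent=2))
```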

Furthermore, you can drop Linux capabilities, specifically CAP_SETUID. For even more advanced hardening, use a seccomp profile, SELinux or an AppArmor profile.
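In a Kubernetes container securityContext, the drop is listed under capabilities.drop as "SETUID" (Kubernetes omits the CAP_ prefix). To check from inside a running container whether CAP_SETUID is really gone, you can read the effective capability mask (CapEff) from /proc/self/status; CAP_SETUID is bit 7 in <linux/capability.h>. A small Linux-only Python sketch:

```python
import os

CAP_SETUID = 7  # bit number of CAP_SETUID in <linux/capability.h>


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask: int, cap: int) -> bool:
    """True if capability number `cap` is set in the bitmask."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print("CAP_SETUID effective:", has_cap(mask, CAP_SETUID))
```

If the drop took effect, the script reports False for CAP_SETUID, which means a process in the container can no longer switch to arbitrary UIDs.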

Other alternatives to run untrusted workloads (in alpha as of this writing):

Rico
  • As a cluster admin, I want to protect the node from the rogue container process(es) running inside pod containers with root privileges. If such a process is able to break out into the node, it could be a security issue. As a cluster admin, I want to support all the images irrespective of what user/group that image is using. As a cluster admin, I want to allow some pods to disable user namespaces if they require elevated privileges. – Ijaz Ahmad Oct 28 '18 at 08:13
  • A well crafted seccomp profile with the others (caps, selinux, apparmor) would do it, but it's hard (very advanced). Posted other alternatives in alpha, but not quite prod ready. Docker EE would do it with docker swarm (not k8s). If you don't want to deal with it, wait for user namespaces in k8s or other alpha projects to mature. Contributions are welcome :-) – Rico Oct 28 '18 at 16:57