Privileged containers and capabilities

Question

If I am running a container in privileged mode, does it have all the kernel capabilities, or do I need to add them separately?

score 93 · Accepted Answer · edited Feb 07 '22 at 20:08

Running in privileged mode does give the container all capabilities. But it is good practice to always give a container only the minimum privileges it needs.

The Docker run command documentation refers to this flag:

Full container capabilities (--privileged)

The --privileged flag gives all capabilities to the container, and it also lifts all the limitations enforced by the device cgroup controller. In other words, the container can then do almost everything that the host can do. This flag exists to allow special use-cases, like running Docker within Docker.

You can grant specific capabilities with the --cap-add flag. See man 7 capabilities for more information on each capability. The literal names can be used, e.g. --cap-add CAP_FOWNER.
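
One way to check this yourself is to compare the effective capability mask of a process in each kind of container. The sketch below is illustrative and not taken from the Docker documentation: decode_caps is a hypothetical helper name, and its bit-to-name table is assumed to follow the order in linux/capability.h up to CAP_CHECKPOINT_RESTORE (bit 40), so a newer kernel may define bits it does not know about. It turns a hex value such as the CapEff field of /proc/<pid>/status into capability names.

```shell
#!/bin/sh
# decode_caps (hypothetical helper): print one capability name per set bit
# in a hex mask such as the CapEff field of /proc/<pid>/status.
# Assumption: bit positions follow linux/capability.h, bits 0 through 40.
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    for name in \
        CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID \
        SETPCAP LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN \
        NET_RAW IPC_LOCK IPC_OWNER SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE \
        SYS_PACCT SYS_ADMIN SYS_BOOT SYS_NICE SYS_RESOURCE SYS_TIME \
        SYS_TTY_CONFIG MKNOD LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP \
        MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM BLOCK_SUSPEND AUDIT_READ \
        PERFMON BPF CHECKPOINT_RESTORE
    do
        # Test the current bit and print the matching name if it is set.
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            printf 'CAP_%s\n' "$name"
        fi
        bit=$(( bit + 1 ))
    done
}

# Example with a fixed mask in which only bit 3 is set.
decode_caps 0000000000000008
```

Inside a container you could feed it the live value with decode_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)". A privileged container should then list every capability the kernel defines, while a default container should list a much shorter set. If capsh from libcap is installed, capsh --decode=<mask> does the same job.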

edited Feb 07 '22 at 20:08

Iain Samuel McLean Elder

answered Apr 06 '16 at 04:36

buddy123

Is there any way to find out what capabilities a particular application needs? This seems to be undocumented for most applications. – codefx Apr 06 '16 at 17:19
@codefx There is no rule of thumb here. It depends on the application and which system calls it makes at runtime. If you're using a ready-made Docker image from the Hub, the required capabilities will most probably be mentioned there. If you use something you wrote yourself, you should know which kernel APIs you used that may require special capabilities. – buddy123 Apr 06 '16 at 19:51
@buddy123 do you have a reference to the docs for that? Your version makes a lot more sense than what I currently find in [Docker run reference](https://docs.docker.com/engine/reference/run/): "When the operator executes docker run --privileged, Docker will enable access to all devices on the host as well as set some configuration in AppArmor or SELinux to allow the container nearly all the same access to the host as processes running outside containers on the host." It says something vague about "extended privileges" without mentioning capabilities at all. – Iain Samuel McLean Elder Feb 06 '22 at 14:34
@IainSamuelMcLeanElder Well, my comment is almost 7 years old. The quote itself was taken as-is from docker docs back then, but it changed so many times during that time... :-) – buddy123 Feb 07 '22 at 16:34
There is actually a different, overlapping page for the run **command** (not the **reference**). And your quote still appears today: https://github.com/docker/cli/blob/c780f7c4abaf67034ecfaa0611e03695cf9e4a3e/docs/reference/commandline/run.md I'll edit your answer to link to the page. – Iain Samuel McLean Elder Feb 07 '22 at 20:05

score 66 · Answer 2 · edited Jun 27 '19 at 18:35

You never want to run a container using --privileged.

I am doing this on my laptop which has NVMe drives, but it will work for any host:

docker run --privileged -t -i --rm ubuntu:latest bash

First, let's do something minor to test the /proc file system.

From the container:

root@507aeb767c7e:/# cat /proc/sys/vm/swappiness
60
root@507aeb767c7e:/# echo "61" > /proc/sys/vm/swappiness    
root@507aeb767c7e:/# cat /proc/sys/vm/swappiness
61

OK, did it change it for the container or for the host?

$ cat /proc/sys/vm/swappiness
61

OOPS! We can arbitrarily change the host's kernel parameters. But this is just a DoS situation; let's see if we can collect privileged information from the parent host.

Let's walk the /sys tree and find the major and minor numbers for the boot disk.

Note: I have two NVMe drives and containers are running under LVM on another drive

root@507aeb767c7e:/proc# cat /sys/block/nvme1n1/dev
259:2

OK, let's make a device file in a location where the dbus rules won't auto scan:

root@507aeb767c7e:/proc# mknod /devnvme1n1 b 259 2
root@507aeb767c7e:/proc# sfdisk -d /devnvme1n1 
label: gpt
label-id: 1BE1DF1D-3523-4F22-B22A-29FEF19F019E
device: /devnvme1n1
unit: sectors
first-lba: 34
last-lba: 2000409230
<SNIP>

OK, we can read the boot disk. Let's make a device file for one of the partitions. While we can't mount it, as it will be open, we can still use dd to copy it.

root@507aeb767c7e:/proc# mknod /devnvme1n1p1 b 259 3
root@507aeb767c7e:/# dd if=devnvme1n1p1 of=foo.img
532480+0 records in
532480+0 records out
272629760 bytes (273 MB, 260 MiB) copied, 0.74277 s, 367 MB/s

OK, let's mount it and see if our efforts worked!

root@507aeb767c7e:/# mount -o loop foo.img /foo
root@507aeb767c7e:/# ls foo
EFI
root@507aeb767c7e:/# ls foo/EFI/
Boot  Microsoft  ubuntu

So basically, allowing anyone to launch a --privileged container on a host is the same as giving them root access to that host and to every container on it.

Unfortunately, the Docker project has chosen the trusted computing model, and outside of auth plugins there is no way to protect against this. Always err on the side of adding only the specific capabilities you need rather than using --privileged.
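
As a rough sketch of that advice, the helper below assembles a docker run command line that drops every capability and adds back only the ones you name. It is an illustration, not something from the original answer: least_priv_cmd is a hypothetical name, the capabilities in the example are assumptions about an imaginary workload, and the function only prints the command so you can review it before running it. The --cap-drop and --cap-add flags themselves are standard docker run options.

```shell
#!/bin/sh
# least_priv_cmd (hypothetical helper): print a docker run command line that
# drops every capability and adds back only the ones listed.
# Usage: least_priv_cmd IMAGE [CAPABILITY...]
least_priv_cmd() {
    image=$1
    shift
    cmd="docker run --rm --cap-drop ALL"
    # Append one --cap-add per requested capability, in the order given.
    for cap in "$@"; do
        cmd="$cmd --cap-add $cap"
    done
    printf '%s %s\n' "$cmd" "$image"
}

# Example: a workload assumed to need only a low port and chown.
least_priv_cmd ubuntu:latest CAP_NET_BIND_SERVICE CAP_CHOWN
```

Starting from an empty set and adding capabilities one at a time until the application works is slower than --privileged, but it keeps device access and kernel tunables such as /proc/sys out of the container's reach.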

score 5 · Answer 3 · edited Jun 27 '19 at 18:33

There is a good article from Red Hat covering this.

While a Docker container running as "root" has fewer privileges than root on the host, it may still need hardening depending on your use case (a development environment versus a shared production cluster).

edited Jun 27 '19 at 18:33

tshepang

answered Apr 05 '16 at 17:20

brooding_goat

score 1 · Answer 4 · answered Sep 20 '22 at 14:46

Instead of changing the swappiness, you can just write the same value to it, and check what you get back:

Unprivileged docker:

root@8191892d9f7f:/# cat /proc/sys/vm/swappiness
20
root@8191892d9f7f:/# echo 20 >  /proc/sys/vm/swappiness
bash: /proc/sys/vm/swappiness: Read-only file system

Privileged docker:

root@7c6c0a793ca0:/# cat /proc/sys/vm/swappiness
20
root@7c6c0a793ca0:/# echo 20 >  /proc/sys/vm/swappiness
root@7c6c0a793ca0:/#
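
A variant that avoids writing anything at all is to look at how /proc/sys is mounted, since the "Read-only file system" error above comes from a read-only mount. This is only a heuristic sketch: proc_sys_is_readonly is a hypothetical helper name, it assumes the runtime exposes that mount as a /proc/sys entry with the ro option in /proc/mounts, and a writable /proc/sys does not by itself prove that --privileged was used.

```shell
#!/bin/sh
# proc_sys_is_readonly (hypothetical helper): succeed if the given mounts
# table lists /proc/sys with the "ro" option. Defaults to /proc/mounts.
proc_sys_is_readonly() {
    # Field 2 is the mount point, field 4 the comma-separated option list.
    awk '$2 == "/proc/sys" && index("," $4 ",", ",ro,") > 0 { found = 1 }
         END { exit !found }' "${1:-/proc/mounts}"
}

# Usage inside a container:
#   if proc_sys_is_readonly; then
#       echo "/proc/sys is read-only"
#   else
#       echo "/proc/sys is writable or not separately mounted"
#   fi
```

Accepting the mounts file as an optional argument is a design choice that makes the check easy to try against a saved copy of /proc/mounts from another machine.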
