When running a Docker container with host-mounted volumes, using either docker or docker-compose, on RHEL, I observe a large amount of disk I/O (using dstat) before the container is launched.
The I/O is associated with the dockerd process, and I can clearly increase or reduce it by mounting or removing host volumes.
If I do not mount any host volumes, the container launches immediately. If I mount volumes that cover a large part of the file system, the I/O is significant: in my case about 20 GB, which takes about three minutes before the container launches. In some cases, this causes docker-compose up orchestration to simply time out.
A typical run command looks like this:
# Mounts, in order: a host volume, an external named volume (my_ro_data),
# and another host volume.
docker run -it --rm --name my_container \
  -v /host/app/src:/app:ro,z \
  -v my_ro_data:/data/read_only/files:ro,z \
  -v /host/data/write:/data/container_output/files:z \
  my/image:latest
The I/O occurs regardless of whether the volume is a pre-defined named volume, and regardless of whether it is marked read-only. For reference, the external named volume is defined like this:
docker volume create \
--driver local \
--opt type=none \
--opt o=bind \
--opt device=/host/data/files \
my_ro_data
I assume the I/O is related to the overlay file system, but I cannot find any clear explanation of what exactly is being written, where it is being written, or how to configure things so that less I/O is needed before container launch. It is clearly not the contents of the entire volume, so it would seem to be some sort of differential. However, imagine I have a large-scale data pipeline and I want to point my container at host source or target directories with terabytes of files. How can I mount host volumes with less impact on container startup latency?
Update:
Based on guidance from @BMitch, I focused on the SELinux-related ":z" label.
Brief history:
Originally (about a year prior to the post) the mounted volumes were not accessible to the docker container on our RHEL server with SELinux unless this label was used. Even though --volumes-from is a different CLI option, its documentation had the best explanation, and it is the one other sources referred to when solving the access issues:
Labeling systems like SELinux require that proper labels are placed on volume content mounted into a container. Without a label, the security system might prevent the processes running inside the container from using the content. By default, Docker does not change the labels set by the OS. To change the label in the container context, you can add either of two suffixes :z or :Z to the volume mount. These suffixes tell Docker to relabel file objects on the shared volumes. The z option tells Docker that two containers share the volume content. As a result, Docker labels the content with a shared content label. Shared volume labels allow all containers to read/write content. The Z option tells Docker to label the content with a private unshared label.
This explanation is accompanied by a warning elsewhere:
Bind-mounting a system directory such as /home or /usr with the Z option renders your host machine inoperable and you may need to relabel the host machine files by hand.
So I used ":z" and sometimes ":ro,z" and everything worked fine.
It turns out this label is causing the pre-launch disk I/O. I do not understand SELinux security and labels well, but I imagine the I/O is the actual rewriting of file labels when the volume is mounted, so the more files, the longer the disk I/O.
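To get a rough sense of how much work a mount implies, one can count the entries under the host directory and look at its current label. This is only a minimal sketch, assuming the I/O really does scale with the number of files (my guess, not something I have confirmed). The function names count_entries and show_label are hypothetical helpers, and show_label assumes GNU ls, where -Z prints the security context.

```shell
#!/bin/sh
# count_entries DIR: print the number of files and directories under DIR
# (including DIR itself), without crossing into other file systems.
# If ":z" triggers a recursive relabel, this is roughly the number of
# labels that would have to be rewritten. File names containing newlines
# are counted more than once, so treat the result as an estimate.
count_entries() {
    find "$1" -xdev -print | wc -l | tr -d ' '
}

# show_label PATH: print the SELinux context of PATH itself
# (not its contents). Assumes GNU ls.
show_label() {
    ls -dZ -- "$1"
}
```

For example, count_entries /host/data/files gives the size of the tree behind the named volume, and show_label /host/data/files shows whether its label changed after a container was started with ":z".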
My observation is that removing the labels, and doing nothing else, results in the same behavior, meaning the docker engine now appears to treat SELinux mounted volumes as if labeled ":z" by default. I believe this is a new behavior that may have been introduced over the past year, or the result of some other system change, because the volumes are now accessible without the label (or maybe the labels are permanent, allowing subsequent docker access).
However, removing the :z does not solve the I/O and long startup time. I then found this GitHub conversation, which claims both :z and :Z are potentially dangerous choices and includes this comment:
If your container does require broader access to system directories, then use of '--security-opt label:disable' with the 'docker run' command is a better alternative. Note that using the above option instead will disable SELinux checks for that container.
So I added this option and, in fact, the volumes were accessible and there was zero (or minimal) disk I/O and startup latency.
That said, I truly do not understand the repercussions of --security-opt label:disable and would welcome any additional advice or explanation.
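For reference, the invocation that avoided the startup I/O can be sketched as below. It is the run command from the top of the post with the ":z" suffixes dropped and the option from the quoted comment added. This is a sketch rather than a recommendation: the wrapper function run_without_relabel and the DOCKER override variable are hypothetical conveniences, not part of docker. I believe newer Docker documentation spells the option label=disable, but I have kept the form from the quote.

```shell
#!/bin/sh
# DOCKER can be overridden (for example DOCKER=echo) to print the
# arguments instead of starting a container. This override is a
# convenience of this sketch only.
DOCKER="${DOCKER:-docker}"

# Same mounts as the original run command, minus the ":z" suffix,
# with SELinux label separation disabled for this one container.
# Extra arguments are passed through as the container command.
run_without_relabel() {
    "$DOCKER" run -it --rm --name my_container \
        --security-opt label:disable \
        -v /host/app/src:/app:ro \
        -v my_ro_data:/data/read_only/files:ro \
        -v /host/data/write:/data/container_output/files \
        my/image:latest "$@"
}
```

Calling run_without_relabel starts the container. Setting DOCKER=echo first prints the arguments instead, which is handy for checking the mounts without starting anything.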