It's a mixture of design, convenience and technical necessity.
The biggest reason is that, unless you use certain params that say otherwise, Singularity images are read-only filesystems. You need somewhere to write output and any temporary files that get created along the way. Maybe you know to mount in your output dir, but there are all sorts of files that get created / modified / deleted in the background that we don't ever think about. Implicit automounts give reasonable defaults that work in most situations.
Simplistic example: you're doing a big sort and filter operation on some data, but you're printing the results to the console, so you don't bother to mount in anything but the raw data. But even after some manipulation and filtering, the size of the data exceeds available memory, so sort falls back to using small files in /tmp that get deleted when the process finishes. And then it crashes, because you can't write to /tmp.
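A minimal sketch of that failure mode, using a hypothetical image data-tools.sif and made-up data paths (--no-mount needs a reasonably recent Singularity/Apptainer, and the exact error text will vary):

# With the default automounts, host /tmp is bound into the container, so GNU sort
# can spill its temporary chunks there once the data no longer fits in memory:
singularity exec -B /some/local/data:/data data-tools.sif \
  sort -k2,2 /data/huge.tsv > /dev/null

# Suppress that automount and the only /tmp left is the read-only one baked into
# the image, so the same command dies as soon as sort tries to create a spill file:
singularity exec --no-mount tmp -B /some/local/data:/data data-tools.sif \
  sort -k2,2 /data/huge.tsv > /dev/null
#   sort: cannot create temporary file in '/tmp': Read-only file system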
You can require a user to manually specify what to mount to /tmp on run, or you can use a sane default like /tmp and also allow that to be overridden by the user (SINGULARITY_TMPDIR, -B $PWD/fake_tmp:/tmp, --contain/--containall). These are all also configurable, so admins can set sane defaults specific to the running environment.
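For example, two of those overrides in action (image.sif is just a stand-in for whatever image you're running):

# Bind a directory of your choosing over /tmp; temp files land on the host where you chose:
mkdir -p "$PWD/fake_tmp"
singularity exec -B "$PWD/fake_tmp":/tmp image.sif touch /tmp/scratch-file
ls "$PWD/fake_tmp"   # scratch-file

# Or isolate the container from host filesystems; --contain gives it its own
# minimal, writable /tmp that does not persist after the run by default:
singularity exec --contain image.sif ls -la /tmp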
There are also technical reasons for some of the mounts, e.g., /etc/passwd and /etc/group are needed to match permissions on the host OS. The docs on bind paths and mounts are actually pretty good and have more specifics on the whats and whys, and even the answer to your third question: --no-mount. The --contain/--containall flags will probably also be of interest. If you really want to deep dive, there are also the admin docs and the source code on GitHub.
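A rough sketch of --no-mount (the mount names below, tmp, home, bind-paths, are the ones documented for recent Singularity/Apptainer releases; check singularity exec --help on your install, and image.sif is hypothetical):

# Skip individual system automounts by name:
singularity exec --no-mount tmp,home image.sif ls /tmp /home

# Skip the bind paths the admins configured in singularity.conf:
singularity exec --no-mount bind-paths image.sif cat /proc/mounts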
A simple but real Singularity use case, with explanation:
singularity exec \
--cleanenv \
-H $PWD:/home \
-B /some/local/data:/data \
multiqc.sif \
multiqc -i $SAMPLE_ID /data
--cleanenv / -e: You've already experienced the fun of unexpected mounts; there are also unexpected environment variables! --cleanenv/-e tells Singularity not to pass the host execution environment through to the container. You can still use, e.g., SINGULARITYENV_SOMEVAR=23 to have SOMEVAR=23 inside the container though, as that is explicitly set.
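A quick way to see the difference (image name is hypothetical, the commented output lines are illustrative):

# By default the host environment leaks into the container:
FOO=bar singularity exec image.sif sh -c 'echo "FOO=$FOO"'
#   FOO=bar

# With --cleanenv only explicitly injected SINGULARITYENV_* variables make it in:
FOO=bar SINGULARITYENV_SOMEVAR=23 singularity exec --cleanenv image.sif \
  sh -c 'echo "FOO=$FOO SOMEVAR=$SOMEVAR"'
#   FOO= SOMEVAR=23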
-H $PWD:/home: This mounts the current directory into the container at /home and sets HOME=/home. While using --contain/--containall and explicit mounts is probably a better solution, I am lazy, and this ensures several things (there's a sketch of the behavior after the list):
- The current directory is mounted into the container. The implicit mounting of the working directory is allowed to fail, and will do so quietly, if the base directory does not exist in the image. E.g., if you're running from /cluster/my-lab/some-project and there is no /cluster inside your image, it will not be mounted in. This is not an issue if you use an explicit bind directly (-B /cluster/my-lab/some-project) or if an explicit bind shares a path prefix with the current directory (-B /cluster/data/experiment-123).
- The command is executed from the context of the current directory. If $PWD fails to be mounted as described above, Singularity uses $HOME as the working directory instead. If both $PWD and $HOME fail to mount, / is used. This can cause problems if you're using relative paths and you aren't where you expected to be. Since it is specific to the path on the host, it can be really annoying when trying to duplicate a problem locally.
- The base path inside the container is always the same regardless of the host OS file structure. Consistency is good.
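A sketch of what that buys you, assuming a hypothetical image tools.sif with no /cluster directory baked in, and a project that contains a raw-data directory (error line is illustrative):

cd /cluster/my-lab/some-project

# Implicit cwd mount fails quietly (no /cluster in the image), the working
# directory silently falls back to $HOME, and the relative path breaks:
singularity exec tools.sif ls ./raw-data
#   ls: cannot access './raw-data': No such file or directory

# -H mounts the project at a path that always exists and starts the command
# there, so relative paths behave the same on any host:
singularity exec -H "$PWD":/home tools.sif ls ./raw-data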
The rest is just the command that's being run, which in this case summarizes the logs from other programs that work with genetic data.