23

Scenario

I'm trying to set up a simple Docker image (I'm quite new to Docker, so please correct any misconceptions) based on the public continuumio/anaconda3 image.

The Dockerfile:

FROM continuumio/anaconda3:latest

# update conda and setup environment
RUN conda update conda -y \
    && conda env list \
    && conda create -n testenv pip -y \
    && source activate testenv \
    && conda env list

Building an image from this with docker build -t test . ends with the error:

/bin/sh: 1: source: not found

when activating the new virtual environment.

Suggestion 1:

Following this answer I tried:

FROM continuumio/anaconda3:latest

# update conda and setup environment
RUN conda update conda -y \
    && conda env list \
    && conda create -y -n testenv pip \
    && /bin/bash -c "source activate testenv" \
    && conda env list

This seems to work at first, as it outputs prepending /opt/conda/envs/testenv/bin to PATH, but conda env list as well as echo $PATH clearly show that it doesn't:

[...]
# conda environments:
#
testenv                  /opt/conda/envs/testenv
root                  *  /opt/conda

---> 80a77e55a11f
Removing intermediate container 33982c006f94
Step 3 : RUN echo $PATH
---> Running in a30bb3706731
/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The Dockerfiles work out of the box as an MWE. I appreciate any ideas. Thanks!

ccauet
  • `bash -c "source activate whatever"` sources that into the new shell, but that's not what you need -- you want those variables to be added to your **existing** shell for them to do any good, or else the updates will be destroyed when the shell started by the `bash -c` command exits, thus *before* you get to listing environment variables. – Charles Duffy Jun 21 '16 at 13:29
  • Thus, you need it to be something more like `... && source testenv/bin/activate && conda env list` if you want the new variables to be present for the `env list` -- though they still won't be present for any future RUN invocation, since each invocation is in a new shell, and no shell (or other UNIX process) can modify its parent process's environment variables. – Charles Duffy Jun 21 '16 at 13:32
  • Thanks @CharlesDuffy, you helped me a lot in understanding the underlying problem. – ccauet Jun 21 '16 at 14:13
  • @ccauet can you update your question to explain what issue you have? My docker can't find `bash -c`, but when I get into the container itself and then activate the conda env inside the container, things work fine. It would be nice to make explicit what issue you're having. – Charlie Parker Oct 06 '17 at 18:56
  • Seems `RUN /bin/bash -c "source activate pytorch-py35"` did work... not sure why `RUN /bin/bash -c source activate pytorch-py35` didn't work. – Charlie Parker Oct 06 '17 at 18:59
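
The point made in the comments above can be checked with a few lines of shell. This is only a sketch: DEMO_ENV is a made-up variable standing in for whatever source activate would change, and sh -c stands in for bash -c (the behaviour is the same for any child process).

```shell
# The parent shell sets and exports a variable.
DEMO_ENV=root
export DEMO_ENV

# A child shell changes it; the change is visible only inside the child.
child_view=$(sh -c 'DEMO_ENV=testenv; echo "$DEMO_ENV"')

# Back in the parent, the original value is untouched.
echo "child saw:  $child_view"
echo "parent has: $DEMO_ENV"
```

The same holds between RUN instructions, since each one starts in a fresh shell.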

3 Answers

6

Using the Docker ENV instruction, it is possible to add the virtual environment path persistently to PATH, although this does not change the selected environment listed by conda env list.

See the MWE:

FROM continuumio/anaconda3:latest

# update conda and setup environment
RUN conda update conda -y \
    && conda create -y -n testenv pip

ENV PATH=/opt/conda/envs/testenv/bin:$PATH

RUN echo $PATH
RUN conda env list
ccauet
  • Besides the persistent problem regarding the selected conda environment, this solution makes use of the hard-coded path `/opt/conda/envs/testenv/bin`, which seems undesirable to me. For now I will use this solution, as the docker setup guarantees the path to be correct. – ccauet Jun 21 '16 at 14:33
  • Good answer. That said, there's more than just PATH to activate a virtualenv -- `activate` also changes `PYTHONHOME`, and sets a `VIRTUAL_ENV` environment variable (the former being the more important of the two, since it influences module loading). – Charles Duffy Jun 21 '16 at 15:28
  • Thanks. Are you sure about this `activate` behavior? I do not have experience with [virtualenv](https://pypi.python.org/pypi/virtualenv/) but maybe conda environments work differently? I just activated a conda virtual environment on my local machine and get empty strings for both `echo $PYTHONHOME` and `echo $VIRTUAL_ENV`. – ccauet Jun 21 '16 at 15:41
  • I'm assuming that conda uses the standard virtualenv tooling; if that assumption doesn't hold, then I may be wrong above. – Charles Duffy Jun 21 '16 at 15:42
  • Alright, thanks again. I will keep an eye on this setup and watch out for better or more complete answers to my initial problem. – ccauet Jun 21 '16 at 16:28
  • Your answer is confusing. Can you explain what problem the OP had and how your solution solves it? – Charlie Parker Oct 06 '17 at 18:54
  • Found a good explanation of this problem here: https://pythonspeed.com/articles/activate-virtualenv-dockerfile/ – Henhuy Mar 22 '19 at 08:37
2

Method 1: use SHELL with a custom entrypoint script

EDIT: I have developed a new, improved approach which is better than the "conda", "run" syntax.

A sample Dockerfile is available at this gist. It works by leveraging a custom entrypoint script to set up the environment before exec-ing the arguments of the RUN instruction.

Why does this work?

A shell is (put very simply) a process which can act as an entrypoint for arbitrary programs. exec "$@" replaces the script's process with the command passed as arguments, and that command keeps the entire environment the script has set up. In this case, this means we activate conda (which basically mangles a bunch of environment variables), then run /bin/bash -c CONTENTS_OF_DOCKER_RUN.
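
To illustrate the mechanism only (this is not the script from the gist), the sketch below writes a tiny wrapper to a temporary file and runs a command through it. The export line is a stand-in for the real conda activation, and DEMO_ACTIVE_ENV is a hypothetical variable name.

```shell
# Write a minimal wrapper script to a temporary file.
wrapper=$(mktemp)
cat > "$wrapper" <<'EOF'
#!/bin/sh
# Stand-in for environment activation (hypothetical variable).
export DEMO_ACTIVE_ENV=testenv
# Replace this process with the requested command; it keeps the environment.
exec "$@"
EOF

# Run a command through the wrapper, the way a build shell would.
result=$(sh "$wrapper" sh -c 'echo "$DEMO_ACTIVE_ENV"')
echo "command saw: $result"

rm -f "$wrapper"
```

The command passed to the wrapper sees the variable even though it never set it itself, which is the property the entrypoint approach relies on.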


Method 2: SHELL with arguments

Here is my previous approach, courtesy of Itamar Turner-Trauring; many thanks to them!

# Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml

# Set the default docker build shell to run as the conda wrapped process
# ("myenv" must match the environment name defined in environment.yml)
SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "-c"]

# Set your entrypoint to use the conda environment as well
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "run.py"]

Modifying ENV may not be the best approach since conda likes to take control of environment variables itself. Additionally, your custom conda env may activate other scripts to further modulate the environment.

Why does this work?

This leverages conda run to "add entries to PATH for the environment and run any activation scripts that the environment may contain" before starting the new bash shell.

Using conda inside Docker can be a frustrating experience, since both tools effectively want to monopolize the environment, and theoretically you shouldn't ever need conda inside a container. But with deadlines and technical debt being a thing, sometimes you just gotta get it done, and sometimes conda is the easiest way to provision dependencies (looking at you, GDAL).

DeusXMachina
  • Due to problems with `conda run`, I came to the same conclusion that one really needs such an entrypoint script for a proper solution. However, glancing at your answer, the humongous "Why does this work?" text drew all my attention away from your improved answer. Could you perhaps put "new, improved approach" in the huge letters instead of the old `conda run` approach? – Ben Mares Sep 15 '20 at 13:09
  • Thanks for the tip. I am also working on cleaning up the gist a bit. Let me know if you hit any issues! – DeusXMachina Sep 15 '20 at 20:56
  • For me it worked only after removing the `conda install -n base pip` and `conda init` commands: the first because it says it can't find the base environment, the second because my version of miniconda3 doesn't have the `init` command. However, now every time I execute a RUN command there is a stderr error telling me `CommandNotFoundError: activate is not a conda command`, even though the command then works well. – Tareyes Apr 03 '21 at 09:08
1

Piggybacking on ccauet's answer (which I couldn't get to work), and Charles Duffy's comment about there being more to it than just PATH, here's what will take care of the issue.

When activating an environment, conda sets the variables below, plus a few that back up default values for use when deactivating the environment: CONDA_PATH_BACKUP, CONDA_PS1_BACKUP, and _CONDA_SET_PROJ_LIB. These backup variables have been omitted from the Dockerfile, as the root conda environment need never be used again. Conda also sets PS1 in order to show (testenv) at the left of the terminal prompt line, which was omitted as well. The following statements will do what you want.

ENV PATH=/opt/conda/envs/testenv/bin:$PATH
ENV CONDA_DEFAULT_ENV=testenv
ENV CONDA_PREFIX=/opt/conda/envs/testenv

In order to reduce the number of layers created, you can also combine these into a single ENV instruction that sets all the variables at once.

There may be some other variables that need to be set, depending on the packages installed. For example,

ENV GDAL_DATA=/opt/conda/envs/testenv/share/gdal
ENV CPL_ZIP_ENCODING=UTF-8
ENV PROJ_LIB=/opt/conda/envs/testenv/share/proj

The easy way to get this information is to call printenv > root_env.txt in the root environment, activate testenv, then call printenv > test_env.txt, and examine diff root_env.txt test_env.txt.
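
As a rough sketch of that comparison, the snippet below uses temporary files and replaces the real activation step with a single export, so it runs anywhere; inside the container you would run source activate testenv at that point instead. The sort calls are an addition to keep the diff stable.

```shell
# Snapshot the environment before "activation".
before=$(mktemp)
after=$(mktemp)
printenv | sort > "$before"

# Stand-in for `source activate testenv` (assumption: only one variable changes here).
export CONDA_DEFAULT_ENV=testenv

# Snapshot again and compare; diff exits non-zero when the files differ.
printenv | sort > "$after"
changes=$(diff "$before" "$after" || true)
echo "$changes"

rm -f "$before" "$after"
```

Each line marked with > in the output is a variable that activation added or changed, and is a candidate for an ENV instruction.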