
Background

I want to avoid ever "accidentally" working in a default environment.

I want to always have an equivalent to a requirements.txt or package.json file available, both to clearly separate one project from another, and so that I can easily look back to see what is installed (and what version of it).


But I work primarily in the data science / analytics world, and primarily with Python.

As such, I use Anaconda, pip, and Homebrew (I have a Mac). It would be great to rely upon just one package manager, and many folks espouse one method or another to accomplish this. Truth is, as of now (Sep 2018), it's impossible to work in any breadth of topics and avoid at least some mixture.


Setting my sights lower and more realistically, I simply want to make sure that there is no default environment wherever possible, to make it cleaner and easier to work on projects with others.

To my knowledge, there is no concept of an environment in Homebrew at all. Conda of course has environments, but it first sets up a default environment before you can create any others.

Question

Is there any way to install Anaconda without any default environment, so that I will always have to source activate <my_env>? If so, how do I do that?

Barring this, what are the best suggestions to accomplish what I want, which is to never accidentally work in an environment where it is unclear what my dependencies are, recognizing that I'm talking primarily but not exclusively about using Python?

(Please don't suggest that I should "just be careful" when installing packages. Yes, I understand that. But I am trying to pre-emptively be careful by making the wrong choices as difficult or impossible as I can. If I had no default environment, for instance, then pip would not even work until I sourced an environment since it would not be found in my normal environment.)

  • I am not a data scientist so I can't vouch for this specific use case, but what if you use Docker containers for your environments? Each of your projects would have a `Dockerfile` with anaconda, and you install only requirements.txt into the container. You run commands with `docker run`, so you can't accidentally run a command in the default environment. – atakanyenel Sep 24 '18 at 19:14
  • Hi @atayenel Thanks! That is, of course, a viable option. But it comes with a **lot** of extra work. Containers are great, but I don't want to have to put so much extra effort into all of the work I do - from short tasks to ongoing major projects. – Mike Williamson Sep 24 '18 at 19:16
  • That is understandable, I'll leave it to a data scientist to come up with a better solution. – atakanyenel Sep 24 '18 at 19:17
  • Just remove the default environment from your path? Well, that would leave the system default. But on Mac that should be Python 2.7, probably, which won't come with PIP. EDIT: well, it will, so you could remove `/usr/local/bin/` from your PATH as well – juanpa.arrivillaga Oct 03 '18 at 22:01
  • you could create a python virtual environment, install anaconda, and source activate anaconda in your virtual environment. The virtual environment would then only use the python version in which you set up the env, and would have absolutely 0 dependencies that you haven't installed. You could then occasionally `pip freeze > requirements.txt` if you want to keep track of things. You could then do `pip install -r requirements.txt` at any point in the future if you want to restart where you left off. I can post a more thorough walk-through if needed. – d_kennetz Oct 03 '18 at 22:05
  • All those (anaconda, pip & homebrew) are 3 different packages with their own scopes. `anaconda` (or, for more granular control, `miniconda`, which I'm more familiar with & prefer) bundles python & packages that run on python, in your home directory `/Users/` scope (unless you pick system-wide installation in the anaconda installer). `brew`, however, operates on system root scope since it can be used to install software for the system. `pip` can be seen as a python extension; it depends on whichever python version/implementation it came with, and there's no default scope other than that. – deadvoid Oct 04 '18 at 20:52
  • @cryptonome Yes, I understand they are 3 different packages with different scopes. And, as you point out, they work at different levels, with brew forcing (I believe) root-level installs. But, the workflow I outlined is fairly standard for data scientists "just trying to get stuff done" who also occasionally work on larger team projects. So, there must be some sort of solution, or at least a BKM to avoid the mess that it seems we're all otherwise stuck in. – Mike Williamson Oct 05 '18 at 21:25
  • @d_kennetz You might have the best solution, although it seems like even you are aware that this is not ideal. But it does solve all of what I want, albeit not easily and still allows for potential mistakes. This seems like a problem that *many* people must run across: using interpreted languages in larger scoped projects and ensuring your local computer environment does not "influence" how that project runs. If everything I did only required Python, then virtualenv would solve it. Alas, even Python itself requires more than just pure Python, due to Cython, etc. – Mike Williamson Oct 05 '18 at 21:30
  • what I was pointing out was that you basically should have no problem with brew, since neither they nor you can do anything about the system-wide scope it needs to provide to software that needs it, and that leaves python/pip vs conda. Besides, brew handles this problem by containing software to a single folder, `/usr/local/Cellar/`, instead of letting software install into its usual default paths. Your problem would be easier to solve if you exclude brew given that difference, and localizing pip vs conda is more apples to apples. – deadvoid Oct 05 '18 at 23:18
  • And why I think this is a more reasonable approach: in the end, it can also be argued that _all_ desktop software's scope is your computer system, regardless of the path. Going either too wide or too general won't help this case, hence making it about virtual-environment containment, like what you particularly mentioned in your question, is a lot more doable than including the system-wide requirement imposed on brew. Until brew can install anaconda/miniconda, it won't be any easier; I doubt that would ever happen since anaconda is also a commercial endeavour. – deadvoid Oct 05 '18 at 23:25
  • TL;DR: I know you probably use Matlab or R or other languages along with python, matplotlib, pandas, numpy etc., but trying to solve brew, python/pip & anaconda all at once doesn't seem realistically doable with _simple solutions_, given the scope differences & reasons I laid out. One thing that comes to mind for a complex problem like this is VMs/virtual containers, but that wouldn't be simple either to set up or in day-to-day practice. I personally doubt what @d_kennetz proposed will solve this, since anaconda comes with its own python installation (_and_ pip) & its own package manager (conda). – deadvoid Oct 05 '18 at 23:44
  • I agree that my solution isn't some end-all be-all fix for his problem. I just considered it to be a reasonable solution to his question. I think the question itself is a bit less reasonable and outside the scope of modern technology. I think within a given field, some things become inherent, and if a person decides to enter into the field they should learn the software involved. – d_kennetz Oct 06 '18 at 03:11
  • To continue, if you'd like to make your area of study thoughtless for consumers who wish to perform analyses, then create a Docker container (which I know you mentioned you did not wish to do). But my advice is to either teach the people working with you in depth, or do it yourself, because it isn't reasonable to perform your request. – d_kennetz Oct 06 '18 at 03:12
  • But lastly, you actually can source activate a VM and then source activate anaconda in the VM, as I have done this, and the version of Python will be the anaconda version inside the VM. – d_kennetz Oct 06 '18 at 03:16
  • btw, take a look at https://stackoverflow.com/questions/42859781/best-practices-with-anaconda-and-brew, also https://stackoverflow.com/questions/37677476/workflow-for-python-with-docker-ide-for-non-web-applications. @d_kennetz just to make sure: by VM do you mean virtualenv, venv, pipenv etc., or docker, vmware, et al? The first group _needs_ python, and as I mentioned, anaconda comes with its own python. – deadvoid Oct 06 '18 at 18:28
  • I'm not a mac user, but don't you have to activate the base/default environment explicitly in the shell/command line every time you need conda? Either way, to repair an environment you can always roll back to a previous revision by activating the environment and doing the following: `conda list --revisions`, then `conda install --revision <revision_number>`. – ayorgo Oct 09 '18 at 15:07

3 Answers


I think your best bet is to simply use a virtual environment and install dependencies as they become necessary, then just check in and out of your virtual environment as your work progresses. You can make different virtual environments as you work on different projects and leave their corresponding requirements.txt files inside the directory python creates when installing a virtual environment. Let's say I have python3.5.2 as my normal, go-to python version (because I do).

Using python3.5, let's enter a virtual environment with nothing more than bare-bones python3.5 (no installed dependencies). To do this:

[dkennetz@node venv_test]$ python -m venv my_SO_project
[dkennetz@node venv_test]$ ls
my_SO_project

So we see python has created a directory to house my virtual environment, but my virtual environment is not yet being used as my default python. In order to do that, we must activate it:

[dkennetz@node venv_test]$ source ./my_SO_project/bin/activate

So my shell now looks like this:

(my_SO_project) [dkennetz@nodecn201  venv_test]$

While we are here, let's see what our requirements look like:

(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201  venv_test]$ ls -alh
drwxr-x---  3 dkennetz blank 4.0K Oct  9 09:52 .
drwxr-x--- 93 dkennetz root      16K Oct  9 09:40 ..
drwxr-x---  5 dkennetz blank 4.0K Oct  9 09:47 my_SO_project
-rwxr-x---  1 dkennetz blank    0 Oct  9 09:47 requirements.txt

I'm using blank to hide group names, but as we can see, our requirements.txt file is empty (size 0), meaning this virtual environment has no dependencies. It is purely python3.5. Now let's go ahead and install pandas and see how our dependencies change.

(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip install pandas
(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201  venv_test]$ more requirements.txt
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
six==1.11.0
(my_SO_project) [dkennetz@nodecn201  venv_test]$ wc -l requirements.txt
5 requirements.txt

Let's say we have written some code inside the environment and we no longer want to do any more work, so we do one final pip freeze > requirements.txt and we leave:

(my_SO_project) [dkennetz@nodecn201  venv_test]$ deactivate
[dkennetz@nodecn201  venv_test]$ pip freeze > requirements_normal.txt
[dkennetz@nodecn201  venv_test]$ wc -l requirements_normal.txt
82 requirements_normal.txt

Many more dependencies show up, but nothing has changed in our normal environment, and nothing has changed in our virtual environment. Now let's say we have taken the rest of the day off and wish to go back to the SO_project we created yesterday. Well, it is easy:

[dkennetz@nodecn201  venv_test]$ ls -alh
drwxr-x---  3 dkennetz blank 4.0K Oct  9 10:01 .
drwxr-x--- 93 dkennetz root      16K Oct  9 09:40 ..
drwxr-x---  5 dkennetz blank 4.0K Oct  9 09:47 my_SO_project
-rwxr-x---  1 dkennetz blank   77 Oct  9 09:56 requirements.txt
-rwxr-x---  1 dkennetz blank 1.3K Oct  9 10:01 requirements_normal.txt
[dkennetz@nodecn201  venv_test]$ source ./my_SO_project/bin/activate
(my_SO_project) [dkennetz@nodecn201  venv_test]$ 

Let's see where we left off (we should only have pandas installed; let's overwrite our old requirements file):

(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201  venv_test]$ more requirements.txt
numpy==1.15.2
pandas==0.23.4
python-dateutil==2.7.3
pytz==2018.5
six==1.11.0

Cool, so now we know we are right where we left off. Just a fair warning: I do have pandas installed in my root python installation, but what I do not have is the awscli (Amazon Web Services command-line interface). Let's say I want that for some reason in my package:

(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip install awscli
(my_SO_project) [dkennetz@nodecn201  venv_test]$ pip freeze > requirements.txt
(my_SO_project) [dkennetz@nodecn201  venv_test]$ wc -l requirements.txt
15 requirements.txt
(my_SO_project) [dkennetz@nodecn201  venv_test]$ deactivate
[dkennetz@nodecn201  venv_test]$ ls
my_SO_project  requirements.txt  requirements_normal.txt
[dkennetz@nodecn201  venv_test]$ pip freeze > requirements_normal.txt
[dkennetz@nodecn201  venv_test]$ wc -l requirements_normal.txt
82 requirements_normal.txt

So we now see that installing the awscli has not changed our normal python installation, but it has changed our venv:

[dkennetz@nodecn201  venv_test]$ more requirements_normal.txt
appdirs==1.4.3
arrow==0.7.0
attrdict==2.0.0
avro-cwl==1.8.4
...
[dkennetz@nodecn201  venv_test]$ more requirements.txt
awscli==1.16.29
botocore==1.12.19
colorama==0.3.9
docutils==0.14
...

Finally, let's say you've developed a super cool data science package entirely inside of your venv and you have made it pip-installable. The quick and easy way to handle its dependencies is to just:

[dkennetz@nodecn201  venv_test]$ pip install -r requirements.txt

You can now use this as your package list every time your "new program" is being pip installed, and better yet, you know every python package you need for it, because those are the only ones you have included in your environment.

All this being said, there is no reason you can't do this every time you start a new project with new people. And if you want to have anaconda in every virtual environment you ever use, install anaconda normally:

[dkennetz@nodecn201  venv_test]$ ./Anaconda-1.6.0-Linux-x86_64.sh
[dkennetz@nodecn201  venv_test]$ source /home/dkennetz/anaconda3/bin/activate
#You will be in your anaconda environment now
(base) [dkennetz@nodecn201  venv_test]$ pip freeze > anaconda_reqs.txt

Say you've started my_SO_project2 after that first one, and you want to ensure that you have anaconda in this project. Create your new venv the same way you did last time. Once inside, just install all the dependencies anaconda requires and you will have a fresh anaconda virtual environment:

(my_SO_project2) [dkennetz@nodecn201  venv_test]$ pip install -r anaconda_reqs.txt

And your new venv starts as a fresh environment with nothing but anaconda installed.

I hope this clarifies what I have said in the comments, and it is helpful for you.

d_kennetz

First I'd remove python from my system.

Edit: As pointed out in the comments, this is not a good idea in macos. I'd use this approach only in a Docker container. But then, if you do have docker, you could spawn one per project and you're set.

The command `which python` should return nothing.

Install miniconda, which is the conda package manager together with a bare python.

Create an environment per project

conda create -n myproject python=3.6

Because there is no default Python, you need to source an environment whenever you want to work in it:

source activate myproject

Note that, technically, miniconda creates a default env called "base" (it cannot be removed). But like any other env, it is not activated, so you still won't have any python (if you did remove as suggested), and cannot accidentally run the "wrong" python.
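
As a hedged aside (not part of the original setup): newer versions of conda (4.6+) also let you stop the base environment from auto-activating in every new shell, which gets you most of the way to "no default environment":

conda config --set auto_activate_base false

# nothing is active now until you explicitly run, e.g.:
conda activate myproject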

Hugues Fontenelle
  • _every_ macos installation comes with python; _the command `which python` should return nothing_ is a dangerous instruction, unless you want him to keep reinstalling his OS – deadvoid Oct 11 '18 at 16:02
  • Mmh I see. I did that in a Docker container, where the approach works like a charm. – Hugues Fontenelle Oct 11 '18 at 20:50
  • For reference, an explanation about (not) removing python from osx: https://stackoverflow.com/questions/3819449/how-to-uninstall-python-2-7-on-a-mac-os-x-10-6-4 – Hugues Fontenelle Oct 11 '18 at 20:57
  • I understand what you're saying, @HuguesFontenelle , but that does not sufficiently address my concern. The problem is *primarily* that there is no one-stop-shop for installing packages. Conda is great, but it gets me 80% of the way there. I **will** need to install things from `pip` and from [`brew`](https://brew.sh/). Conda does not play nicely with others, unfortunately. In fact, [`virtualenv`](https://virtualenv.pypa.io/en/latest/) is a bit more friendly to outsiders than Conda, but then I cannot use *any* of the otherwise nice features of Conda. – Mike Williamson Oct 30 '18 at 22:01

This question seems to be asking many different things at once.

Is there any way to install Anaconda without any default environment

As mentioned, conda will always have a base environment, which is essentially the default environment.

As such, I use Anaconda, pip, and Homebrew (I have a Mac).

As mentioned, the big difference here is that Homebrew is for system-wide installs. You should treat pip and conda as per-project installs, as I will explain in answering:

what are the best suggestions to accomplish what I want, which is to never accidentally work in an environment where it is unclear what my dependencies are, recognizing that I'm talking primarily but not exclusively about using Python?

I want to always have an equivalent to a requirements.txt or package.json file available, both to clearly separate one project from another, and so that I can easily look back to see what is installed (and what version of it).

After working in data science for many years, this is the solution I have settled on which solves all of your problems.

  1. (On Mac) install all your system-level tools with Homebrew, but do yourself a favor and try to limit this to 'generic' tools such as GNU tools (e.g. wget, tree) or other things that will not be changing on a per-project basis and/or otherwise are better installed system-wide (e.g. Vagrant, Docker, PostgreSQL).

  2. For each project, have a dedicated wrapper script that installs conda in the current directory. Note here that I do not mean to install a global conda and use conda environments; I mean to literally install a fresh conda in every single project. This will work fine because within your wrapper scripts, you will include a detailed, version-locked set of the conda install commands required to install the exact versions of all of the packages you require.

Additionally, your wrapper script will contain the system environment modifications required to put this conda in your $PATH and clear out or override lingering references to any other system Pythons (a minimal sketch of such a script follows this list). conda is able to install a fairly large number of non-Python packages, so this takes care of your non-Python software dependencies as much as possible. This includes R installations and many R packages (for things like Bioconductor, it's even safer to install this way than the 'vanilla' way due to greater version control).

For packages that must be installed with pip, do not worry, because every conda installation comes with its own pip installation as well. So you can pip install within your conda, and the packages will remain in the conda alone. Your pip install command will also be version locked, using requirements.txt if you wish, guaranteeing that it is reproducible.

  3. Now that you have your per-project dedicated conda instance set up, you will use the aforementioned wrapper scripts to wrap up all the commands you are using in your project to run your software. If you need to work interactively, you can just call bash from within the wrapper script and it will drop you into an interactive bash process with your environment from the wrapper script pre-populated.
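
A minimal sketch of what such a wrapper script could look like, assuming the project-local conda lives at ./conda as described (the script name run.sh is illustrative, not prescribed):

#!/bin/bash
# run.sh (hypothetical): run a command inside this project's dedicated conda
set -e

# put the project-local conda first on PATH and hide any other Pythons
export PATH="$(pwd)/conda/bin:$PATH"
unset PYTHONPATH
unset PYTHONHOME

# execute whatever command was passed, e.g.:
#   ./run.sh python myscript.py
#   ./run.sh bash    # interactive session with this environment active
"$@"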

In practice, I prefer to use GNU make with a Makefile to accomplish all of these things. I create a file Makefile at the root of each project directory, with contents that look like this:

SHELL:=/bin/bash
UNAME:=$(shell uname)

# ~~~~~ Setup Conda ~~~~~ #
PATH:=$(CURDIR)/conda/bin:$(PATH)
unexport PYTHONPATH
unexport PYTHONHOME

# install versions of conda for Mac or Linux, Python 2 or 3
ifeq ($(UNAME), Darwin)
CONDASH:=Miniconda3-4.7.12.1-MacOSX-x86_64.sh
endif    
ifeq ($(UNAME), Linux)
CONDASH:=Miniconda3-4.7.12.1-Linux-x86_64.sh
endif

CONDAURL:=https://repo.continuum.io/miniconda/$(CONDASH)
conda:
    @echo ">>> Setting up conda..."
    @wget "$(CONDAURL)" && \
    bash "$(CONDASH)" -b -p conda && \
    rm -f "$(CONDASH)"

install: conda
    # install version-locked conda packages, then the pip-only packages
    conda install -y \
    conda-forge::ncurses=6.1 \
    rabbitmq-server=3.7.16 \
    anaconda::postgresql=11.2
    pip install -r requirements.txt

# start interactive bash session
bash:
    bash
run:
    python myscript.py

Now, when you cd into your project directory, you can just run a command like make install to install all of your dependencies, and a command like make run to run your code for the project.
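
For example, a hypothetical session in such a project (directory and file names are illustrative):

cd myproject
make install   # downloads Miniconda into ./conda and installs the pinned packages
make run       # runs python myscript.py with ./conda/bin first on PATH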

  • a tip for conda installations: first install all your packages without specifying any version numbers, then after you get them all installed go back and add the version numbers. This is a lot easier than trying to specify them up front.

Finally, if your software dependencies do not fit into Homebrew, conda, or pip in this manner, you need to start making some choices about how much reproducibility and isolation you really need. You might start to look into Docker containers or Vagrant virtual machines (in both cases you can keep the recipe in your project dir and continue to wrapper-script the commands to run your code, for future reference). I typically have not run into any per-project software that cannot be settled with some combination of conda, pip, Docker, or Vagrant.
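
As a sketch of the Docker route (the continuumio/miniconda3 image is one common choice; the bind mount and command here are illustrative):

# run the project's code in a throwaway container, keeping this command
# (or a Dockerfile) in the project directory as the reproducible recipe
docker run --rm -v "$(pwd)":/work -w /work continuumio/miniconda3 \
    bash -c "pip install -r requirements.txt && python myscript.py"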

For really extenuating circumstances, for example running RStudio locally, which does not play nice with R and libs installed in conda, I will just concede a bit and brute-force install the required packages globally for development purposes, but also try to recreate an isolated, version-locked R + library instance either in conda or Docker and run the code as a script there to verify the results can still be regenerated without the global packages.

user5359531
  • Thank you for this level of detail! Mentioning how `pip install` within a conda environment keeps everything in that environment is very helpful, although I already knew that. I disagree that it is poorly worded: yes, Homebrew is system-wide, whereas `pip` and `conda` are (ideally) per-project. That was my point: it is such a hodge-podge, **and** by default both `pip` and `conda` install systemwide (sort of: if you are not in an environment, it's in the base). I know the right thing to do, but I also know I sometimes make mistakes. But please edit my question for improvements. – Mike Williamson Dec 31 '19 at 13:52