Anaconda has become very popular in scientific computing because it bundles together over 125 of the most widely used Python data analysis libraries. My question is: since we already have pip (a very widely used Python package manager), why do we need Anaconda? Couldn't we all simply type `pip install` for each of the 125+ libraries and they'd all work together nicely? Or would they not work together nicely, meaning that Anaconda has done us all a big favour by sorting out the issues that arise when trying to get 125+ libraries to interact nicely?

Damien Irving
- "We" don't need Anaconda. But if you are on Windows and don't have a compiler, Anaconda may be one of many solutions. – cox May 15 '14 at 22:46
- I guess `pip install` for 125+ packages is too much hassle for you to try it out yourself? Having a community using the same versions of libraries is much better for collaboration than everyone having just the latest version of each library that was out on the day they installed. – John La Rooy May 15 '14 at 22:48
- It's actually a bit of a hassle to install some packages on Windows. For example, if I want to install numpy via pip on Windows, I need a C++ compiler, a Fortran compiler, etc. If I just do `pip install numpy`, it won't actually work by default on Windows and will require me to do additional setup and installation. There's no general Windows package manager, so now I need to hunt down what I need to install, muck with my path and configuration files, etc. Same thing with a lot of the more "hefty" libraries. – Michael0x2a May 15 '14 at 22:49
- Because it simplifies installation, and therefore also distribution. And it lowers the barrier to entry for scientific computing considerably, especially on Windows. – M4rtini May 15 '14 at 22:51
- I don't think this is opinion-based; there are objective reasons, as I laid out. A separate question would be Anaconda vs Canopy vs Python(x,y)... – Davidmh May 16 '14 at 17:22
2 Answers
Three fundamental reasons:
- Most of these libraries require linking against system-installed libraries (say, HDF5 for PyTables or ATLAS for NumPy) that the user may or may not be aware of. Note that Matplotlib requires a bunch of different graphical libraries, and if they are missing, it will crash on certain backends.
- pip compiles libraries from source (though with wheels you can avoid this step). This requires a C compiler (difficult on Windows) and a Fortran compiler (difficult on Mac and Windows). It also takes time for big libraries like SciPy.
- Anaconda's metapackage anaconda is a minimal set of libraries that Continuum has made sure play well together. In an ideal world we would always use the latest and most improved version of everything, but that can lead to incompatibilities.
And a complement:
- It is easy to use conda to create a set of packages for distribution, so you can easily share your package together with all its dependencies.
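That last point can be sketched with a few conda commands (a minimal sketch; the environment name `analysis` and the package list are arbitrary examples, and older conda versions use `source activate` instead of `conda activate`):

```shell
# Create an isolated environment containing a mutually compatible set of packages
conda create --name analysis numpy scipy matplotlib

# Switch into the environment to work with those packages
conda activate analysis

# Export the exact versions so a collaborator can recreate the same environment
conda env export > environment.yml
conda env create -f environment.yml
```

Because the exported file records exact versions, everyone on a project can run against the same library set, which is the collaboration benefit mentioned in the comments above.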

Davidmh
The problem is that a lot of these scientific packages have dependencies on external C libraries, and on one another, which pip cannot handle.
For example, see my question: How to Bootstrap numpy installation in setup.py
That was for my own library, but I think a lot of other packages face a similar problem.
Also, compiling libraries takes a long time. Just typing `pip install numpy` on my machine takes over a minute. It's the same reason people use pre-compiled binaries with `apt-get` or `yum` instead of compiling programs from source.
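If you want to see the compile-versus-binary difference for yourself, pip's `--only-binary` and `--no-binary` options control it (a sketch; whether a pre-built wheel exists depends on your platform and pip version):

```shell
# Install from a pre-built wheel only; fails rather than compiling from source
pip install --only-binary :all: numpy

# Force a build from source -- this is the slow path that needs the compilers
pip install --no-binary :all: numpy
```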


user545424