
I have a reasonable understanding of the difference between conda install and pip install: pip installs Python packages only, while conda can also install non-Python binaries. However, there is some overlap between the two, which leads me to ask:

What's the rule of thumb for whether to use conda or pip when both offer a package?

For example, TensorFlow is available on both repositories but from the tensorflow docs:

within Anaconda, we recommend installing TensorFlow with the pip install command, not with the conda install command.

But, there are many other packages that overlap, like numpy, scipy etc.


However, this Stack Overflow answer suggests that conda install should be the default and that pip should only be used if a package is unavailable from conda. Is this true even for TensorFlow or other Python-only packages?

Aaron N. Brock
  • I'm pretty sure that tensor-flow is *not* a python-only package... – juanpa.arrivillaga Jan 08 '18 at 20:30
  • @juanpa.arrivillaga but it _can_ be installed via pip so doesn't that mean it is? In any event, that doesn't really matter in regards to the question. – Aaron N. Brock Jan 08 '18 at 20:30
  • well, just reading the link, it seems they suggest to use the `pip` simply because they officially support it, and don't make any promises regarding the community maintained conda package – juanpa.arrivillaga Jan 08 '18 at 20:32
  • I saw that, but even though that's the case it still seems like it's a better idea to use the officially supported version. No? – Aaron N. Brock Jan 08 '18 at 20:34
  • Seems like a good idea to me. – juanpa.arrivillaga Jan 08 '18 at 20:37
  • Which brings me back to my question on a rule of thumb for when it's better to use `conda` & when it's better to use `pip`... – Aaron N. Brock Jan 08 '18 at 20:38
  • My suggestion would be to use conda whenever possible, mostly because conda cannot manage pip-installed packages (update, remove, etc.). See also: https://stackoverflow.com/a/45919845/2449192 (disclaimer, that's my answer) – darthbith Jan 09 '18 at 00:03
  • @darthbith pip packages can be managed via pip without much issue and, as mentioned in the question, some packages, like `TensorFlow`, aren't officially supported via `conda install`. Are other packages? Is there an easy way to tell? – Aaron N. Brock Jan 09 '18 at 02:03

1 Answer


The TensorFlow maintainers themselves publish the TensorFlow wheels on PyPI, which is why pip is the officially recommended way to install it. The conda packages are created by the Anaconda staff and/or the community. That doesn't mean the conda packages are bad, it just means that the TensorFlow maintainers don't (officially) participate there. Basically they are just saying: "If you have trouble installing it with pip, the TensorFlow devs will try to help you. But we don't officially support the conda packages, so if something goes wrong with the conda package you need to ask the conda-package maintainers. You've been warned."


In the more general case:

For Python-only packages you should normally use conda install. There are exceptions, for example if there is no conda package at all, or if the conda package is out of date (and nobody is releasing a new version of it) and you really need that package or version.

However it's different for packages that require compilation (e.g. C-Extensions, etc.). It's different because with pip you can install a package either:

  • as pre-compiled wheel
  • as package, compiled on your computer

While conda just provides the

  • compiled conda package

With compiled packages you have to be careful about binary compatibility. A package is compiled against a specific binary interface of another library, and that interface can depend on the version of that library, the compilation flags, and so on.

With conda you have to take the package as-is, which means that you have to assume that the packages are binary-compatible. If they aren't it won't work (segfault or linking errors or whatever).

If you use pip, you can choose which wheel (if any) to install, or you can compile the package against the libraries available on your computer. That makes a binary incompatibility less likely. Binary incompatibility is (or was) a big problem if you install conda packages from different conda channels, because those packages might simply be binary-incompatible (e.g. conda-forge and the anaconda channel have or had a few problems there).

However it should probably be decided on a case-by-case basis. I had no problems with my tensorflow conda environment where I installed all packages from the conda-forge channel, including tensorflow. However I have heard that several people had trouble with tensorflow in mixed conda-forge and anaconda channel environments. For example NumPy from the main channel and TensorFlow from the conda-forge channel might just be binary-incompatible.
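One way to spot a mixed-channel environment is to group the installed packages by the channel they came from. The following is only a minimal sketch. It assumes the JSON printed by `conda list --json` is a list of records with `name` and `channel` keys, and that pip-installed packages are reported under the channel `pypi`. The sample records are made up for illustration, and `group_by_channel` is a hypothetical helper, not part of conda.

```python
import json
from collections import defaultdict


def group_by_channel(records):
    """Map each channel name to the sorted list of package names from it."""
    groups = defaultdict(list)
    for record in records:
        # Records without a channel key are collected under "unknown".
        groups[record.get("channel", "unknown")].append(record["name"])
    return {channel: sorted(names) for channel, names in groups.items()}


# Made-up sample in the shape assumed for `conda list --json` output.
sample = json.loads("""
[
  {"name": "numpy", "channel": "pkgs/main"},
  {"name": "tensorflow", "channel": "conda-forge"},
  {"name": "requests", "channel": "pypi"}
]
""")

for channel, names in sorted(group_by_channel(sample).items()):
    print(f"{channel}: {', '.join(names)}")
```

If compiled packages such as NumPy and TensorFlow end up in different groups, that is the kind of environment worth testing separately before relying on it.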

My rule of thumb is:

  • If it's a Python-only package, just install it (it's unlikely to cause trouble). Use the conda package when possible, but it won't cause (much) trouble if you use pip. A package installed with pip is not managed by conda, so it may not be recognized as an available dependency and you have to update it yourself, but that's about all the difference.
  • If it's a compiled package (like a C extension or a wrapper around a C library or such like) it becomes a bit more complicated. If you want to be "careful" or you have reason to expect problems:
      • Always create a new environment if you need to test compiled packages from different channels and/or conda and pip. It's easy to discard a messed up conda environment but it's a lot more annoying to fix an environment that you depend on.
      • If possible install all compiled packages using conda install from one and only one channel (if possible the main anaconda channel).
      • If not possible try to mix main anaconda channel compiled packages with conda packages from a different channel.
      • If that doesn't work try to mix conda compiled packages and pip compiled-packages (pre-compiled wheels or self-compiled installers).
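The "test it in a new environment first" advice can be sketched as a small helper that builds the conda commands for a clone, install, discard cycle. This is only an illustration: `trial_env_commands` and the environment names are hypothetical, and the helper just returns the commands instead of running them, so nothing is changed on your machine.

```python
import shlex


def trial_env_commands(base_env, trial_env, package, channel=None):
    """Return the conda commands for a clone, install, discard cycle."""
    install = ["conda", "install", "--yes", "--name", trial_env]
    if channel is not None:
        # Optionally install from one specific channel.
        install += ["--channel", channel]
    install.append(package)
    return [
        # 1. Clone the environment you depend on.
        ["conda", "create", "--yes", "--name", trial_env, "--clone", base_env],
        # 2. Install the package into the clone only.
        install,
        # 3. Throw the clone away once you have tested it.
        ["conda", "env", "remove", "--yes", "--name", trial_env],
    ]


# Hypothetical names: "work" is the environment to protect.
for cmd in trial_env_commands("work", "work-trial", "tensorflow", "conda-forge"):
    print(shlex.join(cmd))
```

Run the first two commands, check that imports and your own code still work in the clone, and only then repeat the install in the real environment. The last command removes the clone.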

You asked why you cannot install packages from PyPI with conda. I don't know the exact reasons, but pip mostly provides the package and you have to install it yourself. With conda you get an already compiled and installed package that is just "copied" without an installation step. That requires that the package is built for different operating systems (Mac, Windows, Linux) and platforms (32-bit, 64-bit), against different Python versions (2.7, 3.5, 3.6) and possibly against different NumPy versions. That means conda has to provide several packages instead of just one. That takes resources (space for the final installed packages and time for building them) which probably aren't available or feasible. Apart from that, there is probably no converter from a PyPI package to a conda recipe, and you would still need to know all the specifics of a package (compilation, installation) to make it work. That's just my guess though.
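To make the multiplication concrete, here is a small sketch that enumerates the build variants named above. It assumes, for illustration only, that every combination of operating system, platform, and Python version gets its own conda package. In practice some combinations may not be built, and different NumPy versions would multiply the count further.

```python
from itertools import product

# The dimensions named in the answer.
operating_systems = ["mac", "linux", "windows"]
architectures = ["32-bit", "64-bit"]
python_versions = ["2.7", "3.5", "3.6"]

# One conda package per combination (an assumption for this sketch).
variants = list(product(operating_systems, architectures, python_versions))

print(len(variants))  # 3 * 2 * 3 = 18
```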

MSeifert
  • This has shed some light on the matter, but this also sounds like a *huge pain*. If you have N packages and Z packages from a different source that's N*Z possible combinations you'd have to check. – Aaron N. Brock Jan 09 '18 at 14:07
  • Also, due to `conda` having a much smaller repo than `pip` I feel like I'll almost always be forced to mix the two, which will lead to just so very much conda & pip compatibility testing. Correct? – Aaron N. Brock Jan 09 '18 at 14:13
  • It depends. Normally it isn't a problem because things work out-of-the-box. Like I said I had no trouble using `tensorflow` with `conda install`. Just in case something **doesn't work** it becomes a *huge pain*. Like I said, it's usually easy to test if something works in a new conda environment so you don't shoot yourself in the foot by messing up your "work" environments. – MSeifert Jan 09 '18 at 14:14
  • Even that has changed a simple `conda install` or `pip install` into: `conda create --clone env --name env-2`, `source activate env-2`, `conda install package`, `source activate env`, `conda install package`, `conda env remove --name env-2`. I know this is a little out of scope of the question, but wouldn't it be nicer if `conda` officially supported installing pip packages in a way that `conda` could manage them & make them play nice? – Aaron N. Brock Jan 09 '18 at 14:25
  • @AaronN.Brock Okay, maybe the answer was a bit too much tailored for tensorflow. Normally I wouldn't recommend creating test environments for pure-Python packages or "isolated C packages" because no matter how you install them (`pip` or from a different conda channel) they won't cause problems. However for these packages that depend on lots of other packages and/or are depended upon by several packages and are compiled it could make sense. It's also not a real problem to fix a messed up environment it's just harder than to discard it. – MSeifert Jan 09 '18 at 14:42
  • Hum, well, I will accept your answer for now as it is correct. However, I'm not super satisfied with this. – Aaron N. Brock Jan 09 '18 at 14:58
  • @AaronN.Brock Well, you cannot install from pip using conda because conda distributes "already installed packages" while pip requires that you locally install the package. So for one pypi package you have to create several conda packages, one for each OS (mac/linux/windows) one for each system (32bit / 64bit) and one for each Python version (2.7, 3.5, 3.6). That requires a lot of space and computation (just look at the build-backlog of conda-forge ...). – MSeifert Jan 09 '18 at 15:14
  • @AaronN.Brock Yes, nobody is really satisfied, but it's already much easier than in the times when there was no conda and there were no wheels, and you had to install (and possibly even correctly link) packages yourself. Also it rarely leads to problems when you use conda and pip, it's just not easy to manage because conda doesn't "know" about the pip-installed packages. – MSeifert Jan 09 '18 at 15:16