In this article, the author suggests the following:

To install fuzzy matcher, I found it easier to conda install the dependencies (pandas, metaphone, fuzzywuzzy) then use pip to install fuzzymatcher. Given the computational burden of these algorithms you will want to use the compiled c components as much as possible and conda made that easiest for me.

Can someone explain why he suggests using Conda to install the dependencies and then pip to install the actual package, i.e., fuzzymatcher? Why can't we just use Conda for both? Also, how do we know whether we are using the compiled C packages, as he suggested?

Aravind Yarram
    Does this answer your question? [What is the difference between pip and conda?](https://stackoverflow.com/questions/20994716/what-is-the-difference-between-pip-and-conda) – aman.zip Feb 21 '21 at 00:38

3 Answers

Other answers have addressed how Conda does a better job managing non-Python dependencies. As for why use Pip at all, in this case it's not complicated: the fuzzymatcher package was not available on Conda when the article was written (18 Feb 2020). The first and only version of the package was uploaded on Conda Forge on 1 Dec 2020.

Unless one wants an older version (< 0.0.5), one can now just use Conda. Going forward, Conda Forge's Autotick Bot will automatically submit pull requests and build any new versions of the package whenever they get pushed to PyPI.

merv

For the compiled C packages, you could import a package, see where it's located, and check the package itself to see what it imports. At some point, you would run into an import of a compiled module (.so extension on *nix, .pyd on Windows). There may be an easier way, but that may depend on the point in the package's import sequence at which the compiled module is loaded.
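One way to automate that check, as a minimal sketch using only the standard library: look up where a package lives without importing it, then list any files whose names end in one of the interpreter's extension-module suffixes. The helper names `is_compiled_path` and `find_compiled_files` are made up for this illustration.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def is_compiled_path(path):
    """Return True if the file name ends with an extension-module suffix."""
    # EXTENSION_SUFFIXES holds the suffixes this interpreter accepts for
    # compiled modules, e.g. ".so" variants on *nix or ".pyd" on Windows.
    return any(
        str(path).endswith(suffix)
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


def find_compiled_files(package_name):
    """List compiled extension files found inside an importable package."""
    # For a top-level name, find_spec locates the package without running it.
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        raise ModuleNotFoundError(package_name)

    # A single-file module has no search locations: check its own origin.
    if not spec.submodule_search_locations:
        if spec.origin and is_compiled_path(spec.origin):
            return [spec.origin]
        return []

    # A package: walk its folder(s) and collect files with a compiled suffix.
    hits = []
    for location in spec.submodule_search_locations:
        for candidate in Path(location).rglob("*"):
            if candidate.is_file() and is_compiled_path(candidate):
                hits.append(str(candidate))
    return sorted(hits)
```

For example, `find_compiled_files("Levenshtein")` would list the compiled files of python-Levenshtein, the optional speedup for fuzzywuzzy, if it is installed, and raises `ModuleNotFoundError` if it is not. An empty list only means no compiled files were found in that package's own folder; the package may still import compiled modules from elsewhere.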

Fuzzymatcher may not be available through Conda, or it may be available only as an outdated version, or only as a version that matches an outdated set of dependencies. Then you may end up with an out-of-date set of packages. Pip may have a more recent version of fuzzymatcher, and likely cares less (for better or worse) about the versions of the various other packages in your environment. I'm not familiar with fuzzymatcher, so I can't give you an exact reason: you'd have to ask the author.

Note that the point of that paragraph, on installing the necessary packages with Conda, is that some packages require (C) libraries (not necessarily compiled packages, though these will depend on these libraries) that may not be installed by default on your system. Conda will install these for you; Pip will not.

9769953

conda is the package manager (installer and uninstaller) for Anaconda or Miniconda.

pip is the package manager for Python.

Depending on your system environment and additional settings, pip and conda may install into the same site-packages folder of a Python installation (for example <prefix>/lib/pythonX.Y/site-packages on *nix or <prefix>\Lib\site-packages on Windows, where <prefix> is the installation or environment folder). Hence conda and pip usually work well together.
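To see which folder the active interpreter actually installs into, here is a minimal sketch using only the standard library. It assumes that a conda environment can be recognised by the `conda-meta` folder that conda keeps in the environment prefix; the function name `describe_environment` is made up for this illustration.

```python
import sys
import sysconfig
from pathlib import Path


def describe_environment():
    """Report where the running interpreter lives and installs packages."""
    prefix = Path(sys.prefix)
    return {
        # Root folder of the active Python installation or environment.
        "prefix": str(prefix),
        # Folder that receives pure-Python packages (site-packages).
        "site_packages": sysconfig.get_paths()["purelib"],
        # Assumption: conda environments contain a "conda-meta" folder.
        "is_conda_env": (prefix / "conda-meta").is_dir(),
    }
```

Running `print(describe_environment())` with the `python` that pip and conda each use shows whether both tools target the same folder.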

However, conda and pip get their Python packages from different channels or websites.

  1. conda searches and downloads from the official channel: https://repo.anaconda.com/pkgs/

    • These packages are officially supported by Anaconda and hence maintained in that channel.

    • However, we may not find every Python package there, or versions newer than those in the official channel. That is why we sometimes install Python packages from "conda-forge" or "bioconda". These are unofficial channels maintained by developers and other friendly users.

    • We can specify other channels like this:

      conda install <package1> --channel conda-forge
      conda install <package2> --channel bioconda
      
  2. pip searches and downloads from PyPI

    • We should be able to download every publicly available Python package there.
    • These packages are generated and uploaded by developers and friendly users.
    • The dependency settings in each package may be neither fully tested nor verified.
    • These packages may not support older or newer versions of Python.

Hence, if you are using Anaconda or Miniconda, you should use conda. If you cannot find a specific package in the official channel, you may try conda-forge or bioconda. As a last resort, get it from PyPI.

However, if you do not use Anaconda, then stick with pip.
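If an environment ends up with packages from both tools, it can help to check which one installed a given package. A minimal sketch, assuming Python 3.8+ and that the package ships the `INSTALLER` metadata file (pip writes it, and conda packages often carry one as well, but it may be missing); the function name `installer_of` is made up for this illustration.

```python
from importlib import metadata


def installer_of(dist_name):
    """Return the recorded installer (e.g. "pip" or "conda"), or None if unknown."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
    # read_text returns None when the INSTALLER file is absent.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None
```

For example, `installer_of("pandas")` returns the recorded installer name, or `None` when pandas is not installed or carries no `INSTALLER` file.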

For advanced users, you may download the latest libraries from their source (such as GitHub, GitLab, etc.). However, there is a catch:

  1. Some Python packages are written in pure Python. In this case, you should have no issues installing these packages on your system.

  2. Some Python packages are written in C, C++, Go, etc. In this case, you would need

    • A supported compiler for your system as well as your Python environment (32- or 64-bit, versions).
    • Python header files, linkable Python libraries and archives specific for your installed Python version. Anaconda includes these in its installation.

How do we know if a Python package needs a particular compiler?

It may not be easy to find out. However, you could find out in the following ways (possibly in this order):

  1. Look at the landing page (or the README.md or README.txt file) in the source repository.
    For example, if you go to Pandas's source repository, it shows that it needs Cython, hence the installation would need a C compiler.

  2. Look at the setup.py in the source repository. For example, if you go to numpy's setup.py, it needs a C compiler.

  3. Look at the proportion of the source code written in languages that need compilation (such as C, C++, Go, etc.). For example, the numpy library is 35.7% C, 1.0% C++, etc. However, this is only a guide, as this source code may consist only of test routines.

    (Image: Languages used in numpy)

  4. Ask on Stack Overflow.
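For a package that is already installed, another hint is its `WHEEL` metadata file, whose `Root-Is-Purelib` field records whether the wheel it came from was pure Python. The following is a minimal sketch, assuming Python 3.8+ and a package installed from a wheel; packages installed by conda or in other ways may lack this file, in which case the check returns `None`. The helper names `parse_root_is_purelib` and `is_pure_python` are made up for this illustration.

```python
from importlib import metadata


def parse_root_is_purelib(wheel_text):
    """Return True/False from the Root-Is-Purelib field, or None if absent."""
    for line in wheel_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "root-is-purelib":
            return value.strip().lower() == "true"
    return None


def is_pure_python(dist_name):
    """Best-effort check of an installed distribution; None means unknown."""
    try:
        text = metadata.distribution(dist_name).read_text("WHEEL")
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the WHEEL file is missing.
    return parse_root_is_purelib(text) if text else None
```

A result of `False` usually means the package contains compiled code, so building it from source would need a compiler, even though installing the prebuilt wheel did not.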
yoonghm