
How should I generate requirements.txt for Python projects?

Here is the problem I am having with pip freeze. Suppose my package P requires A, B, C. Suppose C is a library that imports X, Y, Z, but only X is needed by P. Then if I:

1) Install A
2) Install B
3) Install C, which installs X, Y, Z
4) Do a pip freeze into P's requirements.txt 

Then P's requirements.txt will look like:

1) A
2) B
3) C
4) X
5) Y
6) Z

But Y and Z are not actually required in my Python installation for P to run.

Many of the answers assume that Y and Z must be installed. However, Python is a dynamic language. It is very often the case that, for example, C is a huge library that uses numpy or pandas for some functionality, but P doesn't call that part of the library - in that case I don't really need to pull those in, provided I know which parts of C that P needs. If all libraries were "small" this would be rare; however, there are a lot of "kitchen sink" libraries.

As far as I can tell, running pip freeze to generate P's requirements will show you all dependencies of dependencies, and thus is a superset of P's actual dependencies.
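To see why freeze is a superset: pip freeze simply enumerates every distribution installed in the environment, with no knowledge of which ones P actually imports. You can reproduce its output with just the standard library (a minimal sketch, no third-party packages assumed):

```python
from importlib.metadata import distributions

# pip freeze walks every installed distribution in the environment --
# it has no way of knowing which ones P actually imports.
installed = sorted(
    f"{d.metadata['Name']}=={d.version}" for d in distributions()
)
for line in installed:
    print(line)
```

In the scenario above, this listing necessarily includes Y and Z, because installing C put them in the environment.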

Tommy
  • I don't see this as a problem. If the requirements file specifies the dependencies' dependencies, does it do any harm? You'll end up with X, Y, Z installed when C gets installed even if you manually remove X, Y, Z from the requirements.txt file. pip freeze just makes the installation's dependencies transparent to the user. Just take care to install everything related to your project in a virtual environment before running pip freeze. – Haleemur Ali May 21 '15 at 03:54
  • I guess by "problem" I meant that I wanted the minimal set of requirements for P to run as P's requirements.txt. But due to the problem above the requirements balloons to a much larger set than what is needed for P to run. – Tommy May 21 '15 at 14:33
  • You need C for P to run. You need Y and Z for C to run. By transitivity, you need Y and Z for P to run. The whole point of requirements.txt is that your packages only change when you change them yourself. If we didn't record Y and Z's versions, that wouldn't be possible. – Kevin May 21 '15 at 14:39
  • That is false. C is a library with lots of functionality. Some parts of it depend on Y and Z - but not the parts called by P, so P can still import C and run with only X installed. Remember that lots of Python errors are only encountered at runtime, so if P only calls the part of C that imports X, everything is fine. I've edited my question to make this slightly more clear. – Tommy May 21 '15 at 14:42
  • Python is a highly dynamic language. Suppose a newer version of Y comes out, which monkey-patches X all over the place at import time. You *don't* want that in your virtualenv, right? Well, if you don't pin the Y version in requirements.txt, you could end up with it. – Kevin May 21 '15 at 14:48
  • Well that is a different argument for why I might want Y (possibly post that as an answer below) than the transitive argument. – Tommy May 21 '15 at 14:51
  • I've posted a suggestion on another, similar question here: https://stackoverflow.com/a/65666949/1512555 I'm not sure how to link it here so it'll be more visible (I just added it as a separate answer). – alegria Jan 11 '21 at 12:53

4 Answers


The purpose of a virtualenv is to have total control over the packages installed.

Suppose you only listed A, B, C, and X. Every time you create a new virtualenv from that requirements file, you'll get the latest versions of Y and Z. There are several problems with this:

  1. You can't know you're not using Y: For a sufficiently complex project, it is nearly impossible to audit every codepath to ensure C never calls into Y. You're not just worrying about your own code any more; you're worrying about C's code as well. This just doesn't scale.
  2. Even if you're just importing Y, you're using it: Python allows arbitrary code execution at import time. A new version of Y could do all sorts of obnoxious things at import time, such as printing to stdout, monkey patching X, or just about anything else you can imagine. A well-designed Y shouldn't do these things, but you'll find the quality of packages on PyPI highly variable.
  3. New versions of Y can pull in new dependencies: If you include a new version of Y, you could end up adding package W to your virtualenv too, because the new version of Y requires it. As more packages are added, the first two problems are exacerbated. Worse, you might find that the new version of Y depends on a newer version of X, in which case you won't end up with the packages you actually want.
  4. Producing a known-good configuration is more important: pip freeze is not designed to figure out minimal requirements. It is designed to enable deploying a complete application to many different environments consistently. That means it will err on the side of caution and list everything which could reasonably affect your project.
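Point 2 can be demonstrated in a few lines of Python. The sketch below fakes two modules in-memory (the names x and y are just placeholders for the packages above): merely importing y silently changes x's behaviour, even though your code never calls y.

```python
import sys
import types

# stand-in for package X: a module with one function
x = types.ModuleType("x")
x.add = lambda a, b: a + b
sys.modules["x"] = x

print(x.add(2, 3))  # 5 -- x behaves normally

# stand-in for package Y: its import-time code monkey-patches x
y_source = "import x\nx.add = lambda a, b: a * b\n"
y = types.ModuleType("y")
sys.modules["y"] = y
exec(y_source, y.__dict__)  # simulates "import y"

print(x.add(2, 3))  # 6 -- importing y changed x; P never called y directly
```

This is why pinning the exact version of Y in requirements.txt matters even when P never imports Y itself.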

For these reasons, you should not try to remove Y and Z from your requirements file.

Kevin

There is a Python module called pipreqs. It generates requirements.txt based on the imports in your project.

gopiariv
  1. Install the pipreqs library (e.g. conda install -c conda-forge pipreqs)
  2. Change dir to the project folder (cd your/repository)
  3. Run the command pipreqs --force

Or just pipreqs --force your/repository.

See additional information in the official source: https://pypi.org/project/pipreqs/

Foxy Fox

I've answered this question in a different Stack Overflow post (https://stackoverflow.com/a/65666949/1512555), where I recommended using pip-compile from pip-tools.
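Roughly, the pip-tools workflow looks like this (a sketch, assuming pip-tools is installed; A, B, C are the placeholder packages from the question): you list only your direct dependencies in requirements.in, and pip-compile resolves and pins the full transitive set into requirements.txt.

```shell
# requirements.in lists only P's direct dependencies:
#     A
#     B
#     C
pip install pip-tools          # provides the pip-compile command
pip-compile requirements.in    # writes a fully pinned requirements.txt,
                               # annotating each transitive pin (X, Y, Z)
                               # with "# via <package>" showing who needs it
pip-sync requirements.txt      # makes the virtualenv match it exactly
```

This gives you the best of both worlds: the file you maintain by hand stays minimal, while the generated file is a complete, reproducible pin of everything.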

alegria