
My friend just started learning Python and Flask, and is missing a lot of "best practices", e.g., a requirements.txt file.

He recently asked me for assistance, and to keep the project clean, I want to set up a CI service (Travis), but I need to work out this file first.

Since he did not initially have a requirements.txt, the only information I have is his import statements and his output of pip freeze.

Since there's no way to distinguish a direct requirement of the project from an indirect requirement pulled in by one of its packages, I want to find all "top-level" packages in the list. A "top-level package" is a package that is not required by any other package in the list. For example, urllib3 is required by requests, so when requests is present, urllib3 should not appear in the final result.

Is there a way to achieve this?


If anyone wants to help me with this specific instance, here's the output of pip freeze:

apturl==0.5.2
arrow==0.12.1
asn1crypto==0.24.0
binaryornot==0.4.4
blinker==1.4
Bootstrap-Flask==1.0.9
Brlapi==0.6.6
certifi==2018.1.18
chardet==3.0.4
Click==7.0
colorama==0.3.7
command-not-found==0.3
configparser==3.5.0
cookiecutter==1.6.0
cryptography==2.1.4
cupshelpers==1.0
decorator==4.1.2
defer==1.0.6
distro-info==0.18
dominate==2.3.5
Flask==1.0.2
Flask-Bootstrap4==4.0.2
Flask-Login==0.4.1
Flask-Mail==0.9.1
Flask-Moment==0.6.0
Flask-SQLAlchemy==2.3.2
Flask-WTF==0.14.2
future==0.17.1
httpie==0.9.8
httplib2==0.9.2
idna==2.6
ipython==5.5.0
ipython-genutils==0.2.0
itsdangerous==1.1.0
Jinja2==2.10
jinja2-time==0.2.0
keyring==10.6.0
keyrings.alt==3.0
language-selector==0.1
launchpadlib==1.10.6
lazr.restfulclient==0.13.5
lazr.uri==1.0.3
louis==3.5.0
macaroonbakery==1.1.3
Mako==1.0.7
MarkupSafe==1.1.0
mysqlclient==1.3.14
netifaces==0.10.4
oauth==1.0.1
olefile==0.45.1
pexpect==4.2.1
pickleshare==0.7.4
Pillow==5.1.0
poyo==0.4.2
prompt-toolkit==1.0.15
protobuf==3.0.0
pycairo==1.16.2
pycrypto==2.6.1
pycups==1.9.73
Pygments==2.2.0
pygobject==3.26.1
pymacaroons==0.13.0
PyNaCl==1.1.2
pyRFC3339==1.0
python-apt==1.6.3
python-dateutil==2.7.5
python-debian==0.1.32
pytz==2018.3
pyxdg==0.25
PyYAML==3.12
reportlab==3.4.0
requests==2.18.4
requests-unixsocket==0.1.5
ruamel.yaml==0.15.34
SecretStorage==2.3.1
simplegeneric==0.8.1
simplejson==3.13.2
six==1.11.0
SQLAlchemy==1.2.14
system-service==0.3
systemd-python==234
traitlets==4.3.2
ubuntu-drivers-common==0.0.0
ufw==0.35
unattended-upgrades==0.1
urllib3==1.22
usb-creator==0.3.3
visitor==0.1.3
wadllib==1.3.2
wcwidth==0.1.7
Werkzeug==0.14.1
whichcraft==0.5.2
WTForms==2.2.1
xkit==0.0.0
zope.interface==4.3.2

And here are the import statements, plus an additional pymysql that he told me about.

import os
from flask import *
from flask_bootstrap import Bootstrap
from flask_moment import Moment
from flask_wtf import FlaskForm
from wtforms import *
from wtforms.validators import *
from flask_sqlalchemy import SQLAlchemy
from flask_mail import Mail, Message
from werkzeug.security import generate_password_hash, check_password_hash
from flask_login import login_required, login_user, login_fresh, login_url, LoginManager, UserMixin, logout_user
davidism
iBug
  • You should do it like this: create a new virtual environment → install the dependencies from the imports via `pip` → check if everything works → use `pip freeze` – Klaus D. Jan 21 '19 at 14:46
  • @KlausD. Yeah. As an experienced Python developer, **I** try to follow these good practices. But the problem is, **my *friend*** doesn't, and it's now a problem. – iBug Jan 21 '19 at 14:47
  • I don't get the question. Why are only the top level packages required? Installing them, will automatically install the rest (that they depend on), so whether the list will contain only the top level or all of them seems pretty much irrelevant. What am I missing? – CristiFati Jan 21 '19 at 14:56
  • @CristiFati I'm looking for a way to generate a *minimum* list, such that when installed by `pip`, all packages in the huge list are installed (as dependencies). – iBug Jan 21 '19 at 14:57
  • @CristiFati There are packages in the list that have no relation to the imports. – Klaus D. Jan 21 '19 at 15:00
  • We cannot help you in educating your friend. And to have a reproducible environment you should freeze all dependencies. – Klaus D. Jan 21 '19 at 15:00
  • @KlausD.: Thanks for the clarification, this was my next question :) – CristiFati Jan 21 '19 at 15:01
  • @KlausD. nope, I just want to find out all packages that are not required by another package in the same list – iBug Jan 21 '19 at 15:05
  • But why then? Because no matter which list (*minimum* or *full*) *pip* receives, it will install the *full* list. And if it only installed the *minimum* one, there would be missing dependencies and it wouldn't be usable. – CristiFati Jan 21 '19 at 15:09
  • @CristiFati Yes. While installing the *minimum* list will indeed install the full list consequently, putting the full output of `pip freeze` in `requirements.txt` isn't a good idea. Therefore, I want a minimum and *maintainable* list. – iBug Jan 21 '19 at 15:14
  • A few recommendations: [pipdeptree](https://pypi.org/project/pipdeptree/): pip dependency tree; [pipreqs](https://pypi.org/project/pipreqs/): generate `requirements.txt` file for any project based on imports. – phd Jan 22 '19 at 19:58

1 Answer


First, I wanted to suggest using pip's API, but pip is meant to be used as a CmdLine tool only ([PyPA]: Using pip from your program). Note that I did use it successfully; I'm just not posting that code (at least for now).
Here's a way that uses pkg_resources ([ReadTheDocs]: Package Discovery and Resource Access using pkg_resources).

code00.py:

#!/usr/bin/env python

import os
import pkg_resources
import sys


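def get_pkgs(reqs_file="requirements_orig.txt"):
    """Return a dict mapping pkg name -> (version, tuple of direct dependency names)."""
    if reqs_file and os.path.isfile(reqs_file):
        # A "pip freeze" file only carries names and versions, no dependency info
        ret = dict()
        with open(reqs_file) as f:
            for item in f:
                item = item.strip()
                if not item or "==" not in item:
                    continue  # Skip blank lines and lines that are not "name==version"
                name, ver = item.split("==")[:2]
                ret[name] = ver, ()
        return ret
    else:
        # Query the pkgs installed in the current environment
        return {
            item.project_name: (item.version, tuple([dep.name for dep in item.requires()])) for item in pkg_resources.working_set
        }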


def print_pkg_data(text, pkg_info):
    print("{:s}\nSize: {:d}\n\n{:s}".format(text, len(pkg_info), "\n".join(["{:s}=={:s}".format(*item) for item in pkg_info])))


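def main(*argv):
    # Pass reqs_file=None to inspect the installed pkgs instead of a "pip freeze" file
    pkgs = get_pkgs(reqs_file=None)
    full_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items())]
    print_pkg_data("----------FULL LIST----------", full_pkg_info)

    # Collect every name that appears as a dependency of some installed pkg
    deps = set()
    for name in pkgs:
        deps = deps.union(pkgs[name][1])
    # Keep only the pkgs that no other pkg depends on (exact, case-sensitive name match)
    min_pkg_info = [(name, data[0]) for name, data in sorted(pkgs.items()) if name not in deps]
    print_pkg_data("\n----------MINIMAL LIST----------", min_pkg_info)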


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Output:

(py_064_03.06.08_test0) e:\Work\Dev\StackOverflow\q054292236> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code00.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] 064bit on win32

----------FULL LIST----------
Size: 133

Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Jinja2==2.10
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
MarkupSafe==1.1.0
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
Pygments==2.3.1
QtAwesome==0.5.3
QtPy==1.5.2
Send2Trash==1.5.0
Sphinx==1.8.3
Werkzeug==0.14.1
absl-py==0.6.1
alabaster==0.7.12
asn1crypto==0.24.0
astor==0.7.1
astroid==2.1.0
backcall==0.1.0
bleach==3.0.2
certifi==2018.11.29
cffi==1.11.5
chardet==3.0.4
cloudpickle==0.6.1
colorama==0.4.1
cryptography==2.4.2
cycler==0.10.0
decorator==4.3.0
defusedxml==0.5.0
djangorestframework==3.9.0
docutils==0.14
entrypoints==0.2.3
fatiando==0.5
funcsigs==1.0.2
future==0.17.1
gast==0.2.0
grpcio==1.17.1
h5py==2.9.0
html5lib==1.0.1
idna==2.8
imagesize==1.1.0
ipaddr==2.2.0
ipykernel==5.1.0
ipython==7.2.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==1.1.0
jedi==0.13.2
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
keyboard==0.13.2
keyring==17.1.1
kiwisolver==1.0.1
lazy-object-proxy==1.3.1
llvmlite==0.26.0
lxml==4.2.5
matplotlib==3.0.2
mccabe==0.6.1
mistune==0.8.4
nbconvert==5.4.0
nbformat==4.4.0
notebook==5.7.4
numba==0.41.0
numpy==1.15.4
numpydoc==0.8.0
opencv-python==3.4.4.19
packaging==18.0
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
patsy==0.5.1
pickleshare==0.7.5
pip==18.1
prometheus-client==0.5.0
prompt-toolkit==2.0.7
protobuf==3.6.1
psutil==5.4.8
pyOpenSSL==18.0.0
pycodestyle==2.4.0
pycparser==2.19
pycryptodome==3.7.2
pyflakes==2.0.0
pygame==1.9.4
pylint==2.2.2
pynput==1.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
pywin32==224
pywin32-ctypes==0.2.0
pywinpty==0.5.5
pyzmq==17.1.2
qtconsole==4.4.3
requests==2.21.0
rope==0.11.0
scapy==2.4.0
scipy==1.2.0
setuptools==40.6.3
sip==4.19.8
six==1.12.0
snowballstemmer==1.2.1
sphinxcontrib-websupport==1.1.0
spyder==3.3.2
spyder-kernels==0.3.0
statsmodels==0.9.0
tensorboard==1.12.1
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
thrift==0.11.0
tornado==5.1.1
traitlets==4.3.2
typed-ast==1.1.1
urllib3==1.24.1
wcwidth==0.1.7
webencodings==0.5.1
wheel==0.32.3
widgetsnbextension==3.4.2
wrapt==1.10.11
xlrd==1.2.0

----------MINIMAL LIST----------
Size: 37

Babel==2.6.0
Click==7.0
Django==2.1.4
Flask==1.0.2
Keras==2.2.4
Keras-Applications==1.0.6
Keras-Preprocessing==1.0.5
Markdown==3.0.1
Pillow==5.3.0
PyQt5==5.9.2
PyQt5-sip==4.19.13
PyYAML==3.13
QtAwesome==0.5.3
QtPy==1.5.2
Sphinx==1.8.3
djangorestframework==3.9.0
fatiando==0.5
funcsigs==1.0.2
ipaddr==2.2.0
keyboard==0.13.2
lxml==4.2.5
opencv-python==3.4.4.19
pandas==0.23.4
patsy==0.5.1
pip==18.1
pyOpenSSL==18.0.0
pycryptodome==3.7.2
pygame==1.9.4
pynput==1.4
pywin32==224
scapy==2.4.0
spyder==3.3.2
statsmodels==0.9.0
tensorflow-gpu==1.12.0
tensorflow-tensorboard==1.5.1
thrift==0.11.0
xlrd==1.2.0

Notes:

  • (Stating the obvious): In order to get a pkg's info, that pkg needs to be installed. That's why in my example I didn't use your file (which I named requirements_orig.txt), but the pkgs installed in my VEnv

  • As you can see, in my case the number of pkgs dropped from 133 to 37, which I'd say is pretty manageable (of course, more filtering can be done)

  • I created the data structures based on the assumption that a pkg name is a primary key (uniquely identifies a pkg). If this is false, the code would require a bit of change
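
One possible refinement, as a sketch only: in the output above, Click and Keras-Applications are still in the minimal list although Flask and Keras are installed too. This may be because a dependency can be declared with a different spelling (for example `click` or `keras_applications`) than the installed pkg's name, so an exact string comparison misses it. The variant below compares normalized names instead. It assumes Python 3.8+ and swaps pkg_resources for `importlib.metadata`; the helper names (`normalize`, `requirement_name`, `top_level`, `installed_packages`) are made up for illustration, and skipping requirements that carry an `extra ==` marker is a simplification.

```python
import re
from importlib import metadata  # Python 3.8+


def normalize(name):
    """Lowercase a project name and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return normalize(match.group(1)) if match else None


def top_level(pkgs):
    """pkgs maps name -> (version, requirement strings); return pkgs nobody requires."""
    required = set()
    for _version, reqs in pkgs.values():
        for req in reqs:
            # Simplification: ignore requirements that only apply to optional extras
            if re.search(r"\bextra\s*==", req):
                continue
            name = requirement_name(req)
            if name:
                required.add(name)
    return sorted((name, ver) for name, (ver, _reqs) in pkgs.items()
                  if normalize(name) not in required)


def installed_packages():
    """Collect name -> (version, requirement strings) for the current environment."""
    pkgs = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            pkgs[name] = (dist.version, dist.requires or [])
    return pkgs


if __name__ == "__main__":
    for name, ver in top_level(installed_packages()):
        print("{:s}=={:s}".format(name, ver))
```

As with the script above, this only sees pkgs that are installed in the environment it runs in, and two pkgs that require each other would both be dropped from the result.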

Final note: If you also want to take your module's import list into account (to strip out even more pkgs, if possible), you could also try [Python.Docs]: modulefinder - Find modules used by a script. I used it in [SO]: What files are required for Py_Initialize to run? (@CristiFati's answer), only from CmdLine, but it should be trivial to use it from a script.
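
A minimal sketch of what using modulefinder from a script could look like. The function name `imported_top_level_modules` is hypothetical. Note that `ModuleFinder` follows imports transitively, so the result also contains standard library modules and the modules pulled in by your dependencies; mapping module names back to pkg names (e.g. `flask_bootstrap` to its pkg) is a separate step that is not covered here.

```python
from modulefinder import ModuleFinder


def imported_top_level_modules(script_path):
    """Return the sorted top-level module names reachable from script_path."""
    finder = ModuleFinder()
    finder.run_script(script_path)
    # finder.modules holds the modules that were found,
    # finder.badmodules the ones that were imported but could not be located
    names = set(finder.modules) | set(finder.badmodules)
    return sorted({name.split(".")[0] for name in names if name != "__main__"})
```

Calling `imported_top_level_modules("app.py")` (where `app.py` stands for your friend's main script) returns a list of names that can then be compared against the minimal list.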

CristiFati
  • Sure. As I said at the beginning this is "*a way*", meaning there can be others (maybe even better) as well. – CristiFati Jan 21 '19 at 17:39