
I have a tool which follows the system calls of a process. That way I know all the files/areas that were used by a process. I have a Python script which is being executed (creating a process). I know all the files that were used during the run, such as the script itself. I also know the files of the modules that were used. The modules are installed in /tmp/vendor.

Based on the files I found inside /tmp/vendor, I'm trying to figure out the module name and module version so I can create a requirements file for pip and then install the modules using pip install (to some other directory). Basically, I want to be able to know all the module dependencies of a Python process. Those modules could come from different areas, but let's focus on one (/tmp/vendor). The way I installed the modules into /tmp/vendor is just:

pip install --requirement requirements.txt --target /tmp/vendor

Now I want to be able to build this requirements.txt file, based on the files in /tmp/vendor.

The solution could be dynamic or static. At first I tried to solve it in a static way - checking the files in /tmp/vendor. As an example, I installed requests:

pip install requests --target /tmp/vendor

As I understand it, this installs the latest version. Inside vendor I have:

ls -la vendor/
total 52
drwxr-x--- 13 user group 4096 Sep 26 17:37 .
drwxr-x---  8 user group 4096 Sep 26 17:37 ..
drwxr-x---  2 user group 4096 Sep 26 17:37 bin
drwxr-x---  3 user group 4096 Sep 26 17:37 certifi
drwxr-x---  2 user group 4096 Sep 26 17:37 certifi-2021.5.30.dist-info
drwxr-x---  5 user group 4096 Sep 26 17:37 charset_normalizer
drwxr-x---  2 user group 4096 Sep 26 17:37 charset_normalizer-2.0.6.dist-info
drwxr-x---  3 user group 4096 Sep 26 17:37 idna
drwxr-x---  2 user group 4096 Sep 26 17:37 idna-3.2.dist-info
drwxr-x---  3 user group 4096 Sep 26 17:37 requests
drwxr-x---  2 user group 4096 Sep 26 17:37 requests-2.26.0.dist-info
drwxr-x---  6 user group 4096 Sep 26 17:37 urllib3
drwxr-x---  2 user group 4096 Sep 26 17:37 urllib3-1.26.7.dist-info

Now I can see that it also installs other modules that are needed, such as urllib3 and idna.
So my tool finds, for example, that I was using:

/tmp/vendor/requests/utils.py

I also notice that each installed package has a directory in the format:

$NAME-(.*).dist-info

And the captured group is the version of the module. So at first I thought that I could parse /tmp/vendor/(.*)/.* to get the module name ($NAME) and then look for $NAME-(.*).dist-info, but I noticed that some modules don't have this *.dist-info directory, so I could not figure out the version of the module, which made me abandon this approach.
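
For reference, a minimal sketch of that parsing idea (it only works for packages that actually have a *.dist-info directory under /tmp/vendor):

import re
from pathlib import Path

vendor = Path("/tmp/vendor")

def version_from_dist_info(name):
    # Look for a directory like <name>-<version>.dist-info in the vendor dir.
    # Note: the dist-info directory uses the *distribution* name, which can
    # differ from the importable module name, so this is only a heuristic.
    pattern = re.compile(r"^" + re.escape(name) + r"-(.+)\.dist-info$", re.IGNORECASE)
    for entry in vendor.iterdir():
        match = pattern.match(entry.name)
        if match:
            return match.group(1)
    return None

print(version_from_dist_info("requests"))  # e.g. 2.26.0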

I also tried some dynamic approaches - I know which Python version was used, and I could run python and try to load the module. But I could not figure out a way to find the version of the module.
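
For example, something along these lines (a sketch of that kind of attempt, not the exact code) only helps for modules that happen to expose a __version__ attribute:

import importlib
import sys

sys.path.insert(0, "/tmp/vendor")          # make the vendored modules importable
mod = importlib.import_module("requests")
# Many, but not all, packages set __version__; when it is missing there is
# nothing to read here.
print(getattr(mod, "__version__", "unknown"))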

To summarize - I'm looking for a robust way to figure out the modules that are required for my Python process to run. The modules should come with their version. All of the modules were installed using pip, so that should simplify the task. How can it be done?

vesii
  • By accident I chose "answer from a reputable source". I'm actually looking for a suggestion of a solution. – vesii Sep 28 '21 at 20:52
  • For a specific Python module, it is possible to figure out what distribution package it belongs to (assuming it is correctly installed and has correct metadata): https://stackoverflow.com/a/60975978/11138259 -- you can also look into [_pigar_](https://pypi.org/project/pigar/) or a tool like this. -- Also this: https://docs.python.org/3/library/modulefinder.html – sinoroc Sep 28 '21 at 21:06

5 Answers


Using importlib.metadata

This is the preferred way nowadays, since importlib.metadata has been part of the stdlib since Python 3.8; for older versions, there's the backport importlib-metadata.

from importlib import metadata as m

dists = m.distributions(path=['/tmp/vendor'])
for d in dists:
    print('Found package', d.metadata['Name'], '==', d.metadata['Version'])
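
Building on that, a small sketch that writes a requirements.txt from the distributions found under /tmp/vendor, pinning each one to its exact version:

from importlib import metadata as m

with open("requirements.txt", "w") as req:
    for dist in m.distributions(path=["/tmp/vendor"]):
        # One pinned requirement per distribution found in the vendor directory.
        req.write(f"{dist.metadata['Name']}=={dist.metadata['Version']}\n")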

Legacy: using pkg_resources

This has long been superseded by importlib.metadata and is listed only for completeness' sake.

import pkg_resources

dists = pkg_resources.find_distributions('/tmp/vendor')
for d in dists:
    print('Found package', d.project_name, '==', d.version)

See GETTING OR CREATING DISTRIBUTIONS for the docs on find_distributions().

hoefling
  • Thank you very much, `pkg_resources` works well if I know the location of the modules. Is there a way to figure out the Python package based on one file? For example, for `requests`, looking at the file `/tmp/vendor/requests/api.py`. I was trying to use `find_distributions` but it didn't work. I also didn't find a method of `pkg_resources` that does it, but maybe I missed one. Do you know a way to do it? – vesii Oct 19 '21 at 12:59
  • There is no reliable way to do that besides scanning all the file's parent directories, stopping at the first non-empty result (so something like `for parent in pathlib.Path(module.__file__).parents: yield from pkg_resources.find_distributions(parent)` etc.). This is because a file does not provide any distribution metadata by itself; if I `mkdir requests && touch requests/api.py`, where should the distribution metadata come from? You have to find the user site directory yourself one way or another. – hoefling Oct 20 '21 at 08:20
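
A runnable sketch of the approach described in that comment, assuming the module has already been imported (e.g. with /tmp/vendor on sys.path):

import pathlib
import sys
import pkg_resources

def distributions_near(module):
    # Walk up from the module's file; the first parent directory that yields
    # any distribution metadata is treated as that module's site directory.
    for parent in pathlib.Path(module.__file__).parents:
        dists = list(pkg_resources.find_distributions(str(parent)))
        if dists:
            return dists
    return []

sys.path.insert(0, "/tmp/vendor")
import requests

for dist in distributions_near(requests):
    print(dist.project_name, dist.version)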

After navigating into the site-packages (or equivalent) directory, iteratively run the following and collect the results in a dictionary:

pkg_resources.require('dep')

where dep is the dependency as seen in the site-packages (or equivalent) directory. This will give you a dictionary of dependencies from which the requirements.txt can be reconstructed.

For example, the site-packages of a virtual environment contains the following directories:

black
cairo
click
...

Now, the following produces the version info:

import pkg_resources

print({dep.key: dep.version for dep in pkg_resources.require("black")})

This results in:

{'black': '21.9b0', 'click': '8.0.1', 'mypy-extensions': '0.4.3', 'regex': '2021.8.28', 'platformdirs': '2.3.0', 'tomli': '1.2.1', 'typing-extensions': '3.10.0.2', 'pathspec': '0.9.0'}

Note: a similar approach could be

import pkg_resources
{dep.key : dep.version for dep in pkg_resources.working_set}

However, this will produce everything from sys.path, not just the dependencies present in site-packages.
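
Putting this together for the /tmp/vendor case from the question, a rough sketch of the loop described above (the vendor directory has to be on sys.path before pkg_resources is imported so the packages can be resolved; directories without metadata are simply skipped):

import os
import sys

sys.path.insert(0, "/tmp/vendor")  # make the vendored distributions visible
import pkg_resources

requirements = {}
for entry in os.listdir("/tmp/vendor"):
    if entry.endswith(".dist-info") or entry == "bin":
        continue
    try:
        # require() resolves the package plus all of its dependencies.
        for dep in pkg_resources.require(entry):
            requirements[dep.key] = dep.version
    except Exception:
        # Not every directory maps to an installed distribution.
        continue

with open("requirements.txt", "w") as req:
    for name, version in sorted(requirements.items()):
        req.write(f"{name}=={version}\n")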

Arnab De

You should first get a list of all installed packages under the relevant path:

pip list --format json --path /tmp/vendor

This will give you (in JSON format) a list of all packages, along with their versions, installed under the specified path.

Assuming you found that packages foo and bar were installed, you can then get the files contained in each package with:

pip show --files foo bar

The output of this command is unfortunately not available as JSON, but it adheres to a format that I think can be parsed quite reliably.

This way you end up with a list of files where, for each file, you know which package it came from.

Note that Python compiles the *.py files into *.pyc, so your process monitor might report *.pyc files that, of course, are not in the list. But you can just change the extension from *.pyc to *.py before you do the lookup.
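
A rough sketch of that lookup driven from Python (the parsing of the pip show --files output is a best-effort assumption about its text format, and PYTHONPATH is set so pip show can see the vendored packages):

import json
import os
import subprocess

vendor = "/tmp/vendor"
env = {**os.environ, "PYTHONPATH": vendor}

# 1. All packages installed under the vendor path, as JSON.
listing = subprocess.run(
    ["pip", "list", "--format", "json", "--path", vendor],
    capture_output=True, text=True, check=True,
)
packages = json.loads(listing.stdout)  # [{"name": ..., "version": ...}, ...]

# 2. Map each file (relative to the package's "Location:") to "name==version".
file_owner = {}
for pkg in packages:
    show = subprocess.run(
        ["pip", "show", "--files", pkg["name"]],
        capture_output=True, text=True, check=True, env=env,
    )
    in_files = False
    for line in show.stdout.splitlines():
        if line.startswith("Files:"):
            in_files = True
        elif in_files and line.startswith("  "):
            file_owner[line.strip()] = f"{pkg['name']}=={pkg['version']}"

print(file_owner.get("requests/utils.py"))  # e.g. requests==2.26.0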

SebDieBln

Can you install libraries on that machine? If you can access the source code (and if I'm understanding the problem correctly), you could try a couple of libraries that check the requirements. Check pipreqs and/or pigar (pip install pipreqs or pip install pigar).

EDIT: Using pigar, I got the following output when scanning a folder:

 > pigar -c /path/of/code/folder
 [...]
 ===============================
  PACKAGE    | CURRENT | LATEST
  -----------+---------+-------
  Pillow     | 8.3.2   | 8.3.2 
  matplotlib | 3.3.3   | 3.4.3
  numpy      | 1.19.4  | 1.21.2
  pyserial   | 3.5     | 3.5
 ===============================
tglaria

If the modules are installed, you should be able to create a requirements.txt file using pip freeze > requirements.txt. Create a venv to use it. All modules must be installed using pip. You can also look at a similar answer: Retrieving the requirements of a Python single script

ShreyasK
  • But `pip freeze` gives me *all* the modules. I want just the ones that were used. Some kind of mechanism that gets a path and returns the module name and version, not ALL of them. – vesii Sep 26 '21 at 18:14